Compare commits

...

5265 Commits

Author SHA1 Message Date
Emil Velikov
7748c3f5eb configure.ac: deprecate --with-egl-platforms over --with-platforms
Currently the former controls more than just EGL. With follow-up commits
we'll unwind and fix things so that one can build the different drivers
with said platform support.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-17 13:37:41 +01:00
Emil Velikov
de128c19ee configure: remove egl platforms check
The configure option is used by more than just EGL and with next commit
we'll rename it accordingly. Thus having the check will (and is atm)
incorrect.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-17 13:13:09 +01:00
Emil Velikov
618a7b984b travis: remove unneeded dri3/present proto requirement
Signed-off-by: Emil Velikov <emil.lvelikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-04-17 13:12:03 +01:00
Emil Velikov
291a9405a5 configure: remove unneeded dri3/present proto requirements
We are not using either of these. The respecive xcb packages are used
instead.

v2: Rebase, reword commit message.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-04-17 13:10:37 +01:00
Kyle Brenneman
ce562f9e3f EGL: Implement the libglvnd interface for EGL (v3)
The new interface mostly just sits on top of the existing library.

The only change to the existing EGL code is to split the client
extension string into platform extensions and everything else. On
non-glvnd builds, eglQueryString will just concatenate the two strings.

The EGL dispatch stubs are all generated. The script is based on the one
used to generate entrypoints in libglvnd itself.

v2: [Kyle]
 - Rebased against master.
 - Reworked the EGL makefile to use separate libraries
 - Made the EGL code generation scripts work with Python 2 and 3.
 - Change gen_egl_dispatch.py to use argparse for the command line arguments.
 - Assorted formatting and style cleanup in the Python scripts.

v3: [Emil Velikov]
 - Rebase
 - Remove separate glvnd glx/egl configure toggles

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-17 13:03:58 +01:00
Tapani Pälli
370df207ca android: add marshal_generated c and h files to generated sources
Fixes: efd63e2 ("mesa: Connect the generated GL command marshalling code to the build.")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-17 12:20:06 +01:00
Emil Velikov
3bcef6aa24 configure.ac: honour --disable-libunwind if the .pc file is present
We should check the presence in order to determine if we should
[implicitly] set the CFLAGS/LIBS

v2: Drop spurious OMX hunk (Eric)

Cc: Eric Anholt <eric@anholt.net>
Reported-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-17 12:05:10 +01:00
Emil Velikov
39c3482205 docs: document the C++14 SWR requirement
Earlier commit bumped the requirement for the SWR driver.

v2: Fold the note with the LLVM 3.9 one (Tim)

Fixes: 3c52a7316a ("swr: [configure.ac/scons] require c++14")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2017-04-17 12:04:22 +01:00
Samuel Pitoiset
84ed2e1192 winsys/amdgpu: init buffer_indices_hashlist with memset()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-17 11:59:17 +02:00
Samuel Pitoiset
af612816bc winsys/amdgpu: simplify amdgpu_cs_add_buffer() a bit
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-17 11:59:17 +02:00
Kenneth Graunke
7c3b8ed878 i965/drm: Delete NULL check in brw_bo_unmap().
I accidentally moved the bo->bufmgr dereference above the NULL check
when cleaning up this code.

While passing NULL to free() is a common pattern...passing NULL to
unmap seems pretty bad.  You really ought to know whether you have
a buffer or not.  We don't want to paper over bugs like that.  So,
just drop the NULL check altogether.

CID: 1405006

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-04-16 22:58:23 -07:00
Kenneth Graunke
9b71709cb8 intel/decoder: Fix is_header_field starting condition.
Starting positions >= 32 are not part of the header, rather than >.

Caught by Coverity, which found that "bits <<= field->start" may shift
by 32, which has undefined behavior.

CID: 1404968

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-04-16 22:58:23 -07:00
Kenneth Graunke
6142c3e298 i965/drm: Remove dead return in brw_bo_busy()
If ret is 0, we return.  If ret is not 0, we return.  This is dead.

CID: 1405013 (Structurally dead code (UNREACHABLE))

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-04-16 22:58:22 -07:00
Mauro Rossi
8c79dbe94e android: amd/addrlib: trivial fix for gfx9 support
Fixes the following build error:

external/mesa/src/amd/addrlib/gfx9/gfx9addrlib.cpp:36:10: fatal error: 'gfx9_gb_reg.h' file not found
         ^
1 error generated.

Fixes: 7f160ef "amd/addrlib: import gfx9 support"
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-04-17 14:04:21 +10:00
Jason Ekstrand
4cf079f7f2 nir: Add GLSL_TYPE_[U]INT64 to some switch statements
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-04-16 20:14:42 -07:00
Marek Olšák
2769dadb0f gallium/radeon: always flush asynchronously and wait after begin_new_cs
This hides the overhead of everything in the driver after the CS flush and
before returning from pipe_context::flush.
Only microbenchmarks will benefit.

+2% FPS for glxgears.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-17 01:22:11 +02:00
Marek Olšák
f05f0bb5cb radeonsi: remove local variable 'mod' from si_compile_tgsi_shader
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-17 01:22:11 +02:00
Marek Olšák
bd2cde0c25 radeonsi: add si_shader_selector::vs_needs_prolog
cleanup

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-17 01:22:11 +02:00
Marek Olšák
777f305840 radeonsi: don't set VGT_GS_MODE as part of the GS state
The VS state sets it.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-17 01:22:11 +02:00
Marek Olšák
5438e39fae radeonsi: don't allow user indices with indirect draws
Not possible with GL and it will make future gallium rework easier.
(also it's something I wouldn't like to support)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-17 01:22:11 +02:00
Marek Olšák
1c94d29984 radeonsi: merge two if (indirect) statements
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-17 01:22:11 +02:00
Marek Olšák
bdd6449769 radeonsi: don't mark non-dirty textures with CMASK as compressed
because the compression is skipped with non-dirty textures.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-17 01:22:11 +02:00
Bas Nieuwenhuizen
566f2ed571 docs: Document interaction Fixes tag and stable branches.
For the next time I forget.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-15 11:37:46 +02:00
Timothy Arceri
9f0dd85aa6 glsl: don't run the GLSL pre-processor when we are skipping compilation
This moves the hashing of shader source for the cache lookup to before
the preprocessor.  In our experience, shaders are unlikely to hash the
same after preprocessing if they didn't hash the same before, so we can
skip preprocessing for cache hits.

Improves Deus Ex start-up times with a warm cache from ~30 seconds to
~22 seconds.

Also fixes the leaking of state.

V2: fix indentation

v3: add the value of MESA_EXTENSION_OVERRIDE to the hash of the shader.

Tested-by (v2): Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-04-15 11:36:52 +10:00
Timothy Arceri
c2bc0aa7b1 glsl: delay optimisations on individual shaders when cache is available
Due to a max limit of 65,536 entries on the index table that we use to
decide if we can skip compiling individual shaders, it is very likely
we will have collisions.

To avoid doing too much work when the linked program may be in the
cache this patch delays calling the optimisations until link time.

Improves cold cache start-up times on Deus Ex by ~20 seconds.

When deleting the cache index to simulate a worst case scenario
of collisions in the index, warm cache start-up time improves by
~45 seconds.

V2: fix indentation, make sure to call optimisations on cache
fallback, make sure optimisations get called for XFB.

Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-15 11:36:44 +10:00
Jason Ekstrand
d2d6cf6c83 anv: Add the pci_id into the shader cache UUID
This prevents a user from using a cache created on one hardware
generation on a different one.  Of course, with Intel hardware, this
requires moving their drive from one machine to another but it's still
possible and we should prevent it.

Reviewed-by: Chad Versace <chadversary@chromium.org>
Cc: mesa-stable@lists.freedesktop.org
2017-04-14 17:41:07 -07:00
Philipp Zabel
36f2101723 etnaviv: native fence fd support
This adds native fence fd support to etnaviv, similarly to commit
0b98e84e9b ("freedreno: native fence fd"), enabled for kernel
driver version 1.1 or later.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-04-15 01:47:18 +02:00
Francisco Jerez
96dfc014fd docs: mark GL_ARB_vertex_attrib_64bit and OpenGL 4.2 as supported by i965/gen7+
v2 (Andreas Boll):
- Mark GL 4.1 as supported by i965/gen7+
- Mark GL_ARB_shader_precision as supported by i965/gen7+
- Update release notes

Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 16:13:21 -07:00
Juan A. Suarez Romero
1877982aca i965: enable OpenGL 4.2 in Ivybridge
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 16:13:21 -07:00
Samuel Iglesias Gonsálvez
92d4dc76ea i965: enable ARB_shader_precision in gen7+
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 16:13:21 -07:00
Juan A. Suarez Romero
0aed1212ae i965: enable ARB_vertex_attrib_64bit for gen7+
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 16:13:21 -07:00
George Kyriazis
b9d4256e11 swr: Fix swr osmesa build
Use GALLIUM_SWR to standardize

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-14 18:03:40 -05:00
Wladimir J. van der Laan
6a8d5ab932 etnaviv: SINGLE_BUFFER support on GC3000
This patch adds support for the SINGLE_BUFFER feature on GC3000
GPUs, which allows rendering to a single buffer using multiple pixel
pipes.

This feature is always used when it is available, which means that
multi-tiled formats are no longer being used in that case, and all
buffers will be normal (super)tiled. This mimics the behavior of the
blob on GC3000.

- Because the same format can be used to render to and texture from,
  this avoids an extra resolve pass when rendering to texture.

- i.MX6qp includes a PRE which can scan-out directly from tiled formats,
  avoiding untiling overhead.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-04-15 00:34:13 +02:00
Wladimir J. van der Laan
1dcb1d4925 etnaviv: Update includes from rnndb
Update to etna_viv commit 8486a97.

austriancoder: changed patch to include isa redefinition fix.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-04-15 00:34:08 +02:00
Wladimir J. van der Laan
9e4d049f40 etnaviv: Add chipMinorFeatures4 and 5
Request chipMinorFeatures bitfields 4 and 5 from the
drm driver.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-04-15 00:34:03 +02:00
Philipp Zabel
dda956340c etnaviv: resolve tile status when flushing resource
When passing render buffers from EGL clients to a wayland compositor,
the resource tile status must be resolved because otherwise the tile
status is lost in the transfer and cleared parts of the buffer will
contain old contents.

The same applies when sampling directly from a renderable resource.

lst: Add seqno tracking, to skip flush when not needed.

Fixes: aadcb5e94b35 ("etnaviv: enable TS, but disable autodisable")
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-04-15 00:15:30 +02:00
Philipp Zabel
f30aab7696 etnaviv: stop repeatedly resolving an unchanged resource into its scanout prime buffer
Before resolving a resource into its scanout prime buffer, check that
the prime resource is actually older. If it is not, the resolve is an
expensive no-op, and we better skip it.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-04-15 00:15:27 +02:00
George Kyriazis
d7a1f01db3 swr: Add polygon stipple support
Add polygon stipple functionality to the fragment shader.

Explicitly turn off polygon stipple for lines and points, since we
do them using tris.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-14 17:08:12 -05:00
Samuel Iglesias Gonsálvez
8973ae3162 docs/relnotes: add GL_ARB_gpu_shader_fp64 support on i965/ivybridge
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:10 -07:00
Samuel Iglesias Gonsálvez
ef49dda2df docs: mark GL_ARB_gpu_shader_fp64 and OpenGL 4.0 as supported by i965/gen7+
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:10 -07:00
Samuel Iglesias Gonsálvez
a494afdb8e i965: enable OpenGL 4.0 to Ivybridge/Baytrail
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:10 -07:00
Samuel Iglesias Gonsálvez
cd0a6b2fc2 i965: enable ARB_gpu_shader_fp64 for Ivybridge/Baytrail
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:09 -07:00
Matt Turner
2eeb1b0ad9 i965: Use correct VertStride on align16 instructions.
In commit c35fa7a, we changed the "width" of DF source registers to 2,
which is conceptually fine. Unfortunately a VertStride of 2 is not
allowed by align16 instructions on IVB/BYT, and the regular VertStride
of 4 works fine in any case.

See generated_tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/vs-round-double.shader_test
for example:

cmp.ge.f0(8)    g18<1>DF        g1<0>.xyxyDF    -g8<2>DF        { align16 1Q };
        ERROR: In Align16 mode, only VertStride of 0 or 4 is allowed
cmp.ge.f0(8)    g19<1>DF        g1<0>.xyxyDF    -g9<2>DF        { align16 2N };
        ERROR: In Align16 mode, only VertStride of 0 or 4 is allowed

v2:
- Add spec quote (Curro).
- Change the condition to only BRW_VERTICAL_STRIDE_2 (Curro)

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:09 -07:00
Samuel Iglesias Gonsálvez
d8441e2276 i965/vec4/dce: improve track of partial flag register writes
This is required for correctness in presence of multiple 4-wide flag
writes (e.g. 4-wide instructions with a conditional mod set) which
update a different portion of the same 8-bit flag subregister.

Right now we keep track of flag dataflow with 8-bit granularity and
consider flag writes to have killed any previous definition of the
same subregister even if the write was less than 8 channels wide,
which can cause live flag register updates to be dead
code-eliminated incorrectly.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:09 -07:00
Samuel Iglesias Gonsálvez
c1fc8fad47 i965/vec4: don't do horizontal stride on some register file types
horiz_offset() shouldn't be doing anything for scalar registers,
because all channels of any SIMD instructions will end up reading or
writing the same component of the register, so shifting the register
offset would be wrong.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Re-implement in terms of is_uniform() for
  simplicity.  Pass argument by const reference.  Clarify commit
  message. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:09 -07:00
Matt Turner
21e8e3a848 i965/vec4: Fix exec size for MOVs {SET,PICK}_{HIGH,LOW}_32BIT.
Otherwise for a pack_double_2x32_split opcode, we emit:

   vec1 64 ssa_135 = pack_double_2x32_split ssa_133, ssa_134
mov(8)          g5<1>UD         g5<4>.xUD                       { align16 1Q compacted };
mov(8)          g7<2>UD         g5<4,4,1>UD                     { align1 1Q };
        ERROR: When the destination spans two registers, the source must span two registers
               (exceptions for scalar source and packed-word to packed-dword expansion)
mov(8)          g8<2>UD         g5.4<4,4,1>UD                   { align1 2N };
        ERROR: The offset from the two source registers must be the same
mov(8)          g5<1>UD         g6<4>.xUD                       { align16 1Q compacted };
mov(8)          g7.1<2>UD       g5<4,4,1>UD                     { align1 1Q };
        ERROR: When the destination spans two registers, the source must span two registers
               (exceptions for scalar source and packed-word to packed-dword expansion)
mov(8)          g8.1<2>UD       g5.4<4,4,1>UD                   { align1 2N };
        ERROR: The offset from the two source registers must be the same

The intention was to emit mov(4)s for the instructions that have ERROR
annotations.

See tests/spec/arb_gpu_shader_fp64/execution/vs-isinf-dvec.shader_test
for example.

v2 (Samuel):
- Instead of setting the exec size to a fixed value, don't double it
(Curro).
- Add PICK_{HIGH,LOW}_32BIT to the condition.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Trivial rebase changes. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:09 -07:00
Samuel Iglesias Gonsálvez
f030aaf2fb i965/vec4: use vec4_builder to emit instructions in setup_imm_df()
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Drop useless vec4_visitor dependencies.  Demote to
  static stand-alone function.  Don't write unused components in the
  result.  Use vec4_builder interface for register allocation. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:09 -07:00
Juan A. Suarez Romero
a907c91e93 i965/vec4: consider subregister offset in live variables
Take into account offset values less than a full register (32 bytes)
when getting the var from register.

This is required when dealing with an operation that writes half of the
register (like one d2x in IVB/BYT, which uses exec_size == 4).

v2:
- Take in account this offset < 32 in liveness analysis too (Curro)

v3:
- Change formula in var_from_reg() (Curro)
- Remove useless changes (Curro)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:08 -07:00
Francisco Jerez
92649a3e67 i965/vec4: fix assert to detect SIMD lowered DF instructions in IVB
On IVB, DF instructions have lowered the SIMD width to 4 but the
exec_size will be later doubled. Fix the assert to avoid crashing in
this case.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Simplify assert.  Except for the 'inst->group % 4
  == 0' part the assertion was redundant with the previous assertion. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:08 -07:00
Samuel Iglesias Gonsálvez
6e3265eae5 i965/vec4: split VEC4_OPCODE_FROM_DOUBLE into one opcode per destination's type
This way we can set the destination type as double to all these new opcodes,
avoiding any optimizer's confusion that was happening before.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Drop no_spill workaround originally needed due to
  the bogus destination type of VEC4_OPCODE_FROM_DOUBLE. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:08 -07:00
Samuel Iglesias Gonsálvez
50a5217637 i965/vec4: split d2x conversion and data gathering from one opcode to two explicit ones
When doing a 64-bit to a smaller data type size conversion, the destination should
be aligned to 64-bits. Because of that, we need to gather the data after the
actual conversion.

Until now, these two operations were done by VEC4_OPCODE_FROM_DOUBLE but
now we split them explicitely in two different instructions:
VEC4_OPCODE_FROM_DOUBLE just do the conversion and
VEC4_OPCODE_PICK_LOW_32BIT will gather the data.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:08 -07:00
Juan A. Suarez Romero
cfaf14a126 i965/vec4: fix VEC4_OPCODE_FROM_DOUBLE for IVB/BYT
In the generator we must generate slightly different code for
Ivybridge/Baytrail, because of the way the stride works in
this hardware.

v2:
- Use stride and don't need to fix dst (Curro)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:08 -07:00
Juan A. Suarez Romero
be445d3ea3 i965/vec4: keep original type when dealing with null registers
Keep the original type when dealing with null registers. Especially
because we do no want to introduce an implicit conversion between
types that could affect the conditional flags.

This affects especially when the original type is DF, and we are working
on Ivybridge/Baytrail.

v2 (Curro)
- Fix typo.
- Use retype() instead of applying the type directly.
- Remove unneeded retype.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:08 -07:00
Samuel Iglesias Gonsálvez
a21dc2b500 i965/vec4: split DF instructions and later double its execsize in IVB/BYT
We need to split DF instructions in two on IVB/BYT as it needs an
execsize 8 to process 4 DF values (one GRF in total).

v2:
- Rename helper and make it static inline function (Matt).
- Fix indention and add braces (Matt).

v3:
- Don't edit IR instruction when doubling exec_size (Curro)
- Add comment into the code (Curro).
- Manage ARF registers like the others (Curro)

v4:
- Add get_exec_type() function and use it to calculate the execution
  size.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Fix bogus 'type != BAD_FILE' check.  Take
  destination type as execution type where there is no valid source.
  Assert-fail if the deduced execution type is byte.  Clarify comment
  in get_lowered_simd_width().  Move SIMD width workaround outside of
  'if (...inst->size_written > REG_SIZE)' conditional block, since the
  problem should be independent of whether the amount of data written
  by the instruction is greater or lower than a GRF.  Drop redundant
  is_ivb_df definition.  Drop bogus inst->exec_size < 8 check.
  Simplify channel group assertion. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:08 -07:00
Samuel Iglesias Gonsálvez
a5399e8b1c i965/fs: lower all non-force_writemask_all DF instructions to SIMD4 on IVB/BYT
The hardware applies the same channel enable signals to both halves of
the compressed instruction which will be just wrong under non-uniform
control flow. Fix this by splitting those instructions to SIMD4.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:08 -07:00
Francisco Jerez
ebfb703d44 i965/fs: Get 64-bit indirect moves working on IVB.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-04-14 14:56:08 -07:00
Matt Turner
630b84cdc8 i965: Use source region <1,2,0> when converting to DF.
Doing so allows us to use a single MOV in VEC4_OPCODE_TO_DOUBLE instead
of two.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-04-14 14:56:08 -07:00
Juan A. Suarez Romero
3198ce3f96 i965/fs: fix lower SIMD width for IVB/BYT's MOV_INDIRECT
According to the IVB and HSW PRMs:

"2.When the destination requires two registers and the sources are
 indirect, the sources must use 1x1 regioning mode."

So for DF instructions the execution size is not limited by the number
of address registers that are available, but by the EU decompression
logic not handling VxH indirect addressing correctly.

This patch limits the SIMD width to 4 in this case.

v2:
- Fix typo (Matt).
- Fix condition (Curro)

v3:
- Add spec quote (Curro)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:07 -07:00
Juan A. Suarez Romero
571cbd05eb i965/fs: fix dst stride in IVB/BYT type conversions
When converting a DF to 32-bit conversions, we set dst stride to 2,
to fulfill alignment restrictions because the upper Dword of every
Qword will be written with undefined value.

But in IVB/BYT, this is not necessary, as each DF conversion already
writes 2, the first one the real value, and the second one a 0.
That is, IVB/BYT already set stride = 2 implicitly, so we must set it to
1 explicitly to avoid ending up with stride = 4.

v2:
- Fix typo (Matt)

v3:
- Fix stride in the destination's brw_reg, don't modity IR (Curro)

v4:
- Remove 'is_dst' argument of brw_reg_from_fs_reg() (Curro)
- Fix comment (Curro).
- Relax hstride assert (Curro)

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Minor spelling fixes. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:07 -07:00
Samuel Iglesias Gonsálvez
af6fc3a8ea i965/fs: rename lower_d2x to lower_conversions
v2:
- Change the name to lower_conversions.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:07 -07:00
Samuel Iglesias Gonsálvez
dee31311eb Revert "i965/fs: Don't emit SEL instructions for type-converting MOVs."
This reverts commit 7dccd38b40.

d2x pass fixes SEL instructions when there is a type conversion
by doing a SEL without type conversion and then convert the result.
This pass also takes into account the non-uniform control flow.

Then, 7dccd38b40 is not needed anymore.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-04-14 14:56:07 -07:00
Samuel Iglesias Gonsálvez
aeecc82d05 i965/fs: generalize the legalization d2x pass
Generalize it to lower any unsupported narrower conversion.

v2 (Curro):
- Add supports_type_conversion()
- Reuse existing intruction instead of cloning it.
- Generalize d2x to narrower and equal size conversions.

v3 (Curro):
- Make supports_type_conversion() const and improve it.
- Use foreach_block_and_inst to process added instructions.
- Simplify code.
- Add assert and improve comments.
- Remove redundant mov.
- Remove useless comment.
- Remove saturate == false assert and add support for saturation
  when fixing the conversion.
- Add get_exec_type() function.

v4 (Curro):
- Use get_exec_type() function to get sources' type.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:07 -07:00
Matt Turner
94ffeb7fa2 i965: Use <0,2,1> region for scalar DF sources on IVB/BYT.
On HSW+, scalar DF sources can be accessed using the normal <0,1,0>
region, but on IVB and BYT DF regions must be programmed in terms of
floats. A <0,2,1> region accomplishes this.

v2:
- Apply region <0,2,1> in brw_reg_from_fs_reg() (Curro).

v3:
- Added comment explaining the reason (Curro).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:07 -07:00
Samuel Iglesias Gonsálvez
82d17615f4 i965/fs: clamp exec_size when an instruction has a scalar DF source
Then the SIMD lowering pass will get rid of any compressed instructions with scalar
source (whether force_writemask_all or not) and we avoid hitting the Gen7 region
decompression bug.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Suggested-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:07 -07:00
Juan A. Suarez Romero
0f1316d4db i965/fs: double regioning parameters and execsize for DF in IVB/BYT
In IVB and BYT, both regioning parameters and execution sizes are measured as
32-bits element size.

So when we have something like:

mov(8) g2<1>DF g3<4,4,1>DF

We are not actually moving 8 doubles (our intention), but 4 doubles.

We need to double the parameters to cope with this issue. However,
horizontal strides don't behave as they're supposed to on IVB
for DF regions, they will cause each 32-bit half of DF sources to be
strided individually, and doubling the value won't make any difference.

v2:
- Use devinfo directly (Matt).
- Use Baytrail instead of Valleview (Matt).
- Use IvyBridge instead of Ivy (Matt)
- Double the exec_size in code emission (Curro)

v3:
- Change hstride doubling by an assert and fix commit log (Curro).
- Substitute remaining compiler->devinfo by devinfo (Curro).

v4:
- Fix comment (Curro).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:07 -07:00
Juan A. Suarez Romero
79af256388 i965/fs: add helper to retrieve instruction execution type
The execution data size is the biggest type size of any instruction
operand.

We will use it to know if the instruction deals with DF, because in Ivy
we need to double the execution size and regioning parameters.

v2:
- Fix typo in commit log (Matt)
- Use static inline function instead of fs_inst's method (Curro).
- Define the result as a constant (Curro).
- Fix indentation (Matt).
- Add braces to nested control flow (Matt).

v3 (Curro):
- Add get_exec_type() and other auxiliary functions and use them to
  calculate its size.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Fix bogus 'type != BAD_FILE' check.  Fix deduced
  execution type for integer vector types.  Take destination type as
  execution type where there is no valid source.  Assert-fail if the
  deduced execution type is byte.  Move into brw_ir_fs.h header for
  consistency with the VEC4 back-end. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:07 -07:00
Matt Turner
fd349d29e4 i965: Handle IVB DF differences in the validator.
On IVB/BYT, region parameters and execution size for DF are in terms of
32-bit elements, so they are doubled. For evaluating the validity of an
instruction, we halve them.

v2 (Sam):
- Add comments.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-04-14 14:56:07 -07:00
Iago Toral Quiroga
fbac8b1f94 i965/disasm: also print nibctrl in IVB for execsize=8
4-wide DF operations where NibCtrl applies require and execsize of 8
in IvyBridge/BayTrail.

v2:
- Refactor NibCtrl printing (Matt)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:06 -07:00
Boyan Ding
ff29f488d4 nir: Destination component count of shader_clock intrinsic is 2
This fixes the following error when using ARB_shader_clock on i965:
	vec1 32 ssa_0 = intrinsic shader_clock () () ()
	intrinsic store_var (ssa_0) (clock_retval) (3) /* wrmask=xy */
error: src->ssa->num_components == num_components (nir/nir_validate.c:204)

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
2017-04-14 14:54:06 -07:00
Nicolai Hähnle
39f51b5db9 radeonsi: add missing initialization for userptr buffers
Fix the accounting for memory usage of userptr buffers, which has been wrong
forever (or at least for a long time).

Also initialize flags. Without this initialization, the sparse buffer flag
might end up being set, which leads to staging buffers being used unnecessarily
(and incorrectly) in transfers to or from userptr buffers.

This works around VM faults that occur with the radeon kernel module when
running piglit ./bin/amd_pinned_memory decrement-offset map-buffer -auto

Fixes: e077c5fe65 ("gallium/radeon: transfers and invalidation for sparse buffers")
Reported-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-14 23:23:04 +02:00
Fredrik Höglund
c1dd5d0b01 radv: remove the temp descriptor set infrastructure
It is no longer used.

Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-04-14 23:21:24 +02:00
Fredrik Höglund
5ab5d1bee4 radv: use push descriptors in meta
Use push descriptors instead of temp descriptor sets.

Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-04-14 23:21:24 +02:00
Fredrik Höglund
f95caae504 radv: add private push descriptors for meta
This allows meta to use push descriptors without disturbing user
push descriptors.

radv_meta_push_descriptor_set differs from vkCmdPushDescriptorSetKHR
in that partial updates are not supported; all descriptors used in
subsequent draw commands must be pushed at the same time.

Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-04-14 23:21:24 +02:00
Jason Ekstrand
220974b38d anv/blorp: Properly handle VK_ATTACHMENT_UNUSED
The Vulkan driver was originally written under the assumption that
VK_ATTACHMENT_UNUSED was basically just for depth-stencil attachments.
However, the way things fell together, VK_ATTACHMENT_UNUSED can be used
anywhere in the subpass description.  The blorp-based clear and resolve
code has a bunch of places where we walk lists of attachments and we
weren't handling VK_ATTACHMENT_UNUSED everywhere.  This commit should
fix all of them.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
2017-04-14 14:20:42 -07:00
Jason Ekstrand
21d2ca72d8 anv/cmd_buffer: Use the null surface state for ATTACHMENT_UNUSED
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
2017-04-14 14:20:42 -07:00
Jason Ekstrand
02eca8b6f8 anv/cmd_buffer: Always set up a null surface state
We're about to start requiring it in yet another case and calculating
exactly when one is needed is starting to get prohibitively expensive.
A single surface state doesn't take up that much space so we may as well
create one all the time.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
2017-04-14 14:20:42 -07:00
Nicolai Hähnle
d6588d9962 radeonsi: cope with missing disassembly
For robustness and testing purposes.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-14 22:51:07 +02:00
Nicolai Hähnle
d15b1f6e2d gallium/ddebug: dump missing members of pipe_draw_info
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-14 22:50:54 +02:00
Nicolai Hähnle
2ac03e90fb radeonsi: enable ARB_shader_viewport_layer_array
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-04-14 22:50:17 +02:00
Nicolai Hähnle
d5e53f348e radeonsi: handle ignored LAYER and VIEWPORT_INDEX writes
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-04-14 22:50:13 +02:00
Nicolai Hähnle
4127f38bae st/mesa: enable ARB_shader_viewport_layer_array
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-04-14 22:50:09 +02:00
Nicolai Hähnle
f3d2cf6c1f tgsi: clarify TGSI_SEMANTIC_{LAYER,VIEWPORT_INDEX}
Depending on pipe caps they can be writable in all vertex processing
stages, but only the output of the last stage counts.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-04-14 22:50:06 +02:00
Nicolai Hähnle
17f24a9b75 gallium: add PIPE_CAP_TGSI_TES_LAYER_VIEWPORT
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-04-14 22:49:44 +02:00
Nicolai Hähnle
8b5d477aa8 configure.ac: add --enable-sanitize option
Enable code sanitizers by adding -fsanitize=$foo flags for the compiler
and linker.

In addition, this also disables checking for undefined symbols: running
the address sanitizer requires additional symbols which should be provided
by a preloaded libasan.so (preloaded for hooking into malloc & friends
globally), and the undefined symbols check gets tripped up by that.

Running the tests works normally via `make check`, but shows additional
failures with the address sanitizer due to memory leaks that seem to be
mostly leaks in the tests themselves. I believe those failures should
really be fixed. In the mean-time, you can set

export ASAN_OPTIONS=detect_leaks=0

to only check for more serious error types.

v2:
- fail reasonably when an unsupported sanitize flag is given (Eric Engestrom)

Reviewed-by: Bartosz Tomczyk <bartosz.tomczyk86@gmail.com> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-14 22:44:30 +02:00
Jason Ekstrand
e1f6fb8021 anv/cmd_buffer: Flush the VF cache at the top of all primaries
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-04-14 13:35:02 -07:00
Jason Ekstrand
939337e49f anv/blorp: Flush the texture cache in UpdateBuffer
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-04-14 13:35:02 -07:00
Jason Ekstrand
475bab0330 anv: Limit VkDeviceMemory objects to 2GB
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-04-14 13:35:02 -07:00
Jason Ekstrand
4495b917e2 intel/blorp: Add a blorp_emit_dynamic macro
This makes it much easier to throw together a bit of dynamic state.  It
also automatically handles flushing so you don't accidentally forget.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2017-04-14 13:35:02 -07:00
Bruce Cherniak
1832ef6cd9 swr: Enable MSAA in OpenSWR software renderer
This patch enables multisample antialiasing in the OpenSWR software renderer.

MSAA is a proof-of-concept/work-in-progress with bug fixes and performance
on the way.  We wanted to get the changes out now to allow several customers
to begin experimenting with MSAA in a software renderer.  So as not to
impact current customers, MSAA is turned off by default - previous
functionality and performance remain intact.  It is easily enabled via
environment variables, as described below.

It has only been tested with the glx-lib winsys.  The intention is to
enable other state-trackers, both Windows and Linux and more fully support
FBOs.

There are 2 environment variables that affect behavior:

* SWR_MSAA_FORCE_ENABLE - force MSAA on, for apps that are not designed
  for MSAA... Beware, results will vary.  This is mainly for testing.

* SWR_MSAA_MAX_SAMPLE_COUNT - sets maximum supported number of
  samples (1,2,4,8,16), or 0 to disable MSAA altogether.
  (The default is currently 0.)

Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
2017-04-14 15:22:45 -05:00
Bruce Cherniak
91a7f0b3af swr: Removed unnecessary PIPE_BIND flags from swr_is_format_supported
Removed unnecessary and probably wrong PIPE_BIND_SCANOUT and PIPE_BIND_SHARED
flags in favor of check on single PIPE_BIND_DISPLAY_TARGET flag.

Reference llvmpipe change <bee4c7718a3bd57e3d99f0913d9081cd13fe5fd>

Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2017-04-14 15:22:44 -05:00
Bruce Cherniak
97bbb7b6a3 swr: Align swr_context allocation to SIMD alignment.
The context now contains SIMD vectors which must be aligned (specifically
samplePositions in the rastState in the derived state).  Failure to align
can result in segv crash on unaligned memory access in vector
instructions.

Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2017-04-14 15:22:44 -05:00
Tim Rowley
4dcfa83114 swr: update gallium driver docs
v2: add back scons section, mention additional built swr libraries

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-14 15:21:31 -05:00
Grazvydas Ignotas
bffdb434b7 radv: remove irrelevant comment
A leftover from anv.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-04-14 23:16:03 +03:00
Grazvydas Ignotas
1b2fe7ce45 radv: report timestampPeriod correctly
The kernel returns frequency in kHz, so to convert to nanosecond
interval that Vulkan uses the dividend should be 1000000.0 and not
100000.0.

This fixes the GPU graph in DOOM and matches the amdgpu-pro blob.

Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-04-14 23:15:55 +03:00
Rob Clark
9fc3e7137a nir/print: add compute shader info
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2017-04-14 12:46:12 -04:00
Rob Clark
16d493f1e7 gallium/docs: small correction about register files for atomics
These can operate on MEMORY[], in addition to BUFFER[] and IMAGE[]

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-14 12:46:12 -04:00
Rob Clark
0b613c20aa freedreno: enable draw/batch reordering by default
Probably should have flipped the switch a long time ago, since it
doesn't seem to cause any problems and is a nice perf boost in a number
of cases.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-14 12:46:12 -04:00
Rob Clark
b5cc88af5e freedreno/ir3: small re-order
Small re-order of switch statement to handled op-code categories in
order.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-14 12:46:12 -04:00
Rob Clark
75afd2586f freedreno/ir3: move 'keeps' to block level
For things like SSBOs and atomics we'll want to track this at a block
level.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-14 12:46:12 -04:00
Rob Clark
331bd3b5e1 freedreno/ir3: convert dynamic arrays to ralloc
Want to move one of these under ir3_block, so that gives a reason to
migrate the remaining malloc/realloc to ralloc.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-14 12:46:12 -04:00
George Kyriazis
870760e02e swr: add linux to scons build
Make swr compile for both linux and windows.

Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2017-04-14 10:59:46 -05:00
Bas Nieuwenhuizen
e20eb91e2b radv: make sizes & offsets 32 bit in radv_descriptor_update_template_entry.
v2: Also convert the calculations.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
2017-04-14 14:14:07 +02:00
Kenneth Graunke
7c83d44d54 docs: Update MESA_shader_integer_functions spec to version 3.
When publishing this spec on the OpenGL ES registry, Jon Leech noticed
that it didn't actually mention what the ES dependencies and
interactions were.  I looked at extensions_table.h and noted that we
expose it in ES 3.0 contexts, and he added the obvious spec texts.

The updated copy also contains our official extension number.

https://github.com/KhronosGroup/OpenGL-Registry/issues/3

Acked-by: Matt Turner <mattst88@gmail.com>
2017-04-13 23:01:27 -07:00
Bas Nieuwenhuizen
17a75b4da4 radv: Set descriptor set limits.
Properly and with comments this time.

Signed-off-by: Bas Nieuwenhuizen <bansi@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-04-13 22:55:11 +02:00
Bas Nieuwenhuizen
24ccf1a8b6 radv: Increase integer sizes in descriptor sets.
Needed if we want to allow them taking more than 64 KiB. The calculations
of these already used 32 bits.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-04-13 22:55:11 +02:00
Dave Airlie
58dd57cb94 radv: support S8_UINT as a depth/stencil format.
This enables a bunch of NotSupported CTS tests.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-14 05:49:25 +10:00
Dave Airlie
16b2dc0ca1 radv: bump maxGeometryShaderInvocations.
This bumps it to the same level as amdgpu-pro, it also
moves a bunch of dEQP-VK.geometry.instanced.* from
NotSupported to Pass.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-14 05:49:14 +10:00
Axel Davy
442780ea37 st/nine: Fix support for ps 1.4 dw and dz modifiers
RCP was used incorrectly to support NINED3DSPSM_DW and
NINED3DSPSM_DZ. src.x was used as input instead of src.w
or src.z.

Fixes: https://github.com/iXit/Mesa-3D/issues/271

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-04-13 20:05:03 +02:00
Jan Vesely
d8ffe4d0ce clover: Add missing include to compat header
Fixes build failure with LLVM 4

Fixes: a981e68c26
	(clover: Fix build against clang SVN >= r299965)

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-13 13:34:21 -04:00
Nicolai Hähnle
b52721e3b6 gallium/radeon: never use staging buffers with AMD_pinned_memory
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-04-13 17:36:26 +02:00
Nicolai Hähnle
4f7e3fbb50 radeonsi: fix gl_BaseVertex in non-indexed draws
gl_BaseVertex is supposed to be 0 in non-indexed draws. Unfortunately, the
way they're implemented, the VGT always generates indices starting at 0,
and the VS prolog adds the start index.

There's a VGT_INDX_OFFSET register which causes the VGT to start at a
driver-defined index. However, this register cannot be written from
indirect draws.

So fix this unlikely case by setting a bit to tell the VS whether the
draw is indexed or not, so that gl_BaseVertex can be adjusted accordingly
when used.

Fixes a bug in
KHR-GL45.shader_draw_parameters_tests.ShaderMultiDrawArraysParameters.*

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-13 17:31:11 +02:00
Nicolai Hähnle
472c84d1ad radeonsi: provide VS_STATE input to all VS variants
v2: fix incorrect change in get_tcs_out_patch_stride

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-13 17:30:20 +02:00
Nicolai Hähnle
3b9fbcb3b6 radeonsi: change the bit-packing of LS out/TCS in data
Avoid conflicts when merging various VS state bits.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-13 17:30:19 +02:00
Nicolai Hähnle
ff39f0d59c radeonsi: emit VS_STATE register explicitly from si_draw_vbo
We will merge other derived state information into this register.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-13 17:30:18 +02:00
Nicolai Hähnle
8c224d3d9f radeonsi: extract derived tess state emit to higher level
Especially with subsequent changes, this makes it easier to see the
sequence of state emits at the higher level.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-13 17:30:17 +02:00
Nicolai Hähnle
215ceb37b9 radeonsi: drop support for TGSI_SEMANTIC_VERTEXID_NOBASE
It is unused.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-13 17:30:11 +02:00
Bas Nieuwenhuizen
4f7fb25d4e radv: Add more trace points.
Most trace points happen after an operation, so add a trace point
at the start of the command buffer.

Furthermore, add one after a CmdUpdateBuffer using CP_DMA as that
didn't emit one yet.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-04-13 16:06:47 +02:00
Bas Nieuwenhuizen
8a535a8bc0 radv: Ignore CmdUpdateBuffer with size 0.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-04-13 16:06:34 +02:00
Bas Nieuwenhuizen
04c7452d0c radv: Enable query inheritance.
timestamp and pipeline_statistics only do something on begin & end,
so they don't need any action.

Occlusion queries only do something to enable/disable and that
register is set nowhere else so that doesn't need extra support either.
(We technically should fix it to update the reg with the number of
 samples, but that hasn't happened yet, so we only change it to
 enable/disable counting)

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-04-13 16:04:27 +02:00
Bas Nieuwenhuizen
c3f38c8968 radv: enable variableMultisampleRate.
This is only relevant with 0 attachments. In that case we do nothing
on subpass switch already, and the pipeline is the authoritative
source of the number of samples, so this shouldn't change anything.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-04-13 15:48:14 +02:00
Edmondo Tommasina
5589fd89e1 gallium/hud: set the dump file streams to line buffered
Flush the HUD value streams to the dump files after every newline.

v2: check that fopen succeeded  (Julien)

Reviewed-and-Tested-by: Julien Isorce <jisorce@oblong.com>
2017-04-13 12:38:49 +01:00
Dave Airlie
01d0c5a922 radv: fix stencil regression since new addrlib import
The addrlib import meant we'd return after we attempted
to setup the no stencil bits for an S8_UINT, now we break
and use the stencil level info when creating stencil DB
info.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-13 20:32:03 +10:00
Dave Airlie
4bcebe10ca radv: allocate thin textures as linear.
This is ported from radeonsi, and avoids the bug in the
addrlib code. This should probably be something addrlib
does for us, but for now this fixes the regression without
changing addrlib and aligns us with radeonsi.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-13 20:31:38 +10:00
Samuel Pitoiset
9ced105a52 i965: add missing ir_unop_*/ir_binop_* in visit_leave()
Fixes the following Clang warnings.

brw_fs_channel_expressions.cpp:219:12: warning: enumeration values 'ir_unop_ballot', 'ir_unop_read_first_invocation', and 'ir_binop_read_invocation' not handled in switch [-Wswitch]
   switch (expr->operation) {
           ^
1 warning generated.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-04-13 10:06:07 +02:00
Samuel Pitoiset
b6b566b48e st/mesa: fix wrong comparison in update_framebuffer_state()
state_tracker/st_atom_framebuffer.c:208:27: warning: comparison of constant 4294967295 with expression of type 'uint16_t' (aka 'unsigned short') is always false [-Wtautological-constant-out-of-range-compare]
   if (framebuffer->width == UINT_MAX)
       ~~~~~~~~~~~~~~~~~~ ^  ~~~~~~~~
state_tracker/st_atom_framebuffer.c:210:28: warning: comparison of constant 4294967295 with expression of type 'uint16_t' (aka 'unsigned short') is always false [-Wtautological-constant-out-of-range-compare]
   if (framebuffer->height == UINT_MAX)
       ~~~~~~~~~~~~~~~~~~~ ^  ~~~~~~~~
2 warnings generated.

Fixes: eb0fd0e5f8 ("gallium: decrease the size of pipe_framebuffer_state - 96 -> 80 bytes")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-13 10:06:06 +02:00
Samuel Pitoiset
a18bd1373b radeon: fix duplicate 'const' specifier
Fixes the following Clang warning.

In file included from radeon_debug.c:32:
./radeon_common_context.h:500:19: warning: duplicate 'const' declaration specifier [-Wduplicate-decl-specifier]
extern const char const *radeonVendorString;

v2: - do not remove the duplicate 'const' qualifier, fix it

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2017-04-13 10:06:06 +02:00
Samuel Pitoiset
ede273458c svga: remove unused vmw_dri1_intersect_src_bbox()
Fixes the following Clang warning.

vmw_screen_dri.c:130:1: warning: unused function 'vmw_dri1_intersect_src_bbox' [-Wunused-function]
vmw_dri1_intersect_src_bbox(struct drm_clip_rect *dst,
^
1 warning generated.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-13 10:06:05 +02:00
Samuel Pitoiset
fbe2ff7740 llvmpipe: remove unused subpixel_snap() and fixed_to_float()
Fixes the following Clang warnings.

lp_setup_tri.c:55:1: warning: unused function 'subpixel_snap' [-Wunused-function]
subpixel_snap(float a)
^
lp_setup_tri.c:61:1: warning: unused function 'fixed_to_float' [-Wunused-function]
fixed_to_float(int a)
^

v2: - do not remove subpixel_snap() (use !PIPE_ARCH_SSE instead)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2017-04-13 10:06:05 +02:00
Samuel Pitoiset
12647533fa softpipe: remove unused sp_exec_fragment_shader()
Fixes the following Clang warning.

sp_fs_exec.c:56:1: warning: unused function 'sp_exec_fragment_shader' [-Wunused-function]
sp_exec_fragment_shader(const struct sp_fragment_shader_variant *var)
^
1 warning generated.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-13 10:06:04 +02:00
Samuel Pitoiset
5fbe99ce9f softpipe: remove unused quad_shade_stage()
Fixes the following Clang warning.

sp_quad_fs.c:60:1: warning: unused function 'quad_shade_stage' [-Wunused-function]
quad_shade_stage(struct quad_stage *qs)
^
1 warning generated.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-13 10:06:04 +02:00
Samuel Pitoiset
b885488c22 softpipe: remove unused get_texel_quad_2d()
Fixes the following Clang warning.

sp_tex_sample.c:802:1: warning: unused function 'get_texel_quad_2d' [-Wunused-function]
get_texel_quad_2d(const struct sp_sampler_view *sp_sview,
^
  CC       sp_tile_cache.lo
1 warning generated.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-13 10:06:04 +02:00
Samuel Pitoiset
81ba57f463 trace: remove some unused trace_dump_tag*() functions
Fixes the following Clang warnings.

tr_dump.c:137:1: warning: unused function 'trace_dump_tag' [-Wunused-function]
trace_dump_tag(const char *name)
^
tr_dump.c:168:1: warning: unused function 'trace_dump_tag_begin2' [-Wunused-function]
trace_dump_tag_begin2(const char *name,
^
tr_dump.c:187:1: warning: unused function 'trace_dump_tag_begin3' [-Wunused-function]
trace_dump_tag_begin3(const char *name,
^
  CC       tr_texture.lo
3 warnings generated.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-13 10:06:04 +02:00
Samuel Pitoiset
c53a120a46 draw: remove unused wideline_stage()
Fixes the following Clang warning.

draw/draw_pipe_wide_line.c:48:38: warning: unused function 'wideline_stage' [-Wunused-function]
static inline struct wideline_stage *wideline_stage( struct draw_stage *stage )
                                     ^
1 warning generated.

v2: - remove commented code (Roland Scheidegger)
v3: - remove half_line_width in the struct

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2017-04-13 10:05:59 +02:00
Samuel Pitoiset
4dfe38aa9c draw: remove unused overflow()
Fixes the following Clang warning.

draw/draw_pipe_vbuf.c:102:1: warning: unused function 'overflow' [-Wunused-function]
overflow( void *map, void *ptr, unsigned bytes, unsigned bufsz )
^
1 warning generated.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-13 09:58:52 +02:00
Samuel Pitoiset
18844005ec mesa: remove some unused functions in the perf monitor area
Fixes the following Clang warnings.

main/performance_monitor.c:157:1: warning: unused function 'index_to_queryid' [-Wunused-function]
index_to_queryid(GLuint index)
^
main/performance_monitor.c:163:1: warning: unused function 'queryid_valid' [-Wunused-function]
queryid_valid(const struct gl_context *ctx, GLuint queryid)
^
main/performance_monitor.c:169:1: warning: unused function 'counterid_to_index' [-Wunused-function]
counterid_to_index(GLuint counterid)
^
3 warnings generated.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-13 09:58:24 +02:00
Samuel Pitoiset
df2dba558c mesa: remove unused clamp_float_to_uint() and clamp_half_to_uint()
Fixes the following Clang warnings.

main/pack.c:470:1: warning: unused function 'clamp_float_to_uint' [-Wunused-function]
clamp_float_to_uint(GLfloat f)
^
main/pack.c:477:1: warning: unused function 'clamp_half_to_uint' [-Wunused-function]
clamp_half_to_uint(GLhalfARB h)
^
2 warnings generated.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-13 09:58:24 +02:00
Samuel Pitoiset
bdb53e240b mesa: remove unused _mesa_unmarshal_BindBufferBase()
Fixes the following Clang warning.

main/marshal.c:209:1: warning: unused function '_mesa_unmarshal_BindBufferBase' [-Wunused-function]
_mesa_unmarshal_BindBufferBase(struct gl_context *ctx, const struct marshal_cmd_BindBufferBase *cmd)
^
1 warning generated.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-04-13 09:58:19 +02:00
Samuel Pitoiset
b3375800d7 virgl: add missing PIPE_CAP_DOUBLES
Fixes the following Clang warning.

virgl_screen.c:60:12: warning: enumeration value 'PIPE_CAP_DOUBLES' not handled in switch [-Wswitch]
   switch (param) {
           ^
1 warning generated.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-13 09:58:05 +02:00
Samuel Pitoiset
d5cd4990cd glsl: simplify apply_image_qualifier_to_variable()
This removes one level of indentation and will improve readability
for bindless images.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-04-13 09:52:55 +02:00
Samuel Pitoiset
6bb0f75bb6 glsl: add validate_fragment_flat_interpolation_input()
Requested by Timothy Arceri.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-04-13 09:52:48 +02:00
Boyan Ding
d02829c94e nvc0: Enable ARB_shader_ballot on Kepler+
readInvocationARB() and readFirstInvocationARB() need SHFL.IDX
instruction which is introduced in Kepler.

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-04-13 02:25:17 -04:00
Boyan Ding
59f6aa8096 nvc0/ir: Implement TGSI_OPCODE_BALLOT and TGSI_OPCODE_READ_*
v2: Check if each channel is masked in TGSI_OPCODE_BALLOT (Ilia Mirkin)

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-04-13 02:25:14 -04:00
Boyan Ding
48d00779d0 nvc0/ir: Implement TGSI_SEMANTIC_SUBGROUP_*
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-04-13 02:25:08 -04:00
Boyan Ding
f7787f224f nvc0/ir: Add SV_LANEMASK_* system values.
v2: Add name strings in nv50_ir_print.cpp (Ilia Mirkin)

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-04-13 02:25:04 -04:00
Boyan Ding
2a3c4c6bc3 nvc0/ir: Allow 0/1 immediate value as source of OP_VOTE
Implementation of readFirstInvocationARB() on nvidia hardware needs a
ballotARB(true) used to decide the first active thread. This expressed
in gm107 asm as (supposing output is $r0):
	vote any $r0 0x1 0x1

To model the always true input, which corresponds to the second 0x1
above, we make OP_VOTE accept immediate value 0/1 and emit "0x1" and
"not 0x1" in the src field respectively.

v2: Make sure that asImm() is not NULL (Samuel Pitoiset)

v3: (Ilia Mirkin)
Make the handling more symmetric with predicate version in gm107
Use i->getSrc(s)

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-04-13 02:24:59 -04:00
Boyan Ding
f1252996f5 gk110/ir: Emit OP_SHFL
v2: Make sure that asImm() is not NULL (Samuel Pitoiset)

v3: Check the range of immediate in OP_SHFL (Ilia Mirkin)

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-04-13 02:24:55 -04:00
Boyan Ding
c32e150008 nvc0/ir: Emit OP_SHFL
v2: (Samuel Pitoiset)
Add an assertion to check if the target is Kepler
Make sure that asImm() is not NULL

v3: (Ilia Mirkin)
Check the range of immediate value of OP_SHFL
Use the new setPDSTL API

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-04-13 02:24:52 -04:00
Boyan Ding
d941ef3829 nvc0/ir: Properly handle a "split form" of predicate destination
GF100's ISA encoding has a weird form of predicate destination where its
3 bits are split across whole the instruction. Use a dedicated setPDSTL
function instead of original defId which is incorrect in this case.

v2: (Ilia Mirkin)
Change API of setPDSTL() to handle cases of no output
Fix setting of the highest bit in setPDSTL()

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-04-13 02:24:47 -04:00
Boyan Ding
854554c314 gm107/ir: Emit third src 'bound' and optional predicate output of SHFL
v2: Emit the original hard-coded 0x1c03 when OP_SHFL is used in gm107's
    lowering (Samuel Pitoiset)

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-04-13 02:24:30 -04:00
Michel Dänzer
a981e68c26 clover: Fix build against clang SVN >= r299965
clang::LangAS::Offset is gone, the behaviour is as if it was 0.

v2: Introduce and use clover::llvm::compat::lang_as_offset (Francisco
    Jerez)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-13 12:51:24 +09:00
Brian Paul
46f49d6fdc st/mesa: add some _mesa_is_winsys_fbo() assertions
A few functions related to FBOs/renderbuffers should only be used with
window-system buffers, not user-created FBOs.  Assert for that.
Add additional comments.  No piglit regressions.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-12 21:13:23 -06:00
Brian Paul
c36d224921 st/mesa: minor optimization in st_DrawBuffers()
We only do on-demand renderbuffer allocation for window-system FBOs,
not user-created FBOs.  So put the loop inside a conditional.

Plus, add some comments.  No piglit regressions.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-12 21:13:23 -06:00
Timothy Arceri
fbcd709a34 mesa/st: only update samplers for stages that have changed
Might help reduce cpu for some apps that use sso.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-13 12:08:31 +10:00
Vinson Lee
f30f575e7b st/mesa: Fix missing-braces warning.
CXX      state_tracker/st_glsl_to_nir.lo
state_tracker/st_glsl_to_nir.cpp:250:57: warning: suggest braces around initialization of subobject [-Wmissing-braces]
      nir_lower_wpos_ytransform_options wpos_options = {0};
                                                        ^
                                                        {}

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-04-12 15:43:30 -07:00
Alex Smith
4603bea1aa radv: Disable primitive restart for non-indexed draws
According to the Vulkan spec, VkPipelineInputAssemblyStateCreateInfo's
primitiveRestartEnable flag should only apply to indexed draws, however
it was being enabled regardless of the type of draw. This could cause
problems for non-indexed draws with >=65535 vertices if the previous
indexed draw used 16-bit indices.

Fixes corruption of the credits text in Mad Max.

v2: Reset primitive restart state after executing a secondary command
    buffer.

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-04-12 20:58:41 +02:00
Matt Turner
ab18578b03 anv: Only define wsi_cbs when VK_USE_PLATFORM_WAYLAND_KHR defined 2017-04-12 11:00:39 -07:00
Marek Olšák
f7b1371d2d Revert "r600g: get rid of dummy pixel shader"
This reverts commit 61e47d92c5.

It causes a hang on RS780.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100663
2017-04-12 17:46:21 +02:00
Bartosz Tomczyk
bb847e78cf mesa: fix memory leak in arb_fragment_program
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-04-12 17:50:36 +10:00
Bas Nieuwenhuizen
c4d43388c0 radv: Hash the immutable samplers.
Since the shader code can include them.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-04-12 07:43:38 +02:00
Bas Nieuwenhuizen
bd91caf863 radv: Use an offset instead of pointers for immutable samplers.
Makes more sense when we hash the layout for the pipeline cache.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-04-12 07:43:25 +02:00
Bas Nieuwenhuizen
b35b5951fc radv: Stop shadowing the result in radv_GetQueryPoolResults.
The outer result was referred to, which meant bugs.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-04-12 07:38:58 +02:00
Bas Nieuwenhuizen
0763453291 radv: Return VK_NOT_READY if the query results are not available.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Fixes: 8475a14302 ("radv: Implement pipeline statistics queries.")
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
2017-04-12 07:38:58 +02:00
Bas Nieuwenhuizen
2dacb727c2 radv: Set query availability bit even if we don't wait.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Fixes: 8475a14302 ("radv: Implement pipeline statistics queries.")
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
2017-04-12 07:38:58 +02:00
Gregory Hainaut
03d1de387e mesa: avoid NULL ptr in prog parameter name
Context: _mesa_add_parameter is sometimes[0] called with a
NULL name as a mean of an unnamed parameter.

Allowing NULL pointer as a name means that it must be NULL checked
each access. So far it isn't always[1] true.

Parameter name is only used for debug purpose (printf) and
to lookup the index/location of the program by the application.

Conclusion, there is no valid reason to use a NULL pointer instead of
an empty string. So it was decided to use an empty string which avoid all
issues related to NULL pointer

[0]: texture gather offsets glsl opcode and st_init_atifs_prog
[1]: at least shader cache, st_nir_lookup_parameter_index and some printfs

Issue found by piglit 'texturegatheroffsets' tests on Nouveau

v4: new patch based on Nicolai/Timothy/ilia discussion
Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-04-12 14:30:28 +10:00
Kenneth Graunke
754b961f38 i965/drm: Use bools for a few flags.
These one bit values are booleans.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-04-11 21:07:45 -07:00
Kenneth Graunke
44ecbbebe2 i965/drm: Make brw_bo_alloc_tiled flags parameter 32-bit.
unsigned long is a terrible type for a bitfield - if you need fewer
than 32 bits, it wastes 4 bytes.  If you need more, things break on
32-bit builds.  Just use unsigned.

Even that's a bit ridiculous as we only have one flag today.
Still, it's at least somewhat better.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-04-11 21:07:45 -07:00
Kenneth Graunke
f374b9449e i965/drm: Make BO size a uint64_t rather than unsigned long.
The drm_i915_gem_create ioctl structure uses a __u64 for the size,
so we should probably use uint64_t to match.  In theory, we could
probably have a BO larger than 4GB, using a 48-bit PPGTT - it just
wouldn't be mappable in the CPU's 32-bit address space.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-04-11 21:07:45 -07:00
Kenneth Graunke
c85d6832fd i965/drm: Make alignment parameter a uint64_t.
Theoretically, with a 48-bit address space, we could have buffers
with an alignment of >= 4GB.  It's a bit silly, but the exec_object
structs (drm_i915_gem_exec_object2) use a __u64 for this, so we may
as well use the same type as the kernel API.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-04-11 21:07:45 -07:00
Kenneth Graunke
444ab8126d i965/drm: Make stride/pitch a uint32_t.
struct drm_i915_gem_set_tiling's stride field is a __u32.
intel_mipmap_tree::stride is a uint32_t.  Using unsigned long just
doesn't make sense.  Switching also lets us drop many pointless
locals that only existed to deal with the type mismatch.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-04-11 21:07:45 -07:00
Kenneth Graunke
14fc188460 i965/drm: Fix types for pwrite/pread fields.
The ioctl structs contain __u64 offset and size fields, so make them
uint64_t rather than unsigned long.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-04-11 21:07:45 -07:00
Kenneth Graunke
193601311c i965/drm: Make brw_bo_alloc_tiled take tiling by value, not pointer.
For some reason we passed tiling by pointer, through several layers,
even though the functions only read the initial value, and never
actually change it.  We even had a do-while loop that executed until
the tiling mode matched - except it always did, so it only ran once.
We then had bogus error handling in case it changed the tiling mode
to something nonsensical...which it never did.

Drop all this nonsense.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-04-11 21:07:45 -07:00
Timothy Arceri
9bd7184078 mesa/st: remove _mesa_get_fallback_texture() calls
These calls look like leftover from fallback texture support first
being added to the st in 8f6d9e12be and then later being added
to core mesa in 00e203fe17.

The piglit test fp-incomplete-tex continues to work with this
change.

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-12 12:00:35 +10:00
Timothy Arceri
c72170fb1f mesa: use pre_hashed version of search for the mesa hash table
The key is just an unsigned int so there is never any real hashing
done.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-04-12 12:00:35 +10:00
Tim Rowley
d0f381f865 swr: [rasterizer core] Disable 8x2 tile backend
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-11 18:01:03 -05:00
Tim Rowley
31a23a9d9d swr: [rasterizer common] Add _simd_testz_si alias
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-11 18:01:03 -05:00
Tim Rowley
7abd1f9b24 swr: [rasterizer archrast] Fix archrast for MSVC 2017 compiler
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-11 18:01:03 -05:00
Tim Rowley
54d11b3c95 swr: [rasterizer jitter] Remove unused function
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-11 18:01:03 -05:00
Tim Rowley
af909c0200 swr: [rasterizer jitter] Remove HAVE_LLVM tests supporting llvm < 3.8
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-11 18:01:03 -05:00
Tim Rowley
973d38801d swr: [rasterizer common/core] Fix 32-bit windows build
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-11 18:01:03 -05:00
Tim Rowley
217b791a44 swr: [rasterizer core] Fix unused variable warnings
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-11 18:01:03 -05:00
Tim Rowley
da7aa39f93 swr: [rasterizer core] Code formating change
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-11 18:01:03 -05:00
Tim Rowley
c8cc07ca25 swr: [rasterizer core] SIMD16 Frontend WIP - PA
Fix PA NextPrim for SIMD8 on SIMD16.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-11 18:01:03 -05:00
Tim Rowley
08a7136848 swr: [rasterizer core] SIMD16 Frontend WIP - Clipper
Implement widened clipper for SIMD16.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-11 18:01:03 -05:00
Tim Rowley
0033e86b2c swr: [rasterizer core] Multisample sample position setup change
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-11 18:01:03 -05:00
Tim Rowley
4c093869db swr: [rasterizer core] Reduce templates to speed compile
Quick patch to remove some unused template params to cut down
rasterizer compile time.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-11 18:01:03 -05:00
Francisco Jerez
147e71242c i965/fs: Take into account lower frequency of conditional blocks in spilling cost heuristic.
The individual branches of an if/else/endif construct will be executed
some unknown number of times between 0 and 1 relative to the parent
block.  Use some factor in between as weight while approximating the
cost of spill/fill instructions within a conditional if-else branch.
This favors spilling registers used within conditional branches which
are likely to be executed less frequently than registers used at the
top level.

Improves the framerate of the SynMark2 OglCSDof benchmark by ~1.9x on
my SKL GT4e.  Should have a comparable effect on other platforms.  No
significant regressions.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-04-11 15:28:54 -07:00
Tim Rowley
9a7b257450 swr: return true for PIPE_CAP_DOUBLES
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-04-11 13:16:43 -05:00
Kenneth Graunke
02ccd8f52c i965: Set kernel features before computing max GL version.
We check these bitfields when computing the Haswell max GL version.
We need to set them ahead of time, or they won't exist, and all our
checks will fail.  That sets the max core profile GL version to 4.2.

This introduces the bizarre situation where asking for a GL context
with version 4.3+ fails, but asking for a GL core profile context
with version <= 4.2 actually promotes you a 4.5 context.

GLX_MESA_query_renderer also reported the bogus 4.2 value.
Now it shows 4.5.

Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reported-and-tested-by: Rafael Ristovski <rafael.ristovski@gmail.com>
2017-04-11 08:58:16 -07:00
Juan A. Suarez Romero
8d7a82ae32 anv: remove needless VALGRIND_MAKE_MEM_DEFINED
This is already invoked in the following VG_NOACCESS_READ() call.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-11 17:21:57 +02:00
Lucas Stach
4ee7c2c284 etnaviv: enable TS, but disable autodisable
Autodisable seems to cause missed rendering in some cases, but
otherwise TS seems to work properly.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
2017-04-11 16:52:31 +02:00
Lucas Stach
797890bbbd etnaviv: enable TS also on sampler resources
Fixes a performance issue with imported winsys buffers as those are
marked with binding sampler view.

This might require a TS flush on single pipe chips that directly
sample from the rendered buffer, but otherwise seems to work fine.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
2017-04-11 16:52:27 +02:00
Lucas Stach
52f6c8cc31 etnaviv: align TS surface size to number of pixel pipes
The TS surface gets cleared by a tiled RS fill. If the chip has
more than 1 pixel pipe the size of the TS surface needs to be
aligned so that each pipe address matches a tile start, otherwise
the RS will hang.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
2017-04-11 16:52:22 +02:00
Lucas Stach
37622ecc79 etnaviv: avoid using invalid TS
The TS is only valid after it has been initialized by a fast
clear, so it should not be taken into account when blitting
resources that haven't been cleared. Also the blit itself
invalidates the destination TS, as it's not updated and will
retain data from the previous rendering after the blit.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
2017-04-11 16:52:01 +02:00
Samuel Pitoiset
768f81b62b glsl: use the BA1 macro for textureQueryLevels()
For both consistency and new bindless sampler types.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-04-11 10:24:57 +02:00
Samuel Pitoiset
981ba1c89b glsl: use the BA1 macro for textureSamples()
For both consistency and new bindless sampler types.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-04-11 10:24:54 +02:00
Samuel Pitoiset
29082b0b22 glsl: use the BA1 macro for textureCubeArrayShadow()
For both consistency and new bindless sampler types.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-04-11 10:24:51 +02:00
Bas Nieuwenhuizen
8475a14302 radv: Implement pipeline statistics queries.
The devil is in the shader again, otherwise this is
fairly straightforward.

The CTS contains no pipeline statistics copy to buffer
testcases, so I did a basic smoketest.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-04-11 09:33:17 +02:00
Bas Nieuwenhuizen
d2906bc72d radv: Let count be dynamic in radv_break_on_count.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-04-11 09:33:17 +02:00
Bas Nieuwenhuizen
8473193760 radv: Rename query pipeline/set layout.
For using them with both occlusion and pipeline statistics queries.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-04-11 09:33:17 +02:00
Bas Nieuwenhuizen
95743d5b88 radv: Use VK_WHOLE_SIZE for the query buffer bindings.
The buffer sizes are specified just a few lines earlier, so don't
repeat ourselves.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-04-11 09:33:17 +02:00
Bas Nieuwenhuizen
8911dd6d12 radv: Use a shader for occlusion CmdCopyQueryPoolResults.
Use the new occlusion query copy shader.

We don't use the shader for the waiting as a polling loop ineracts badly
with having caching enabled. I noticed on my GPU (Tonga) that the values
are written out in order, so I just use a WAIT_REG_MEM on the last value.

If it turns out other chips don't do that we may need to look a bit more
into this. Having 8 WAIT_REG_MEM packets per query doesn't sound ideal.

This also restricts the availability word in the pool to timestamp queries
only, as occlusion queries don't use it, and pipeline statistic queries
likely won't either.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-04-11 09:33:17 +02:00
Bas Nieuwenhuizen
ce0c8cf941 radv: Add occlusion query shader.
Adds a shader for writing occlusion query results to a buffer, as the
CP packet isn't support on SI or secondary buffers, and doesn't handle
the availability bit (or partial results) nor truncation to 32-bit.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-04-11 09:33:17 +02:00
Kenneth Graunke
50b987c0f0 i965: Fix wonky indentation left by brw_bo_alloc_tiled rename. 2017-04-10 23:25:13 -07:00
Ilia Mirkin
d9cc58d6ec nouveau: when mapping a persistent buffer, synchronize on former xfers
If the buffer is being used, we should wait for those uses to be
complete before returning the map.

Fixes: GL45-CTS.direct_state_access.buffers_functional
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2017-04-11 00:13:55 -04:00
Ilia Mirkin
8036809799 nvc0: increase texture buffer object alignment to 256 for pre-GM107
We currently don't pass the low byte of the address via the surface
info, so in order to work with images, these have to implicitly be
aligned to 256. The proprietary driver also doesn't go out of its way to
provide lower alignment.

Fixes GL45-CTS.texture_buffer.texture_buffer_texture_buffer_range

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-04-11 00:13:55 -04:00
Timothy Arceri
8ffd54fef8 mesa: fix typo and add assert() to _mesa_attach_renderbuffer_without_ref()
This function should only be used with a "freshly created" renderbuffer
so assert RefCount is 1.
2017-04-11 09:57:45 +10:00
Kenneth Graunke
bd84252be6 i965/drm: Add stall warnings when mapping or waiting on BOs.
This restores the performance warnings removed in:

    i965: Drop brw_bo_map[_gtt] wrappers which issue perf warnings.

but adds them for nearly all BO mapping, and also for wait_rendering.

Because we add this to the core bufmgr, we automatically get stall
warnings in all callers, unlike before where only a few callsites used
the wrappers that gave stall warnings.

We also do it a bit differently: we simply measure how long set_domain
takes (the part that stalls), and complain if it's more than 0.01 ms.
We don't bother calling brw_bo_busy(), and we don't measure the mmap
time (which doesn't stall).  This should be more accurate.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-04-10 14:33:18 -07:00
Kenneth Graunke
f053ee78ed i965/drm: Make a set_domain() helper function.
Less boilerplate.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-04-10 14:33:18 -07:00
Daniel Vetter
a99a4979fd i965/batch: Ensure we use a consistent offset in relocs
In theory gcc is free to re-load them, and if a concurrent
execbuf races and updates bo->offset64 then we have a problem:
execbuffer api requires that the ->presumed_offset and the one
we used for the reloc matches. It does not require that the value
is sensible, which means no locks needed, just a consistent load.

Ken said his next series will nuke this, so just hand-roll the
kernel's READ_ONCE idea inline.

FIXME: Most callers of brw_emit_reloc recompute the relocation
themselves, which means this doesn't really fix the race. But the long
term plan is to move to per-context relocation handling, which will
fix this all properly. So leave this for now as just a reminder.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-10 14:33:18 -07:00
Daniel Vetter
7f3c85c21e i965/bufmgr: Garbage-collect vma cache/pruning
This was done because the kernel has 1 global address space, shared
with all render clients, for gtt mmap offsets, and that address space
was only 32bit on 32bit kernels.

This was fixed  in

commit 440fd5283a87345cdd4237bdf45fb01130ea0056
Author: Thierry Reding <treding@nvidia.com>
Date:   Fri Jan 23 09:05:06 2015 +0100

    drm/mm: Support 4 GiB and larger ranges

which shipped in 4.0. Of course you still want to limit the bo cache
to a reasonable size on 32bit apps to avoid ENOMEM, but that's better
solved by tuning the cache a bit. On 64bit, this was never an issue.

On top, mesa never set this, so it's all dead code. Collect an trash it.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-10 14:33:18 -07:00
Daniel Vetter
1f965d3f7a i965/bufmgr: Remove some reuse functions
is_reusable was needed by uxa because it couldn't keep track of its
scanout buffers and used this as a proxy. Disabling reuse is a silly
idea, we set this once at start. Remove both.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-10 14:33:18 -07:00
Daniel Vetter
edd85c1f04 i965/bufmgr: remove start_gtt_access
Iirc this was used by uxa for persistent mmpas of the frontbuffer. For
mesa all the set_domain stuff needed before a synchronized mmap is handled
within the bufmgr, so no reason ever to call this.

Inline the implementation into its only internal user.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-10 14:33:17 -07:00
Daniel Vetter
439edaa4b5 i965/bufmgr: Delete set_tiling
Entirely unused, and really shouldn't be used. The alloc functions already
take care of this. And even in a future where we're not going to
h/v-align tiled buffers in the bufmgr, but only in isl, I think we
still want to adjust the tiling mode in the bufmgr, since that ties in
closely to mmaps and stuff like that.

get_tiling is still needed for the import paths (until we have modifiers
everywhere).

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-10 14:33:17 -07:00
Daniel Vetter
6308121475 i965/bufmgr: Delete alloc_for_render
Entirely unused, mesa instead used the BO_ALLOC_FOR_RENDER flag.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-10 14:33:14 -07:00
Kenneth Graunke
538fa87f40 i965/drm: Use list_for_each_entry_safe in a couple of cases.
Suggested by Chris Wilson.  A tiny bit simpler.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-04-10 14:33:12 -07:00
Kenneth Graunke
10929da5fb i965/drm: Rename intel_bufmgr_gem.c to brw_bufmgr.c.
Matches the class name and the header file name.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:32:32 -07:00
Kenneth Graunke
7aa66e64fe i965/drm: Reindent intel_bufmgr_gem.c and brw_bufmgr.h.
indent -i3 -nut -br -brs -npcs -ce --no-tabs -Tuint32_t -Tuint64_t
plus some manual fixes because those aren't quite the right settings.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:32:30 -07:00
Kenneth Graunke
d30a92738c i965/drm: Rename drm_bacon_bo to brw_bo.
The bacon is all gone.

This renames both the class and the related functions.  We're about to
run indent on the bufmgr code, so no need to worry about fixing bad
indentation.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:32:28 -07:00
Kenneth Graunke
e0d15e9769 i965: Drop brw_bo_map[_gtt] wrappers which issue perf warnings.
The stupid reason for eliminating these functions is that I'm about
to rename drm_bacon_bo_map() to brw_bo_map(), which makes the real
function have the short name, rather than the wrapper.

I'm also planning on reworking our mapping code soon, so we use WC
mappings and proper unsynchronized mappings on non-LLC platforms.
It will be easier to do that without thinking about the stall
warnings and wrappers.

My eventual hope is to put the performance warnings in the BO map
function itself, so all callers gain the warning.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:32:25 -07:00
Kenneth Graunke
dfd81373b6 i965/drm: Rename drm_bacon_reg_read() to brw_reg_read().
Less bacon.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:32:24 -07:00
Kenneth Graunke
662a733dbc i965/drm: Rename drm_bacon_bufmgr to struct brw_bufmgr.
Also stop using typedefs, per Mesa coding style.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:32:21 -07:00
Kenneth Graunke
f5216b25e0 i965: Just use a uint32_t context handle rather than a malloc'd wrapper.
drm_bacon_context is a malloc'd struct containing a uint32_t context ID
and a pointer back to the bufmgr.  The bufmgr pointer is pretty useless,
as everybody already has brw->bufmgr.  At that point...we may as well
just use the ctx_id handle directly.  A number of places already had to
call drm_bacon_gem_context_get_id() to extract the ID anyway.  Now they
just have it.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:32:20 -07:00
Kenneth Graunke
4cb3e4429d i965/drm: Fold drm_bacon_gem_reset_stats into the callers.
We're going to get rid of drm_bacon_context shortly, so we'd have to
change the interface slightly.  It's basically just an ioctl wrapper
that isn't terribly bufmgr-related, so We may as well just combine it
with the code in brw_reset.c that actually uses it.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:32:19 -07:00
Kenneth Graunke
414c9343a2 i965/drm: Rename drm_bacon_gem_bo_bucket to bo_cache_bucket.
No need for a prefix as this struct is local to the .c file.

Less bacon.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:32:17 -07:00
Kenneth Graunke
e46b74d1b5 i965/drm: Drop drm_bacon_* from static functions.
Mesa style is to not use lengthy prefixes for static functions.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:32:16 -07:00
Kenneth Graunke
13596ecb6b i965/drm: Drop drm_bacon_gem_bo_madvise_internal().
The only difference is that it takes an explicit bufmgr rather than
using bo->bufmgr, but there is only one bufmgr per screen so they
should be identical anyway.

Chris says this was added primarly to avoid bo/bo_gem casting,
which was inconvenient.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:32:15 -07:00
Kenneth Graunke
9ee252865e i965/drm: Merge drm_bacon_bo_gem into drm_bacon_bo.
The separate class gives us a bit of extra encapsulation, but I don't
know that it's really worth the boilerplate.  I think we can reasonably
expect the rest of the driver to be responsible.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:32:14 -07:00
Kenneth Graunke
59fdd94b85 i965/drm: Merge bo->handle and bo_gem->gem_handle.
These fields are the same value.  In the bad old days, bo->handle could
have been an identifier from the pre-GEM fake bufmgr, but that's long
gone.  Keep the "gem_handle" name for clarity.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:32:08 -07:00
Kenneth Graunke
eb41aa82c4 i965/drm: Rewrite relocation handling.
The execbuf2 kernel API requires us to construct two kinds of lists.
First is a "validation list" (struct drm_i915_gem_exec_object2[])
containing each BO referenced by the batch.  (The batch buffer itself
must be the last entry in this list.)  Each validation list entry
contains a pointer to the second kind of list: a relocation list.
The relocation list contains information about pointers to BOs that
the kernel may need to patch up if it relocates objects within the VMA.

This is a very general mechanism, allowing every BO to contain pointers
to other BOs.  libdrm_intel models this by giving each drm_intel_bo a
list of relocations to other BOs.  Together, these form "reloc trees".

Processing relocations involves a depth-first-search of the relocation
trees, starting from the batch buffer.  Care has to be taken not to
double-visit buffers.  Creating the validation list has to be deferred
until the last minute, after all relocations are emitted, so we have the
full tree present.  Calculating the amount of aperture space required to
pin those BOs also involves tree walking, which is expensive, so libdrm
has hacks to try and perform less expensive estimates.

For some reason, it also stored the validation list in the global
(per-screen) bufmgr structure, rather than as an local variable in the
execbuffer function, requiring locking for no good reason.

It also assumed that the batch would probably contain a relocation
every 2 DWords - which is absurdly high - and simply aborted if there
were more relocations than the max.  This meant the first relocation
from a BO would allocate 180kB of data structures!

This is way too complicated for our needs.  i965 only emits relocations
from the batchbuffer - all GPU commands and state such as SURFACE_STATE
live in the batch BO.  No other buffer uses relocations.  This means we
can have a single relocation list for the batchbuffer.  We can add a BO
to the validation list (set) the first time we emit a relocation to it.
We can easily keep a running tally of the aperture space required for
that list by adding the BO size when we add it to the validation list.

This patch overhauls the relocation system to do exactly that.  There
are many nice benefits:

- We have a flat relocation list instead of trees.
- We can produce the validation list up front.
- We can allocate smaller arrays and dynamically grow them.
- Aperture space checks are now (a + b <= c) instead of a tree walk.
- brw_batch_references() is a trivial validation list walk.
  It should be straightforward to make it O(1) in the future.
- We don't need to bloat each drm_bacon_bo with 32B of reloc data.
- We don't need to lock in execbuffer, as the data structures are
  context-local, and not per-screen.
- Significantly less code and a better match for what we're doing.
- The simpler system should make it easier to take advantage of
  I915_EXEC_NO_RELOC in a future patch.

Improves performance in Synmark 7.0's OglBatch7:

    - Skylake GT4e: 12.1499% +/- 2.29531%  (n=130)
    - Apollolake:   3.89245% +/- 0.598945% (n=35)

Improves performance in GFXBench4's gl_driver2 test:

    - Skylake GT4e: 3.18616% +/- 0.867791% (n=229)
    - Apollolake:   4.1776%  +/- 0.240847% (n=120)

v2: Feedback from Chris Wilson:
    - Omit explicit zero initializers for garbage execbuf fields.
    - Use .rsvd1 = ctx_id rather than i915_execbuffer2_set_context_id
    - Drop unnecessary fencing assertions.
    - Only use _WR variant of execbuf ioctl when necessary.
    - Shrink the arrays to be smaller by default.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:32:00 -07:00
Kenneth Graunke
e7ab0ea5e7 i965/drm: Make register write check handle execbuffer directly.
I'm about to rewrite how relocation handling works, at which point
drm_bacon_bo_emit_reloc() and drm_bacon_bo_mrb_exec() won't exist
anymore.  This code is already largely not using the batchbuffer
infrastructure, so just go all the way and handle relocations, the
validation list, and execbuffer ourselves.  That way, we don't have
to think the weird case where we only have a screen, and no context,
when redesigning the relocation handling.

v2: Write reloc.presumed_offset + reloc.delta into the batch, rather
    than duplicating the comment, so it's obvious that they match
    (suggested by Chris).  Also add a comment about why we don't do
    any error checking.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:31:56 -07:00
Kenneth Graunke
6368284a34 i965: Make a screen::aperture_threshold field.
This is the threshold after which drm_intel_bufmgr_check_aperture_space
returns -ENOSPC, signalling that it thinks an execbuf is likely to fail
and we need to roll back and flush the batch.

We'll need this when we rewrite aperture space checking, shortly.
In the meantime, we can also use it in GLX_MESA_query_renderer.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:31:55 -07:00
Kenneth Graunke
6079f4f16e i965: Make/use a brw_batch_references() wrapper.
We'll want to change the implementation of this shortly.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:31:54 -07:00
Kenneth Graunke
6537a3ca11 i965: Use brw_emit_reloc() instead of drm_bacon_bo_emit_reloc().
I'm about to make brw_emit_reloc do actual work, so everybody needs
to start using it and not the raw drm_bacon function.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:31:52 -07:00
Kenneth Graunke
eadd5d1b51 i965: Change intel_batchbuffer_reloc() into brw_emit_reloc().
This renames intel_batchbuffer_reloc to brw_emit_reloc and changes the
parameter naming and ordering to match drm_intel_bo_emit_reloc().

For now, it's a trivial wrapper that accesses batch->bo.  When we
rework relocations, it will start doing actual work.

target_offset should be expanded to a uint64_t to match the kernel,
but for now we leave it as its original 32-bit type.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:31:51 -07:00
Kenneth Graunke
fbb3297165 i965/drm: Drop GEM_SW_FINISH stuff.
This is only useful when doing an incoherent CPU mapping of the current
scanout buffer.  That's a terrible plan, so we never do it.  We always
use an uncached GTT map.

So, this is useless.  Drop the code.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:31:49 -07:00
Kenneth Graunke
80761a42e0 i965/drm: Drop code to search for an existing bufmgr.
This functionality was added by libdrm commit
743af59669386cb6e063fa4bd85f0a0b2da86295 (intel: make bufmgr_gem
shareable from different API) in an attempt to solve libva/mesa buffer
sharing problems.  Specifically, this was working around an issue hit
by Chromium, which used the same drm_fd for multiple APIs, and shared
buffers between them.

This code attempted to work around that issue by using the same bufmgr
for both libva and Mesa.  It worked because libdrm_intel was loaded by
both libraries.  However, now that Mesa has forked, we don't have a
common library, and this code cannot work.

The correct solution is to have each API open its own file descriptor
(and get a corresponding buffer manager), and then use PRIME export
and import to share BOs across those APIs.  Then the kernel can manage
those shared resources.  According to Chris, the kernel will pass back
the same handle for a prime FD if the lookup is from the same device FD.

We believe Chromium has since moved to this model.

In Mesa, there is already only one screen per FD, and so there will
only be one bufmgr per FD.  We don't need any of this code.

v2: Add a big warning comment written by Chris Wilson.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:31:48 -07:00
Kenneth Graunke
b666654201 i965/drm: Unwrap the unnecessary drm_bacon_reloc_target_info struct.
This used to have another field, but now it's just a BO pointer.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:31:46 -07:00
Kenneth Graunke
2662894baa i965/drm: Switch from uthash to Mesa's hash table.
No performance data has been gathered about this choice.  I just don't
want that many hash tables.  Chris points out that this is not
performance critical - we should not be recreating that many handles
from scratch.  In the past we used a linear list, which became
unreasonable in stress tests that used hundreds of thousands of BOs.
In real usage, it shouldn't matter that much.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:31:45 -07:00
Kenneth Graunke
ad1b1cce44 i965/drm: Drop bo_gem::kflags.
It's always zero now.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:31:43 -07:00
Kenneth Graunke
a972c903cb i965/drm: Drop has_exec_async related API.
Mesa doesn't use this yet.  We'll almost certainly want to, but we can
add the functionality back after we clean up the messy drm code.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:31:42 -07:00
Kenneth Graunke
d606f64e2d i965/drm: Drop softpin support for now.
We may want this eventually, but simplify for now.  We can add it back
later when we actually intend to use it.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:31:41 -07:00
Kenneth Graunke
0314eed3b1 i965/drm: Drop userptr support for now.
We'll want userptr support for GL_AMD_pinned_memory support someday,
and possibly some other upload optimizations.  Chris says "not in this
form" though.  Drop it and simplify for now - we can add it back later
when we're ready to hook it up fully.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:31:39 -07:00
Kenneth Graunke
a460e1eb51 i965/drm: Delete engine checks.
This is basically handholding to prevent a bogus caller from trying to
execbuffer on a bogus engine.  i965 already does this correctly.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:31:37 -07:00
Kenneth Graunke
1dc02da6d7 i965/drm: Drop intel_chipset.h in favor of using gen_device_info.
This moves the PCI ID detection to intel_screen.c and makes
drm_bacon_bufmgr_gem_init() take a devinfo pointer.

We also drop the HAS_LLC query stuff - devinfo has that info already,
without kernel queries, and it makes no sense to have two has_llc flags
set by different mechanisms.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:31:36 -07:00
Kenneth Graunke
55ee8f36a8 i965/drm: Drop deprecated drm_bacon_bo::offset.
This field was the wrong size, so we replaced it with offset64.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:31:35 -07:00
Kenneth Graunke
a29fb9b2ee i965/drm: Drop has_wait_timeout.
The wait-ioctl was introduced in kernel v3.6 (20120930) and that is our
current minimum requirement for screen creation.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:31:33 -07:00
Kenneth Graunke
b97bcf3b6b i965/drm: Assume aperture size query will work.
This query has been available since 2.6.28.  We require 3.6.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:31:32 -07:00
Kenneth Graunke
c28691ab77 i965/drm: Combine drm_bacon_bufmgr_gem and drm_bacon_bufmgr classes.
The distinction was required when the bufmgr was virtualised, now there
is only one class, we no longer need the distraction of pretending it is
a subclass.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:31:31 -07:00
Kenneth Graunke
3673b89bf3 i965/drm: Move _drm_bacon_context to intel_bufmgr_gem.c.
This moves us one step closer to killing off intel_bufmgr_priv.h.

We might want to nuke it altogether, since it's basically just a
uint32_t handle, but for now, let's focus on removing files.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:31:29 -07:00
Kenneth Graunke
b0d1c5983b i965/drm: Drop cliprects and dr4 from execbuf variants.
Legacy DRI1 leftovers.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:31:28 -07:00
Kenneth Graunke
2c257ff226 i965/drm: Devirtualize the bufmgr.
libdrm_bacon used to have a GEM-based bufmgr and a legacy fake bufmgr,
but that's long since dead (and we never imported it to i965).  So,
drop the extra layer of function pointers.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:31:27 -07:00
Kenneth Graunke
dca224a9ef i965/drm: Check INTEL_DEBUG & DEBUG_BUFMGR directly.
Eliminates some API around this, and more importantly, the last
field in one bufmgr class.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:31:25 -07:00
Kenneth Graunke
68cb0c6d92 i965/drm: Use Mesa's macros.h instead of duplicating them.
Replace the duplicated macros imported from libdrm:

   ARRAY_SIZE, MAX2, ALIGN, STATIC_ASSERT

and remove unused ROUND_UP_TO and ROUND_UP_TO_MB.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:31:24 -07:00
Kenneth Graunke
c5cdb0f405 i965/drm: Use ALIGN, not ROUND_UP_TO.
ROUND_UP_TO handles a NPOT alignment, but all the alignments we use
are power of two anyway, so there's no need.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:31:23 -07:00
Kenneth Graunke
1d476e64e5 i965/drm: Delete execbuf1 support.
execbuf2 has been around since v2.6.33.  We require v3.6.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:31:21 -07:00
Kenneth Graunke
ddf01d3f41 i965/drm: Remove Gen2-3 fence accounting.
Since gen4, we do not use fence registers for any GPU access and so
never have to account for the fence during batch construction. All the
related fence functions are unused.

Based on Kristian Høgsberg's patch; commit message by Chris Wilson.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:31:20 -07:00
Kenneth Graunke
4f698b0049 i965/drm: Remove some unused functions and macros.
Mesa doesn't use these functions or macros, so we can delete them,
and save work refactoring and cleaning them up.  We'll delete a lot
more later, too.

Based on a patch by Kristian Høgsberg.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:31:18 -07:00
Kenneth Graunke
09b2f6124a i965/drm: Switch to util/list.h instead of libdrm_lists.h.
Both are kernel style lists, so this is trivial.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:31:16 -07:00
Kenneth Graunke
7c64096b2d i965/drm: Port to Mesa's atomic header.
Drop xf86atomic.h in favor of Mesa's util/u_atomic.h.  We replace the
atomic_t wrapper struct with a bare integer, switch to the 'p_atomic'
naming conventions, and move over the one extra helper.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:31:13 -07:00
Kenneth Graunke
eed86b975e i965/drm: Use our internal libdrm (drm_bacon) rather than the real one.
Now we can actually test our changes.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:31:11 -07:00
Kenneth Graunke
91b973e3a3 i965/drm: s/drm_intel/drm_bacon/g
Using drm_intel_* as a prefix is hazardous - we don't want to conflict
with the actual libdrm_intel symbols.  In particular, I think we could
get into trouble during the final megadrivers linking.

So, rename everything to an different yet arbitrary prefix.  bacon and
intel are the same number of characters, so we don't have to reindent
the world.  It's also an homage to Ian's "Bacon Trail" platform.

I was going to use "drm_relic" to poke fun at libdrm being ancient,
and so we could explain the name with a "historical reasons" pun,
but it sounds too much like ralloc.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:31:09 -07:00
Kenneth Graunke
4ad0758f51 i965/drm: Drop libpciaccess dependencies.
i965 doesn't use drm_intel_get_aperture_sizes(), so we can delete
support for it.  This avoids a build dependency on libpciaccess.

Chris also notes:

"There's a really old bug that hopefully has been closed already
 (although as far as I can tell, it has never been fixed) about
 how using libpciaccess from libdrm_intel breaks the world (since
 libpciaccess uses a singleton that is torn down at the first request
 rather than upon the last user)."

This bug should go away in two commits when we switch over to our
internal copy of libdrm_intel.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84325
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:31:05 -07:00
Kenneth Graunke
d614135e95 i965/drm: Make libdrm_lists.h compile by defining typeof.
typeof doesn't seem to exist, so this won't compile (but we don't yet
try).  Define it to __typeof__.  This code is going to die soon anyway.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:31:03 -07:00
Kenneth Graunke
b97c7ef4c8 i965/drm: remove legacy defines, aub functions, and decoder prototypes
We never imported any of this code, so drop the prototypes, unused
enums, and defines.

Based on patches by Emil Velikov.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:31:00 -07:00
Kenneth Graunke
514db96c11 i965: Import libdrm_intel.
This imports commit 19c4cfc54918d361f2535aec16650e9f0be667cd of
libdrm/intel/*.[ch], minus a few files that we're never going to use
(and would immediately delete), plus a few necessary dependencies.

We rename intel_bufmgr.h to brw_bufmgr.h to avoid #include conflicts.
We also fix UTF-8 symbol problems in intel_bufmgr_gem.c comments
because vim keeps trying to fix that every time I edit the file,
and we may as well fix it right away.

Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:30:53 -07:00
Kenneth Graunke
915820cc59 i965: Make sure we don't use CPU maps for the scanout buffer.
Using an incoherent CPU map on the active scanout buffer is really
sketchy - we may need extra flushing via GEM_SW_FINISH, or using
drmModeDirtyFB() and kernel commit a6a7cc4b7db6d (4.10+).

Chris suggests "never ever do that", which seems like a wise plan!

intel_miptree_map_raw() uses CPU maps on linear buffers.

Having a linear scanout buffer should be really rare, and mapping the
front buffer should be similarly rare.  Together, it should basically
never happen.  But, in case it does somehow...make sure that mapping
the scanout buffer always goes through an uncached GTT map.

v2: Add a giant comment written by Chris Wilson.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-04-10 14:30:49 -07:00
Kenneth Graunke
eb28ce2b0b i965: Stop calling drm_intel_bufmgr_gem_enable_fenced_relocs().
This does nothing on Gen4+, which is the only hardware we support.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:30:44 -07:00
Kenneth Graunke
034b220dc4 i965: Fix GLX_MESA_query_renderer video memory on 32-bit.
On modern systems with 4GB apertures, the size in bytes is 4294967296,
or (1ull << 32).  The kernel gives us the aperture size as a __u64,
which works out great.

Unfortunately, libdrm "helpfully" returns the data as a size_t, which
on 32-bit systems means it truncates the aperture size to 0 bytes.
We've happily reported this value as 0 MB of video memory via
GLX_MESA_query_renderer since it was originally exposed.

This patch bypasses libdrm and calls the ioctl ourselves so we can
use a proper uint64_t, avoiding the 32-bit integer overflow.  We now
report a proper video memory size on 32-bit systems.

Chris points out that the aperture size (CPU mappable size limit)
isn't really the right thing to be checking.  But libdrm_intel uses
it to fail execbuffer, so it is an actual limit for now.  Once that's
fixed we can probably move to something else.  In the meantime, fix
the obvious typecasting bug.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-10 14:30:40 -07:00
Samuel Pitoiset
5bcfe90501 gallium/radeon: add HUD queries for GPU temperature and clocks
Only the Radeon kernel driver exposed the GPU temperature and
the shader/memory clocks, this implements the same functionality
for the AMDGPU kernel driver.

These queries will return 0 if the DRM version is less than 3.10,
I don't explicitely check the version here because the query
codepath is already a bit messy.

v2: - rebase on top of master

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-10 23:06:19 +02:00
Samuel Pitoiset
0f39fb8500 configure.ac: require libdrm_amdgpu 2.4.79
The sensor info requires amdgpu_query_sensor_info().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-10 23:06:17 +02:00
Samuel Pitoiset
def02007cd radeonsi: add new si_check_render_feedback_texture() helper
For bindless.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-10 23:05:41 +02:00
Samuel Pitoiset
fbcc8664fd radeonsi: add new si_decompress_color_texture() helper
For bindless.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-10 23:05:38 +02:00
Samuel Pitoiset
6646212de0 radeonsi: add new depth_needs_decompression() helper
v2: - rename to depth_needs_decompression() instead

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-10 23:05:32 +02:00
Samuel Pitoiset
9cc91ba6d5 radeonsi: add a 'break' in si_check_render_feedback_*()
No need to check all color buffers.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-10 23:05:29 +02:00
Samuel Pitoiset
51d6641700 radeonsi: re-use 'desc' in si_set_shader_image()
No need to compute the offset in the descriptor twice.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-10 23:05:27 +02:00
Samuel Pitoiset
a1c37ff9e4 ac: add unreachable() in ac_build_image_opcode()
To silent the following compiler warning:

common/ac_llvm_build.c: In function ‘ac_build_image_opcode’:
common/ac_llvm_build.c:1080:3: warning: ‘name’ may be used uninitialized in this function [-Wmaybe-uninitialized]
   snprintf(intr_name, sizeof(intr_name), "%s%s%s%s.v4f32.%s.v8i32",
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    name,
    ~~~~~
    a->compare ? ".c" : "",
    ~~~~~~~~~~~~~~~~~~~~~~~
    a->bias ? ".b" :
    ~~~~~~~~~~~~~~~~
    a->lod ? ".l" :
    ~~~~~~~~~~~~~~~
    a->deriv ? ".d" :
    ~~~~~~~~~~~~~~~~~
    a->level_zero ? ".lz" : "",
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~
    a->offset ? ".o" : "",
    ~~~~~~~~~~~~~~~~~~~~~~
    type);
    ~~~~~

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-10 23:02:12 +02:00
Constantine Kharlamov
61e47d92c5 r600g: get rid of dummy pixel shader
The idea is taken from radeonsi. The code mostly was already checking for null
pixel shader, so little checks had to be added.

Interestingly, acc. to testing with GTAⅣ, though binding of null shader happens
a lot at the start (then just stops), but draw_vbo() never actually sees null
ps.

v2: added a check I missed because of a macros using a prefix to choose
a shader.

Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-04-10 22:45:22 +02:00
Constantine Kharlamov
544b40089b r600g: add draw_vbo check for a NULL pixel shader
Taken from radeonsi, required to remove dummy pixel shader in the next patch

Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-04-10 22:45:22 +02:00
Constantine Kharlamov
22de96680c r600g: skip repeating vs, gs, and tes shader binds
The idea is taken from radeonsi. The code lacks some checks for null vs,
and I'm unsure about some changes against that, so I left it in place.

Some statistics for GTAⅣ:
Average tesselation bind skip per frame: ≈350
Average geometric shaders bind skip per frame: ≈260
Skip of binding vertex ones occurs rarely enough to not get into per-frame
counter at all, so I just gonna say: it happens.

v2: I've occasionally removed an empty line, don't do this.
v3: return a check for null tes and gs back, while I haven't figured out
the way to move stride assignment to r600_update_derived_state() (as it
is in radeonsi).

Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-04-10 22:45:22 +02:00
Bartosz Tomczyk
a4019a81ab mesa: use single memcpy when strides match in glReadPixels, texstore code
v2: fix indentation

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-10 14:42:17 -06:00
Jason Ekstrand
da2ac19511 intel/blorp: Use ISL for emitting depth/stencil/hiz
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-04-10 07:57:21 -07:00
Jason Ekstrand
d3785dcb2f intel/blorp: Emit 3DSTATE_STENCIL_BUFFER before HIER_DEPTH
We're about to replace blorp's emit code with ISL and it emits them in
the other order.  This makes diffing the aubs easier.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-04-10 07:57:21 -07:00
Jason Ekstrand
f93dc5beee anv: Use ISL for emitting depth/stencil/hiz
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-04-10 07:57:21 -07:00
Jason Ekstrand
bf95f7c209 intel/isl: Add support for emitting depth/stencil/hiz
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-04-10 07:57:21 -07:00
Thomas Hindoe Paaboel Andersen
957ccbe04a amd/addrlib: use correct variable name in header
Since the inclusion in 7f160efcde
the header used x_biased, while the implementation used y_biased.
This changes the header to macth the implementation since the
uses of the function seems to expect y_biased.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-10 12:44:59 +10:00
Timothy Arceri
d0791ac2ed mesa/st: take ownership rather than adding reference for new renderbuffers
This avoids locking in the reference calls and fixes a leak after the
RefCount initialisation was change from 0 to 1.

Fixes: 32141e53d1 (mesa: tidy up renderbuffer RefCount initialisation)

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Bartosz Tomczyk <bartosz.tomczyk86@gmail.com>
2017-04-10 10:55:34 +10:00
Timothy Arceri
d9fe82fe41 x11: take ownership rather than adding reference for new renderbuffers
This avoids locking in the reference calls and fixes a leak after the
RefCount initialisation was change from 0 to 1.

Fixes: 32141e53d1 (mesa: tidy up renderbuffer RefCount initialisation)

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-10 10:55:34 +10:00
Timothy Arceri
a85b4e5719 osmesa: tidy up renderbuffer refCount initialisation
32141e53d1 changed _mesa_init_renderbuffer() to set it to 1 for
us.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-10 10:55:34 +10:00
Timothy Arceri
e6d6266e6f swrast: take ownership rather than adding reference for new renderbuffers
This avoids locking in the reference calls and fixes a leak after the
RefCount initialisation was change from 0 to 1.

Fixes: 32141e53d1 (mesa: tidy up renderbuffer RefCount initialisation)

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-10 10:55:34 +10:00
Timothy Arceri
6c02387b2c radeon: take ownership rather than adding reference for new renderbuffers
This avoids locking in the reference calls and fixes a leak after the
RefCount initialisation was change from 0 to 1.

Fixes: 32141e53d1 (mesa: tidy up renderbuffer RefCount initialisation)

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-10 10:55:34 +10:00
Timothy Arceri
1b85009ec1 nouveau: take ownership rather than adding reference for new renderbuffers
This avoids locking in the reference calls and fixes a leak after the
RefCount initialisation was change from 0 to 1.

Fixes: 32141e53d1 (mesa: tidy up renderbuffer RefCount initialisation)

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-10 10:55:34 +10:00
Timothy Arceri
3387f66cab i965: take ownership rather than adding reference for new renderbuffers
This avoids locking in the reference calls and fixes a leak after the
RefCount initialisation was change from 0 to 1.

Fixes: 32141e53d1 (mesa: tidy up renderbuffer RefCount initialisation)

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-10 10:55:34 +10:00
Timothy Arceri
c355675440 i915: take ownership rather than adding reference for new renderbuffers
This avoids locking in the reference calls and fixes a leak after the
RefCount initialisation was change from 0 to 1.

Fixes: 32141e53d1 (mesa: tidy up renderbuffer RefCount initialisation)

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-10 10:55:34 +10:00
Timothy Arceri
074a485d35 mesa: create _mesa_attach_renderbuffer_without_ref() helper
This will be used to take ownership of freashly created renderbuffers,
avoiding the need to call the reference function which requires
locking.

V2: dereference any existing fb attachments and actually attach the
    new rb.

v3: split out validation and attachment type/complete setting into
    a shared static function.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Bartosz Tomczyk <bartosz.tomczyk86@gmail.com>
2017-04-10 10:55:34 +10:00
Ilia Mirkin
89253d5c67 nv50/ir: remove unused swizzle field in ValueRef
The nv50 ir is scalar. Perhaps this was from some early attempts to
integrate the simd aspects of nv30. However at this point it's entirely
unused.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-04-09 14:59:42 -04:00
Boyan Ding
b1b189a0ab nouveau: enable ARB_shader_clock on nv50 and nvc0
v2: Also enable support on nv50

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-04-09 13:03:13 -04:00
Boyan Ding
6c3dd8f0ed nv50/ir: Handle TGSI_OPCODE_CLOCK
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
[imirkin: make zero mov non-fixed]
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-04-09 13:03:13 -04:00
Boyan Ding
e2e2c69927 gm107/ir: Emit SV_CLOCK system value
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-04-09 13:03:13 -04:00
Ben Widawsky
6e907812f8 gbm: Assert modifiers and count are copacetic
The API/entry point in mesa already checks the correct behavior,
however, it's possible to be handled by another implementation and those
implementations should not be able to abuse a weird combination of count
and pointer.

This fixes CID 1403193

Cc: Mark Janes <mark.a.janes@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-09 09:29:57 -07:00
Gustaw Smolarczyk
a2eae66b8b st/mesa: Use compressed fog mode for atifs.
Signed-off-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-04-08 20:29:58 +02:00
Gustaw Smolarczyk
8a4b93b1d9 mesa/main/ff_frag: Use compressed TexEnv Combine state.
Along the way, add missing GL_ONE source support and drop non-existing
GL_ZERO and GL_ONE operand support.

Signed-off-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-04-08 20:29:58 +02:00
Gustaw Smolarczyk
f7c9bf0c6b mesa/main/ff_frag: Use compressed fog mode.
Signed-off-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-04-08 20:29:58 +02:00
Gustaw Smolarczyk
837ad2dc38 mesa/main: Maintain compressed TexEnv Combine state.
Signed-off-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-04-08 20:29:58 +02:00
Gustaw Smolarczyk
6fa34de830 mesa/main: Maintain compressed fog mode.
Signed-off-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-04-08 20:29:58 +02:00
Gustaw Smolarczyk
c9b2938aec mesa/main/ff_frag: Don't retrieve format if not necessary.
Signed-off-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-04-08 20:29:58 +02:00
Gustaw Smolarczyk
885012aab2 mesa/main/ff_frag: Use gl_texture_object::TargetIndex.
Instead of computing it once again using _mesa_tex_target_to_index.

Signed-off-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-04-08 20:29:58 +02:00
Gustaw Smolarczyk
a86891a9a9 mesa/main/ff_frag: Store nr_enabled_units only once.
Signed-off-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-04-08 20:29:58 +02:00
Gustaw Smolarczyk
0e89ab0d6e mesa/main/ff_frag: Simplify get_fp_input_mask.
Change it into filter_fp_input_mask transform function that instead of
returning a mask, transforms input.

Also, simplify the case of vertex program handling by assuming that
fp_inputs is always a combination of VARYING_BIT_COL* and VARYING_BIT_TEX*.

Signed-off-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-04-08 20:29:58 +02:00
Gustaw Smolarczyk
f5e685da06 mesa/main/ff_frag: Don't bother with VARYING_BIT_FOGC.
It's not used.

Signed-off-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-04-08 20:29:58 +02:00
Gustaw Smolarczyk
03b9b3c471 mesa/main/ff_frag: Remove unused struct.
Signed-off-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-04-08 20:29:58 +02:00
Gustaw Smolarczyk
ceb5ba9d1d mesa/main/ff_frag: Reduce the size of nr_enabled_units.
Since it holds values from 0 to 8, 4 bits will suffice.

Signed-off-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-04-08 20:29:58 +02:00
Gustaw Smolarczyk
439eca951f mesa/main/ff_frag: Remove enabled_units.
Its only usage is easily replaced by nr_enabled_units. As for cache key
part, unit[i].enabled should be enough.

Signed-off-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-04-08 20:29:58 +02:00
Gustaw Smolarczyk
3cc91537fa mesa/main/ff_frag: Use correct constant.
Since fixed-function shaders are restricted to MAX_TEXTURE_COORD_UNITS
texture units, use this constant instead of MAX_TEXTURE_UNITS. This
reduces the array size from 32 to 8.

Signed-off-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-04-08 20:29:57 +02:00
Jason Ekstrand
098ca9949d intel/isl: Use genx_bits.h instead of a hand-rolled table
This gets rid of one piece of ugliness with the way ISL handles surface
emitting surface states.  I've never liked that hand-rolled table but it
was the best we had at the time.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-04-07 22:34:04 -07:00
Jason Ekstrand
b85d75b3e8 intel/genxml/bits: Emit per-container _length helpers
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-04-07 22:34:04 -07:00
Jason Ekstrand
f97e251ab2 intel/genxml/bits: Emit per-field _start helpers
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-04-07 22:34:04 -07:00
Jason Ekstrand
430e697868 intel/genxml/bits: Pull the function emit code into a helper block
The helper block is extremely general.  It takes an string property name
and an object that supports three methods: has_prop, iter_prop, and
get_prop.  This way we can easily generalize it to emit more different
types of getter functions.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-04-07 22:34:04 -07:00
Jason Ekstrand
2d52e65d03 intel/genxml/bits: Refactor to add a container class
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-04-07 22:34:04 -07:00
Ilia Mirkin
57a744025a nvc0/ir: fix overwriting of offset register with interpolateAtOffset
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2017-04-07 23:31:01 -04:00
Jason Ekstrand
bc68aa42bd anv: Use subpass dependencies for flushes
Instead of figuring it all out ourselves, just use the information given
to us by the client.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-04-07 19:24:14 -07:00
Jason Ekstrand
e5bbf8be36 anv/pass: Record required pipe flushes
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-04-07 19:24:14 -07:00
Jason Ekstrand
0039d0cf27 anv/pass: Use anv_multialloc for allocating the anv_pass
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-04-07 19:24:14 -07:00
Jason Ekstrand
415633a722 anv/descriptor_set: Use anv_multialloc for descriptor set layouts
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-04-07 19:24:14 -07:00
Jason Ekstrand
e5c29b8c27 anv: Add a helper for doing mass allocations
We tend to try to reduce the number of allocation calls the Vulkan
driver uses by doing a single allocation whenever possible for a data
structure.  While this has certain downsides (usually code complexity),
it does mean error handling and cleanup is much easier.  This commit
adds a nice little helper struct for getting rid of some of that
complexity.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-04-07 19:24:14 -07:00
Jason Ekstrand
82695c32b6 anv: Add helpers for converting access flags to pipe bits
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-04-07 19:24:14 -07:00
Timothy Arceri
9d69416a7e mesa: simplify and optimise vertex bindings tracking
We only need to update it if something changes. Also
_mesa_bind_vertex_buffer() will update the mask when binding to a
NULL or default buffer so no need to do that update here.

Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-04-08 11:18:50 +10:00
Timothy Arceri
bfabef0e71 glsl: fix lower jumps for nested non-void returns
Fixes the case were a loop contains a return and the loop is
nested inside an if.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
https://bugs.freedesktop.org/show_bug.cgi?id=100303
2017-04-08 11:18:32 +10:00
Ilia Mirkin
5dd490f134 gallium: fix some math formulas to display better
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-07 20:20:17 -04:00
Ilia Mirkin
60f5766db4 nvc0/ir: fix LSB/BFE/BFI implementations
Overwriting the src register is a very bad idea - it logically maps onto
the TGSI registers, and so is effectively overwriting the source values.

Reported-by: Boyan Ding <boyan.j.ding@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2017-04-07 20:20:16 -04:00
Nicolai Hähnle
c05cf9cf1b util: fix swizzle of INSTANCEID system value
radeonsi added stricter checking for correct swizzles in debug builds.

Reported-by: Michel Dänzer <michel.daenzer@amd.com>
Fixes: 4cf2942777 ("radeonsi: support 64-bit system values")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-08 00:44:52 +02:00
Bruce Cherniak
07b5b5cfd4 st/glx: Add awareness for multisample pixel formats to st/glx-xlib.
In preparation for enabling MSAA in OpenSWR, the state trackers need to
be aware of multisample pixel formats for software renderers.  This patch
allows glx-xlib to query the renderer for support of pixel
formats with multisample, and create multisample resources.

This change is benign to softpipe and llvmpipe, as is_format_supported
returns FALSE for any sample_count > 1.  OpenSWR does the same at the
moment, but that will change soon.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2017-04-07 16:50:58 -05:00
Tim Rowley
7bd5057fd1 swr: fix unused variable warnings
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-07 16:50:41 -05:00
Brian Paul
8046c247de glx: silence uninitialized var warning
Signed-off-by: Brian Paul <brianp@vmware.com>
2017-04-07 13:46:44 -06:00
Brian Paul
ee3f75f538 st/mesa: silence unused/uninitialized var warnings
Signed-off-by: Brian Paul <brianp@vmware.com>
2017-04-07 13:46:44 -06:00
Brian Paul
c77c381fae gallivm: init vars to silence gcc warnings
Silence warnings about using possibly uninitialized values.

Signed-off-by: Brian Paul <brianp@vmware.com>
2017-04-07 13:46:44 -06:00
Charmaine Lee
16bd2c6d04 svga: add context pointer to the invalidate surface interface
With this patch, we will specify the current context
when we invalidate the surface before the surface is
put back to the recycled surface pool. This allows the
winsys layer to use the specified context to do the
invalidation rather than using the last context that
referenced the surface. This prevents race condition if
the last referenced context is now made current in another thread.

Tested with MTT glretrace, NobelClinicianViewer.

Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2017-04-07 13:46:44 -06:00
Brian Paul
e000b17f87 winsys/svga: use c11 thread types/functions
Gallium no longer has wrappers for mutexes and condition variables.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-04-07 13:46:44 -06:00
Thomas Hellstrom
0864f9c77a winsys/svga: Resolve command submission buffer contention v3
If two contexts wanted to access the same buffer at the same time, it would
end up on two validation lists simultaneously, which might cause a
PIPE_ERROR_RETRY when trying to validate it from one context while the other
context already had it validated but not yet fenced.

In that situation we could spin until the error goes away, or apply various
more or less expensive locking schemes to save cpu.
Here we use a scheme that briefly locks after fencing but avoids locking on
validation in the non-contended case.

v2:
Make sure we broadcast not only on releasing buffers after fencing, but also
after releasing buffers in the pb_validate_validate error path.
v3:
Don't broadcast on PIPE_ERROR_RETRY because that would increase the chance
of starvation.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
2017-04-07 13:46:44 -06:00
Brian Paul
0baa372b6f svga: remove pre-SVGA3D_HWVERSION_WS8_B1 code
3D wasn't officially supported before virtual HW version 8 so we can
remove this old code.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-04-07 13:46:44 -06:00
Brian Paul
690fe77835 st/wgl: sort strings in stw_extension_string[] array
Trivial.
2017-04-07 13:46:44 -06:00
Charmaine Lee
b1c964447a svga: remove redundant surface propagation
Currently, surface propagation for colliding render target resource is
done at framebuffer emit time for vgpu10. This patch
adds the surface propagation for non-vgpu10 path to emit_fb_vgpu9()
and removes the redundant surface copy at set time.

Tested with MTT glretrace, piglit, NobelClinicianViewer, Turbine, Cinebench.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
2017-04-07 13:46:44 -06:00
Charmaine Lee
35a748e79c svga: Fix zslice index to svga_texture_copy_handle_resource()
The zslice index to svga_texture_copy_handle_resource() is not adjusted
and should be a signed integer.

This patch fixes piglit tests for non-vgpu10 including
   spec@arb_framebuffer_object@fbo-generatemipmap-3d
   spec@glsl-1.20@execution@tex-miplevel-selection gl2:texture* 3d

Tested with MTT piglit and glretrace
2017-04-07 13:46:44 -06:00
Brian Paul
5637a497a3 svga: specify include path for git_sha1.h for out-of-src builds
If we're doing an out-of-src build, we need to specify the #include
patch to find git_sha1.h

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-04-07 13:46:44 -06:00
Brian Paul
c78fc70e8c st/wgl: pseudo-implementation of WGL_EXT_swap_control
This implementation is based on querying the time just before swap/present
and doing a Sleep() if needed.  There is no sync to vblank or actual
coordination with the GPU.  This isn't perfect, but basically works.

We've had some request for this functionality, and it sounds like there
are some Windows GL apps that refuse to start if the driver doesn't
advertise this extension.

Note: NVIDIA's Windows OpenGL driver advertises the WGL_EXT_swap_control
string both with wglGetExtensionsStringEXT() and with
glGetString(GL_EXTENSIONS).  We're only advertising it with the former at
this time.

Tested with asst. Mesa demos, Google Earth, Lightsmark, etc.

VMware bug 1591534.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2017-04-07 13:46:43 -06:00
Charmaine Lee
ab96d1baf4 svga: Fix out-of-sync backing surface
When a backing surface is reused, it is possible that
the original surface has been changed. So before the backing surface
is bound again, we need to sync up the surface.
This patch creates a new helper function svga_texture_copy_handle_resource()
to sync up the backing surface resource.

This patch, together with the backing surface dirty bit fix, fixes
the rendering corruption in NobelClinicianViewer when rotating the model.

Also tested with MTT glretrace, piglit, Cinebench, Turbine.

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-07 13:46:43 -06:00
Charmaine Lee
a08e3b88ab svga: add a reset flag to svga_propagate_surface()
The reset flag specifies if the dirty bit needs to be reset
after the surface is propagated to the texture. This is used
to make sure that the dirty bit is not reset and stay unset
before the surface is unbound.

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-07 13:46:43 -06:00
Charmaine Lee
02c9bf2d54 svga: add the has_backed_views flag
The new has_backed_views flag specifies if any of the render target
views or depth stencil view is a backing surface view.
The flag is used in svga_propagate_rendertargets() so it can return early
if there is no surface to propagate.

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-07 13:46:43 -06:00
Charmaine Lee
a421d45e61 svga: only destroy render target view from a context that created it
A texture can be destroyed from a different context from which it is
created, but destroying the render target view from a different context
will cause svga device errors. Similar to shader resource view,
this patch skips destroying render target view or depth stencil view
from a non-parent context.

Fixes driver errors running NobelClinician Viewer application.

Tested with NobelClinician Viewer, MTT piglit, glretrace.

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-07 13:46:43 -06:00
Charmaine Lee
b4c4ee0762 svga: disable rasterization if rasterizer_discard is set or FS undefined
With this patch, rasterization will be disabled if the
rasterizer_discard flag is set or the fragment shader
is undefined due to missing position output from the
vertex/geometry shader.

Tested with piglit test glsl-1.50-geometry-primitive-id-restart.
Also tested with full MTT glretrace and piglit.

v2: As suggested by Roland, to properly disable rasterization, besides
    setting FS to NULL, we will also need to disable depth and stencil test.

v3: As suggested by Brian, set SVGA_NEW_DEPTH_STENCIL_ALPHA dirty bit
    in svga_bind_rasterizer_state() if the rasterizer_discard flag is
    changed.

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-07 13:46:43 -06:00
Charmaine Lee
fed72ff6cb svga: do not emulate wide points in GS when doing transform feedback
Emulating wide points in geometry shader when doing transform feedback
is problematic. This patch disables the emulation.

Tested with piglit test ext_transform_feedback-points.
Also tested with MTT glretrace, mesa demos pointblast and spriteblast.

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-07 13:46:43 -06:00
Jason Ekstrand
4e17b59f6c anv/query: Use snooping on !LLC platforms
Commit b2c97bc789 which made us start
using a busy-wait for individual query results also messed up cache
flushing on !LLC platforms.  For one thing, I forgot the mfence after
the clflush so memory access wasn't properly getting fenced.  More
importantly, however, was that we were clflushing the whole query range
and then waiting for individual queries and then trying to read the
results without clflushing again.  Getting the clflushing both correct
and efficient is very subtle and painful.  Instead, let's side-step the
problem by just snooping.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-04-07 12:17:20 -07:00
Emil Velikov
5318d1ff94 anv: provide anv_gem_busy() stub for the tests
Otherwise linking way fail.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100600
Fixes: f195d40eca ("anv/device: Add a helper for querying whether a BO is busy")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
2017-04-07 19:45:58 +01:00
Rob Clark
3b32ec3ba6 gallium/util: tweak backtrace format with libunwind
To work with addr2line.sh we also need the relative offset within the
DSO.  And addr2line.sh gets confused by the leading stackframe number.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-07 08:23:02 -04:00
Rob Clark
91dfa02125 gallium/util: cache symbol lookup with libunwind
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-07 08:23:02 -04:00
Rob Clark
7c69ea553b gallium/util: fix missing limit check in libunwind backtrace
Fixes: 70c272004f ("gallium/util: libunwind support")
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-07 08:23:02 -04:00
Timothy Arceri
8046a944d0 mesa: fix renderbuffer leak
We don't need to call _mesa_reference_renderbuffer() for the first
assignment as refCount starts at 1. For swrast we work around the
fact we will indirectly call _mesa_reference_renderbuffer() by
resetting refCount to 0.

Fixes: 32141e53d1 (mesa: tidy up renderbuffer RefCount initialisation)

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-04-07 19:48:10 +10:00
Samuel Iglesias Gonsálvez
1c934bc71b anv/blorp: sample input attachments with resolves on BDW
On Broadwell we still need to do a resolve between the subpass
that writes and the subpass that reads when there is a
self-dependency because HW could not see fast-clears and works
on the render cache as if there was regular non-fast-clear surface.

Fixes 16 tests on BDW:

dEQP-VK.renderpass.formats.*.input.clear.store.self_dep*

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-07 07:49:43 +02:00
Fredrik Höglund
fd0f539e60 radv: don't call radeon_check_space in radv_BindDescriptorSets
This appears to be a leftover from an earlier version of this function.
Nothing is emitted into the CS.

Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-04-07 00:54:46 +02:00
Fredrik Höglund
c1f8c83cb6 radv: implement VK_KHR_descriptor_update_template
All offsets and strides are precomputed by
radv_CreateDescriptorUpdateTemplateKHR and stored in the template.

v2: Move the new struct declarations from radv_descriptor_set.h
    to radv_private.h (Bas)

Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-04-07 00:54:46 +02:00
Fredrik Höglund
c6487bc48b radv: implement VK_KHR_push_descriptor
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-04-07 00:54:46 +02:00
Fredrik Höglund
3b33f03913 radv: replace an assertion with a conditional
Replace the !binding_layout->immutable_samplers assertion in
radv_update_descriptor_sets with a conditional.

The Vulkan specification does not say that it is illegal to update
a sampler descriptor when it is immutable; only that pImageInfo is
ignored.

This change is also needed for push descriptors, because valid
descriptors must be pushed for all bindings accessed by shaders,
including immutable sampler descriptors.

Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-04-07 00:54:46 +02:00
Fredrik Höglund
a6e94a87cb radv: refactor radv_UpdateDescriptorSets
Move the implementation into a separate function that takes a
cmd_buffer and a dstSetOverride parameter.

When cmd_buffer is not NULL, radv_update_descriptor_sets calls
cs_add_buffer directly instead of updating the buffer list.

This will be used to implement VK_KHR_push_descriptor.

Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-04-07 00:54:46 +02:00
Samuel Pitoiset
bedd89429f gallium/radeon: fix typo in radeon_winsys.h
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-07 00:48:19 +02:00
Samuel Pitoiset
7839243085 mesa/main: simplify _mesa_IsRenderbuffer()
_mesa_lookup_renderbuffer() already checks if 'id' is non-zero.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-04-07 00:48:01 +02:00
Timothy Arceri
93d7014c1d mesa: stop abstracting texture object hashtable locking
This doesn't do anything useful so just remove it.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-04-07 08:03:02 +10:00
Timothy Arceri
31cb6fd0a3 mesa: stop abstracting buffer object hashtable locking
This doesn't do anything useful so just remove it.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-04-07 08:02:54 +10:00
Jason Ekstrand
c9c39812b9 i965/blorp: Bump the batch space estimate
Commit f938354362 recently increased the
alignment on vertex buffer data from 32 to 64.  This caused us to
consume a bit more batch than we were before and we now go over the
estimate by a small amount on certain blits on gen8+.  This commit bumps
then gen8 batch estimate by a bit to compensate.  Haswell and older
still seems to be well within the limit.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100582
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-04-06 13:32:29 -07:00
Jordan Justen
0370350d11 intel/aubinator: Stop searching after a custom handler is found
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-04-06 13:26:08 -07:00
Jordan Justen
d5bd0e411e intel/gen_decoder: return -1 for unknown command formats
Decoding with aubinator encountered a command of 0xffffffff. With the
previous code, it caused aubinator to jump 255 + 2 dwords to start
decoding again.

Instead we can attempt to detect the known instruction formats. If the
format is not recognized, then we can advance just 1 dword.

v2:
 * Update aubinator_error_decode
 * Actually convert the length variable returned into a *signed* integer
   in aubinator.c, intel_batchbuffer.c and aubinator_error_decode.c.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-04-06 13:26:08 -07:00
Jordan Justen
7c33372f82 intel/gen_decoder: Fix length for Media State/Object commands
From BDW PRM, Volume 6: Command Stream Programming, 'Render Command
Header Format'.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-04-06 13:26:08 -07:00
Jordan Justen
3c77a57222 intel/aubinator_error_decode: Fix structure decode data
The call to gen_print_group should provide a pointer to the beginning
of the the structure data, not the start of the batch data.

Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-04-06 13:25:38 -07:00
Nicolai Hähnle
2357e7a202 st/pbo: select the right swizzle for instance IDs
The system value only has an X component, and radeonsi started
checking that in debug builds.

Reported-by: Michel Dänzer <michel.daenzer@amd.com>
Fixes: 4cf2942777 ("radeonsi: support 64-bit system values")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-06 20:26:27 +02:00
Jason Ekstrand
b2c97bc789 anv/query: Busy-wait for available query entries
Before, we were just looking at whether or not the user wanted us to
wait and waiting on the BO.  Some clients, such as the Serious engine,
use a single query pool for hundreds of individual query results where
the writes for those queries may be split across several command
buffers.  In this scenario, the individual query we're looking for may
become available long before the BO is idle so waiting on the query pool
BO to be finished is wasteful. This commit makes us instead busy-loop on
each query until it's available.

This significantly reduces pipeline bubbles and improves performance of
The Talos Principle on medium settings (where the GPU isn't overloaded
with drawing) by around 20% on my SkyLake gt4.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Eero Tamminen <eero.t.tamminen@intel.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
2017-04-05 21:17:11 -07:00
Jason Ekstrand
f195d40eca anv/device: Add a helper for querying whether a BO is busy
This is a bit more efficient than using GEM_WAIT with a timeout of 0.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-04-05 21:17:11 -07:00
Tim Rowley
d5157ddca4 swr: [rasterizer core] SIMD16 Frontend WIP
Implement widened binner for SIMD16

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-05 18:20:45 -05:00
Tim Rowley
b8515d5c0f swr: [rasterizer core] Enable 8x2 backend
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-05 18:20:45 -05:00
Tim Rowley
c1b7a5780d swr: [rasterizer codegen] remove copy of mako
mako is already a mesa build requirement, extra copy not needed.

Tested building against mesa build baseline (mako-0.8.0).

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-05 18:20:45 -05:00
Tim Rowley
97dab87a22 swr: [rasterizer core/memory] Move intrinics to _simd functions
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-05 18:20:19 -05:00
Tim Rowley
117fc582f8 swr: [rasterizer core] Programmable sample position support
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-05 18:19:25 -05:00
Tim Rowley
3c52a7316a swr: [configure.ac/scons] require c++14
New C++ features used by upcoming swr changes.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-05 18:19:16 -05:00
Tim Rowley
e5fdfcf836 swr: [rasterizer core] Fix center sample pattern
Fix long hidden bug in rasterizer handling of center sample pattern.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-05 18:19:10 -05:00
Tim Rowley
c12b61d158 swr: [rasterizer core/memory] Fix missing avx512 storetile
Fix pre-processor macro handing to eliminate silently missing
implementation for AVX512.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-05 18:19:04 -05:00
Tim Rowley
cd6c200223 swr: [rasterizer core] SIMD16 Frontend WIP
Implement widened VS output for SIMD16

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-04-05 18:18:36 -05:00
Timothy Arceri
1bfeb65397 mesa: use internal function when deleting buffers
This avoids validation and looking up the buffer target for a second time.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-06 08:25:36 +10:00
Timothy Arceri
8feb5bb402 mesa: rework bind_buffer_object()
This allows internal users to pass buffer objects directly and
allows for KHR_no_error support to be more easily added.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-06 08:25:36 +10:00
Timothy Arceri
d1c1544a49 mesa: small texstate tidy up
Possibly more efficient, either way it makes the code easier to
follow.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-06 08:25:36 +10:00
Timothy Arceri
32141e53d1 mesa: tidy up renderbuffer RefCount initialisation
42aaa548 changed the renderbuffer initialisation of RefCount from
1 to 0.

This is inconsitent with how we use RefCount elsewhere. Also every
driver implementation of NewRenderbuffer() calls
_mesa_init_renderbuffer() so its safe to set it there.

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-06 08:17:10 +10:00
Christian Gmeiner
e75001811e Revert "etnaviv: Cannot render to rb-swapped formats"
This reverts commit 658568941d.

With the help of shader variants we can render to rb-swapped
formats now. Fixes about 60 piglits.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-04-05 19:58:25 +02:00
Christian Gmeiner
7f62ffb68a etnaviv: add support for rb swap
If we render to rb swapped format we will create a shader variant doing
the involved swizzing in the pixel shader.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-04-05 19:58:22 +02:00
Christian Gmeiner
8d9a31ef97 etnaviv: adapt shader-db output for variant support
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
2017-04-05 19:58:18 +02:00
Christian Gmeiner
20fa8f1989 etnaviv: bring back shader-db traces
If shader-db run, create a standard variant immediately
(as otherwise nothing will trigger the shader to be
actually compiled).

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
2017-04-05 19:58:13 +02:00
Christian Gmeiner
7d2a806266 etnaviv: add etna_shader_key and generate variants if needed
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-04-05 19:58:10 +02:00
Christian Gmeiner
9da54fdcb5 etnaviv: pass a preallocated variant to compiler
In the long run the compiler needs to know the specifc variant
'key' in order to compile appropriate assembly. With this commit
the variant knows its shader and we are able pass the preallocated
variant into etna_compile_shader(..). This saves us from passing
extra ptrs everywhere.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
2017-04-05 19:58:07 +02:00
Christian Gmeiner
ffd4762310 etnaviv: make specs const
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-04-05 19:58:03 +02:00
Christian Gmeiner
ecc2474e59 etnaviv: add struct etna_shader_state
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
2017-04-05 19:57:59 +02:00
Christian Gmeiner
65e9bd2703 etnaviv: add basic shader variant support
This commit adds some basic infrastructure to handle shader
variants. We are still creating exactly one shader variant
for each shader.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
2017-04-05 19:57:56 +02:00
Christian Gmeiner
59b459ac17 etnaviv: s/etna_shader/etna_shader_variant
Prep work to add shader variant support.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-04-05 19:57:52 +02:00
Christian Gmeiner
54e367bf0e etnaviv: remove not needed forward declarations
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-04-05 19:57:47 +02:00
Emil Velikov
13181abc6d gallium/util: honour LIBUNWIND_CFLAGS
Fixes: 70c272004f ("gallium/util: libunwind support")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-05 18:42:56 +01:00
Rhys Kidd
115e684792 travis: Add radeonsi to continuous integration
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-05 18:19:51 +01:00
Rhys Kidd
787ab42716 travis: Add radv vulkan driver to continuous integration
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-05 18:19:28 +01:00
Emil Velikov
a6840efc09 anv: provide required gem stubs for the tests
Introduce stubs to anv_gem_stub.c that match the anv_gem.c ones.
Otherwise we may get link-time errors, when building the tests.

v2: Introduce all the missing stubs at once.

Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Vinson Lee <vlee@freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100574
Fixes: c964f0e485 ("anv: Query the kernel for reset status")
Fixes: 651ec926fc ("anv: Add support for 48-bit addresses")
Fixes: 060a6434ec ("anv: Advertise larger heap sizes")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
---
I've intentionally kept the order the same identical to the anv_gem.c.
This way we can easily grep & diff in the future ;-)
2017-04-05 17:54:38 +01:00
Emil Velikov
8307124829 configure.ac: pthread-stubs is not a thing on GNU/kFreeBSD
As mentioned on the xcb mailing list, the platform uses the GLIBC
forwarding mechanism.

https://lists.freedesktop.org/archives/xcb/2016-November/010896.html

Cc: Andreas Boll <andreas.boll.dev@gmail.com>
Reported-by: Andreas Boll <andreas.boll.dev@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-05 17:47:41 +01:00
Aaron Watry
4d0399f175 st/clover: Fix build after shrink of pipe_box
Fixes: 3dfe61e ("gallium: decrease the size of pipe_box - 24 -> 16 bytes")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100569
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Tested-by: Vinson Lee <vlee@freedesktop.org>
2017-04-05 09:19:48 -05:00
Alex Deucher
d921af62f5 radeonsi: add new polaris10 pci id
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-04-05 10:13:08 -04:00
Nicolai Hähnle
9e1b2e4d97 radeonsi: enable ARB_shader_ballot
Require LLVM 5.0 or later because LLVM 4.0 is easily fooled into
putting the lane select of llvm.amdgcn.readlane into a VGPR and then
fails to continue to compile.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 15:29:44 +02:00
Nicolai Hähnle
8b13b11f11 radeonsi: optimization barriers to work around LLVM deficiencies
Notably, llvm.amdgcn.readfirstlane and llvm.amdgcn.icmp may be hoisted
out of loops or if/else branches in cases like

  if (cond) {
    v = readFirstInvocationARB(x);
    ... use v ...
  } else {
    v = readFirstInvocationARB(x);
    ... use v ...
  }
===>
  v = readFirstInvocationARB(x);
  if (cond) {
    ... use v ...
  } else {
    ... use v ...
  }

The optimization barrier is a heavy hammer to stop that until LLVM
is taught the semantics of the intrinsic properly.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 15:29:44 +02:00
Nicolai Hähnle
24d4fbe226 radeonsi: strengthen emit_optimization_barrier
LLVM will lift inline assembly out of if-else-blocks if both paths have
the same inline assembly. Prevent this by adding an irrelevant unique
text to the assembly.

This requires the LLVM assembly parser to be initialized.

Furthermore, allow forcing subsequent computations to happen after the
optimization barrier by defining a data dependency.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 15:29:43 +02:00
Nicolai Hähnle
5c4602f4a2 radeonsi: emit TGSI_OPCODE_READ_*
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 15:29:43 +02:00
Nicolai Hähnle
b46e3a30b7 radeonsi: emit TGSI_OPCODE_BALLOT
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 15:29:43 +02:00
Nicolai Hähnle
a3075f4799 radeonsi: implement TGSI_SEMANTIC_SUBGROUP_*
64-bit system values are stored as v2i32 to simplify the fetch logic.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 15:29:43 +02:00
Nicolai Hähnle
4cf2942777 radeonsi: support 64-bit system values
For simplicitly, always store system values as 32-bit values or arrays
of 32-bit values. 64-bit values are unpacked and packed accordingly.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 15:29:43 +02:00
Nicolai Hähnle
1ee57b16be radeonsi: bump RADEON_LLVM_MAX_SYSTEM_VALUES
ARB_shader_ballot introduces 7 new system values that can be used
in all shader stages.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 15:29:42 +02:00
Nicolai Hähnle
ee2d93eb92 st/mesa: enable ARB_shader_ballot
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 15:29:42 +02:00
Nicolai Hähnle
84039cc1c3 st/glsl_to_tgsi: implement ARB_shader_ballot system variables
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 15:29:42 +02:00
Nicolai Hähnle
76e3dba289 st/glsl_to_tgsi: implement ARB_shader_ballot builtin functions
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 15:29:41 +02:00
Ilia Mirkin
08bd0aa507 tgsi: add SUBGROUP_* semantics
v2: add documentation (Nicolai)

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 15:29:41 +02:00
Ilia Mirkin
3650d7455f tgsi: add BALLOT/READ_* opcodes
v2 (Nicolai):
- BALLOT isn't per-channel
- expand the documentation (also for VOTE_*)

v3:
- only BALLOT returns a 64-bit lanemask (Boyan)
- relax the requirement on READ_INVOC: the invocation number to read
  from must be uniform within a sub-group. This matches the
  GL_ARB_shader_ballot spect (and the v_readlane instruction of AMD
  GCN)

v4:
- hopefully really fix the doc of VOTE_* returns (Ilia)

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)
2017-04-05 15:29:34 +02:00
Nicolai Hähnle
d3e6f6d7f7 gallium: add PIPE_CAP_TGSI_BALLOT
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 15:29:31 +02:00
Nicolai Hähnle
b5711d5e1a glsl: add gl_SubGroup*ARB builtins
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 15:25:56 +02:00
Nicolai Hähnle
961b8e9afe glsl: add ARB_shader_ballot builtin functions
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 15:25:54 +02:00
Nicolai Hähnle
d37b7b5232 glsl: add ARB_shader_ballot operations
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 15:25:51 +02:00
Nicolai Hähnle
b8440ec9fa glsl: add ARB_shader_ballot enable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 15:25:48 +02:00
Nicolai Hähnle
4fdb691f10 mesa: add GL_ARB_shader_ballot boilerplate
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 15:25:40 +02:00
Emil Velikov
2c4c47dcb7 swr: automake: add gen_common.py to the tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-05 13:16:28 +01:00
Emil Velikov
e664cfc5a7 intel: genxml: automake: include gen_bits_header.py in the tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-05 13:16:28 +01:00
Emil Velikov
e180680980 intel: genxml: automake: polish automake rules
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-05 13:16:28 +01:00
Emil Velikov
e2adec3a17 amd/addrlib: automake: add all headers to the tarball
Fixes: 7f160efcde ("amd/addrlib: import gfx9 support")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-05 13:16:28 +01:00
Nicolai Hähnle
570e50af4b radeonsi: enable ARB_sparse_buffer
v2:
- fill in DRM version requirement
- disable on SI due to CP DMA faults

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 10:44:32 +02:00
Nicolai Hähnle
aee473eb01 radeonsi: disable SDMA clears and copies for sparse buffers
VM faults cannot be disabled for SDMA on <= VI.

We could still use SDMA by asking the winsys about which parts of the
buffers are committed. This is left as a potential future improvement.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 10:37:19 +02:00
Nicolai Hähnle
0a685ce9a7 gallium/radeon: implement pipe->resource_commit
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 10:37:19 +02:00
Nicolai Hähnle
e077c5fe65 gallium/radeon: transfers and invalidation for sparse buffers
Sparse buffers can never be mapped by the CPU.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 10:37:19 +02:00
Nicolai Hähnle
5969a373a1 gallium/radeon: implement sparse buffer creation
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 10:37:19 +02:00
Nicolai Hähnle
47e59a7e36 winsys/amdgpu: sparse buffer debugging helpers
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 10:37:19 +02:00
Nicolai Hähnle
0baee15596 winsys/amdgpu: take fences when freeing a backing buffer
We never add fences to backing buffers during submit. When we free a
backing buffer, it must inherit the sparse buffer's fences, so that it
doesn't get re-used prematurely via the cache.

v2:
- remove pipe_mutex_*

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 10:37:18 +02:00
Nicolai Hähnle
79dae12b41 winsys/amdgpu: add sparse buffers to CS
... and implement the corresponding fence handling.

v2:
- add missing bit in amdgpu_bo_is_referenced_by_cs_with_usage
- remove pipe_mutex_*

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 10:37:18 +02:00
Nicolai Hähnle
667da4eaed winsys/amdgpu: sparse buffer creation / destruction / commitment
This is the bulk of the buffer allocation logic. It is fairly simple and
stupid. We'll probably want to use e.g. interval trees at some point to
keep track of commitments, but Mesa doesn't have an implementation of those
yet.

v2:
- remove pipe_mutex_*
- fix total_backing_pages accounting
- simplify by using the new VA_OP_CLEAR/REPLACE kernel interface

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 10:37:18 +02:00
Nicolai Hähnle
e348248647 winsys/amdgpu: add sparse buffer data structures
v2:
- remove pipe_mutex_*
- use a simple page commitment array

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 10:37:18 +02:00
Nicolai Hähnle
f3e514361c winsys/amdgpu: extend amdgpu_add_fence to allow adding multiple fences
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 10:37:18 +02:00
Nicolai Hähnle
ae4f442304 winsys/amdgpu: build handles and flags list late on submit thread
This probably has only minor performance effects, but it simplifies some
subsequent code slightly.

Ideally, it could also be used to simplify the handling of slab buffers
in the same way, but unfortunately that's not possible as long as we need
indices for relocations.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 10:37:17 +02:00
Nicolai Hähnle
0e476f6c03 winsys/amdgpu: share common code in amdgpu_add_fence_dependencies
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 10:37:17 +02:00
Nicolai Hähnle
1c125fdef0 winsys/amdgpu: extract amdgpu_do_add_real_buffer
We will use it for delayed adding of sparse buffers' backing buffers.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 10:37:17 +02:00
Nicolai Hähnle
a338f427ac winsys/radeon: sparse buffers will not be supported
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 10:37:17 +02:00
Nicolai Hähnle
c2637a17d9 radeon/winsys: add sparse buffer interface
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 10:37:17 +02:00
Nicolai Hähnle
d9bc4d8305 st/mesa: plumbing for sparse buffers
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 10:37:16 +02:00
Nicolai Hähnle
2599b23f7c st/mesa: enable ARB_sparse_buffer when supported
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 10:37:16 +02:00
Nicolai Hähnle
634266c952 trace: add resource_commit pass-through
v2: fix return type to bool (Marek)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 10:37:16 +02:00
Nicolai Hähnle
0e1c75acae ddebug: add resource_commit pass-through
v2: fix return type to bool (Marek)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 10:37:16 +02:00
Nicolai Hähnle
d6e6fa01a5 gallium: add sparse buffer interface and capability
v2:
- explain the resource_commit interface in more detail

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 10:37:04 +02:00
Nicolai Hähnle
4e6feacf6a mesa: implement sparse buffer commitment
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 10:31:02 +02:00
Nicolai Hähnle
d6fcbe1c2a mesa: implement sparse storage buffer allocation
v2:
- spec quote and style (Ian)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 10:31:01 +02:00
Nicolai Hähnle
94227684c4 mesa: implement SPARSE_BUFFER_PAGE_SIZE_ARB
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 10:31:01 +02:00
Nicolai Hähnle
d085c7ce7c mesa: Add GL_ARB_sparse_buffer boilerplate
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 10:31:01 +02:00
Nicolai Hähnle
a0970de839 configure.ac: require libdrm_amdgpu 2.4.77
The sparse buffer implementation requires amdgpu_bo_va_op_raw.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 10:30:42 +02:00
Matt Turner
d5ee55f028 mesa: Replace program locks with atomic inc/dec.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-04-05 14:54:49 +10:00
Jason Ekstrand
060a6434ec anv: Advertise larger heap sizes
Instead of just advertising the aperture size, we do something more
intelligent.  On systems with a full 48-bit PPGTT, we can address 100%
of the available system RAM from the GPU.  In order to keep clients from
burning 100% of your available RAM for graphics resources, we have a
nice little heuristic (which has received exactly zero tuning) to keep
things under a reasonable level of control.

Reviewed-by: Kristian H. Kristensen <krh@bitplanet.net>
2017-04-04 18:33:52 -07:00
Jason Ekstrand
651ec926fc anv: Add support for 48-bit addresses
This commit adds support for using the full 48-bit address space on
Broadwell and newer hardware.  Thanks to certain limitations, not all
objects can be placed above the 32-bit boundary.  In particular, general
and state base address need to live within 32 bits.  (See also
Wa32bitGeneralStateOffset and Wa32bitInstructionBaseOffset.)  In order
to handle this, we add a supports_48bit_address field to anv_bo and only
set EXEC_OBJECT_SUPPORTS_48B_ADDRESS if that bit is set.  We set the bit
for all client-allocated memory objects but leave it false for
driver-allocated objects.  While this is more conservative than needed,
all driver allocations should easily fit in the first 32 bits of address
space and keeps things simple because we don't have to think about
whether or not any given one of our allocation data structures will be
used in a 48-bit-unsafe way.

Reviewed-by: Kristian H. Kristensen <krh@bitplanet.net>
2017-04-04 18:33:52 -07:00
Jason Ekstrand
439da38d18 anv: Replace anv_bo::is_winsys_bo with a uint32_t flags
Reviewed-by: Kristian H. Kristensen <krh@bitplanet.net>
2017-04-04 18:33:52 -07:00
Jason Ekstrand
f938354362 i965/blorp: Align vertex buffers to 64B
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-04-04 18:33:52 -07:00
Jason Ekstrand
5d1ba2cb04 anv/blorp: Align vertex buffers to 64B
This fixes issues seen when adding support for full 48-bit addresses.
The 48-bit addresses themselves have nothing to do with it other than
that it caused the kernel to place buffers slightly differently so they
interacted differently with the caches.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-04-04 18:33:52 -07:00
Jason Ekstrand
c964f0e485 anv: Query the kernel for reset status
When a client causes a GPU hang (or experiences issues due to a hang in
another client) we want to let it know as soon as possible.  In
particular, if it submits work with a fence and calls vkWaitForFences or
vkQueueQaitIdle and it returns VK_SUCCESS, then the client should be
able to trust the results of that rendering.  In order to provide this
guarantee, we have to ask the kernel for context status in a few key
locations.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-04 18:33:52 -07:00
Jason Ekstrand
82573d0f75 anv: Check for device loss at the end of WaitForFences
It's possible that the device could have been lost while we were
waiting.  We should let the user know if this has happened.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-04 18:33:51 -07:00
Jason Ekstrand
c6f69eea6a anv/pipeline: Properly handle unset gl_Layer and gl_ViewportIndex
When the shader does not set one of these values, they are supposed to
get a default value of 0.  We have hardware bits in 3DSTATE_CLIP for
this but haven't been setting them.  This fixes the intermittent failure
of dEQP-VK.geometry.layered.3d.render_to_default_layer.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-04-04 18:33:51 -07:00
Jason Ekstrand
3503b2714b i965/fs: Always provide a default LOD of 0 for TXS and TXL
We already provide a default LOD for textureQueryLevels and texture() on
non-fragment stages.  However, there are more cases where one is needed
such as textureSize(gsampler2DMS*) in SPIR-V.  Instead of trying to list
out all of the cases one at a time, just provide the default for all TXS
and TXL operations.  This fixes a shader validation error in the new
Sascha deferredmultisampling demo which uses textureSize(gsampler2DMS).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100391
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-04-04 18:33:35 -07:00
Kenneth Graunke
c5bf7cb529 mesa: Require mipmap completeness for glCopyImageSubData(), sometimes.
This patch makes glCopyImageSubData require mipmap completeness when the
texture object's built-in sampler object has a mipmapping MinFilter.

Fixes (on i965):
dEQP-GLES31.functional.debug.negative_coverage.*.buffer.copy_image_sub_data

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2017-04-04 17:35:18 -07:00
Vinson Lee
c161a10462 libgl-xlib: Link with libunwind.
Fix linking error.

  CXXLD    libGL.la
../../../../src/gallium/auxiliary/.libs/libgallium.a(u_debug_stack.o): In function `debug_backtrace_capture':
src/gallium/auxiliary/util/u_debug_stack.c:59: undefined reference to `_Ux86_64_getcontext'
src/gallium/auxiliary/util/u_debug_stack.c:60: undefined reference to `_ULx86_64_init_local'
src/gallium/auxiliary/util/u_debug_stack.c:62: undefined reference to `_ULx86_64_step'
src/gallium/auxiliary/util/u_debug_stack.c:71: undefined reference to `_ULx86_64_get_proc_info'
src/gallium/auxiliary/util/u_debug_stack.c:73: undefined reference to `_ULx86_64_get_proc_name'
src/gallium/auxiliary/util/u_debug_stack.c:65: undefined reference to `_ULx86_64_step'

Fixes: 70c272004f ("gallium/util: libunwind support")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100562
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2017-04-04 16:47:41 -07:00
Jason Ekstrand
1fde054b8f intel/isl: Refactor and clerify gen8 alignment calculations
Adding the actual table from the docs makes it clearer exactly what the
restrictions are.  In particular, it becomes clear that compressed
textures ignore the alignment parameters in RENDER_SURFACE_STATE.

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-04-04 14:51:57 -07:00
Francisco Jerez
0de17f52a5 drirc: Set glsl_zero_init for Kerbal Space Program.
This fixes the stripes of garbage rendered on the floor of the vehicle
assembly building among other rendering issues.  The reason for the
misrendering seems to be that some of the GLSL shaders used by the
application use variables before initializing them, incorrectly
assuming that they will be implicitly set to zero by the
implementation.

Acked-by: Matt Turner <mattst88@gmail.com>
2017-04-04 14:13:03 -07:00
Lionel Landwerlin
e8d9b76f63 intel: tools: add aubinator_error_decode tool
This is pretty much the same tool as what i-g-t has, only with a more
fancy decoding of the instructions/registers. It also doesn't support
anything before gen4.

v2 (from Matt): Drop authors
                Remove undefined automake variable

v3: Fix incorrect offsets for dword > 1 (Jordan)

v4: Fix decompression error with large blobs (Jordan)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
2017-04-04 21:22:26 +01:00
Lionel Landwerlin
567d77885e intel: genxml: add RING_BUFFER_CTL registers
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-04-04 21:22:26 +01:00
Lionel Landwerlin
6f260ff049 intel: genxml: add FAULT_REG register
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-04-04 21:22:26 +01:00
Lionel Landwerlin
ca2771fa18 intel: genxml: add gen7 ERR_INT register
v2: add register to gen7.5 (Matt)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-04-04 21:22:26 +01:00
Lionel Landwerlin
84613bf6d5 intel: genxml: add ACTHD registers
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-04-04 21:22:26 +01:00
Lionel Landwerlin
0f195f22aa intel: genxml: add GFX_ARB_ERROR_RPT register
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-04-04 21:22:26 +01:00
Lionel Landwerlin
d1a7a54d77 intel: genxml: add INSTDONE registers
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-04-04 21:22:26 +01:00
Marek Olšák
18b12bf533 targets: export radeon winsys_create functions to silence LLVM warning
It silences the following radeonsi LLVM warning due to a previous
commit adding an LLVM workaround:
  "mesa: for the -simplifycfg-sink-common option: may only occur zero or one
   times!"

Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by; Emil Velikov <emil.velikov@collabora.com>
2017-04-04 22:15:47 +02:00
Constantine Kharlamov
6ee486899b r600g: check rasterizer primitive states like in radeonsi
Specifically, non-line primitives skipped, and defaulting to reset on
each packet.

The skip of non-line primitives saves ≈110 resetting of
PA_SC_LINE_STIPPLE register per frame in Kane&Lynch2.

Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-04-04 22:15:47 +02:00
Constantine Kharlamov
7ade08e2a8 r600g: extract a code into a r600_emit_rasterizer_prim_state()
Also change gs_output_prim type: unsigned → pipe_prim_type. The idea of
the code is mostly taken from radeonsi. The new code operating on
prev/curr rast_primitives saves ≈15 reloads of PA_SC_LINE_STIPPLE per
frame in Kane&Lynch2

Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-04-04 22:15:47 +02:00
Constantine Kharlamov
fa8bc90990 r600g/radeonsi: use the correct types (taken from pipe_draw_info)
Note: si_shader.h has also "type" variable that should be changed to
"enum pipe_prim_type", however it triggers a bunch of warnings about
unhandled switches, so due not knowing the correct way to handle them, I
decided to leave it as is.

Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-04-04 22:15:47 +02:00
Constantine Kharlamov
ef62a7651c r600g: remove duplicate memset by using a pointer, and constify args
Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-04-04 22:15:47 +02:00
Elie TOURNIER
ba5b1ab3e0 glsl: remove unused file
udivmod64 appears in src/compiler/glsl/builtin_int64.h and src/compiler/glsl/udivmod.h
The second file seems unused.
Fix commit 6b03b345eb

This change doesn't affect shader-db.

Signed-off-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-04-04 18:37:42 +01:00
Marek Olšák
6ca46c3d77 radeonsi: access gallivm through ctx in most places
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-04 16:55:21 +02:00
Marek Olšák
04e4fe594b radeonsi: use ctx->types instead of bld->types etc.
even vec_type is f32.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-04 16:55:19 +02:00
Marek Olšák
7a5e6dcba5 radeonsi: use i32_0/1 instead of *int_bld.zero/one in most places
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-04 16:55:16 +02:00
Marek Olšák
7216e1d8af gallium: decrease the size of pipe_draw_info - 88 -> 80 bytes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-04 11:14:43 +02:00
Marek Olšák
295f4f56cb gallium: decrease the size of pipe_vertex_element - 16 -> 8 bytes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-04 11:14:43 +02:00
Marek Olšák
e6428092f5 gallium: decrease the size of pipe_resource - 64 -> 48 bytes
Some other changes needed here.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-04 11:14:43 +02:00
Marek Olšák
3dfe61ed6e gallium: decrease the size of pipe_box - 24 -> 16 bytes
Also:

pipe_transfer: 48 -> 40 bytes.
pipe_blit_info = 176 -> 160 bytes.

v2: add a comment at pipe_box

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-04 11:14:43 +02:00
Marek Olšák
9869a3b3ba gallium: decrease the size of pipe_sampler_view - 48 -> 32 bytes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-04 11:14:43 +02:00
Marek Olšák
4648bc2a8f gallium: decrease the size of pipe_surface - 48 -> 40 bytes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-04 11:14:43 +02:00
Marek Olšák
eb0fd0e5f8 gallium: decrease the size of pipe_framebuffer_state - 96 -> 80 bytes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-04 11:14:43 +02:00
Marek Olšák
19bc74f513 gallium: decrease the size of pipe_stream_output_info - 532 -> 268 bytes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-04 11:14:43 +02:00
Marek Olšák
15ff2f7aa9 gallium: decrease the size of pipe_rasterizer_state - 36 -> 32 bytes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-04 11:14:43 +02:00
Marek Olšák
18e760346a amd/addrlib: second update for Vega10 + bug fixes
Highlights:
- Display needs tiled pitch alignment to be at least 32 pixels
- Implement Addr2ComputeDccAddrFromCoord().
- Macro-pixel packed formats don't support Z swizzle modes
- Pad pitch and base alignment of PRT + TEX1D to 64KB.
- Fix support for multimedia formats
- Fix a case "PRT" entries are not selected on SI.
- Fix wrong upper bits in equations for 3D resource.
- We can't support 2d array slice rotation in gfx8 swizzle pattern
- Set base alignment for PRT + non-xor swizzle mode resource to 64KB.
- Bug workaround for Z16 4x/8x and Z32 2x/4x/8x MSAA depth texture
- Add stereo support
- Optimize swizzle mode selection
- Report pitch and height in pixels for each mip
- Adjust bpp/expandX for format ADDR_FMT_GB_GR/ADDR_FMT_BG_RG
- Correct tcCompatible flag output for mipmap surface
- Other fixes and cleanups

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-04 11:14:43 +02:00
Marek Olšák
3e7d62a774 radeonsi: use i32_0 and i32_1 more
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-04 11:14:43 +02:00
Marek Olšák
29adaa19ac radeonsi: remove most uses of lp_build_const*
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-04 11:14:43 +02:00
Marek Olšák
7cec96a038 radeonsi: clean up 'radeon_bld' references
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-04 11:14:43 +02:00
Marek Olšák
0fb5a505fa radeonsi: fix broken texture filtering on SI-CIK since GFX9 changes
Don't clear state[7] on SI-CIK, and only do the meta stuff on VI+.
Fixes: 5abf60076c ("radeonsi/gfx9: image descriptor changes in mutable fields")

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100531
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-04 11:14:43 +02:00
Juan A. Suarez Romero
1bcdf74cdd bin/get-fixes-pick-list.sh: fix typo
Replace "nore" by "more".

Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2017-04-04 09:05:44 +02:00
Mauro Rossi
72175bd2a5 android: intel: genxml: fix genX_xml.h generation rules
Recent changes in Makefile.sources merged the aubinator files in
a unique list of generated files and genxml/genX_xml.h is now needed
to avoid the following building error:

ninja: error: '.../genxml/genX_xml.h', needed by '.../genxml/genX_xml.h',
missing and no known rule to make it
build/core/ninja.mk:148: recipe for target 'ninja_wrapper' failed

Fixes: 0f83c05 "intel: genxml: compress all gen files into one"
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-04-04 09:10:46 +03:00
Jason Ekstrand
405ef7bb33 intel/vec4: Add some fall through comments
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-04-03 16:58:35 -07:00
Bartosz Tomczyk
64b3aa7ad8 mesa/glthread: Avoid unnecessary batch reallocation
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-04-04 09:56:52 +10:00
Bas Nieuwenhuizen
6e5e8a2e49 radv: Increase descriptor limits.
We supported more generally. Decreased the dynamic buffers though, as
we only support 16 for uniform+storage.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2017-04-04 01:47:47 +02:00
Bartosz Tomczyk
95720851e2 mesa/glthread: fix misaligned address access
Address sanitizer reports lot of misaligned access:
SUMMARY: AddressSanitizer: undefined-behavior main/marshal.c:276:31 in
main/marshal.c:276:31: runtime error: load of misaligned address 0x631000104866 for type
'const GLuint' (aka 'const unsigned int'), which requires 4 byte alignment
0x631000104866: note: pointer points here
 92 88 00 00 00 00  00 00 4a 03 0c 00 93 88  00 00 00 00 00 00 02 01  0c 00 40 8d 00 00 00 00  00 00
             ^
SUMMARY: AddressSanitizer: undefined-behavior main/marshal_generated.c:28725:12 in
main/marshal_generated.c:28726:12: runtime error: member access within misaligned address 0x6310003fc874 for type
'struct marshal_cmd_VertexAttribPointer', which requires 8 byte alignment
0x6310003fc874: note: pointer points here
  01 00 00 00 7a 02 20 00  00 00 00 00 be be be be  be be be be be be be be  be be be be be be be be
              ^
SUMMARY: AddressSanitizer: undefined-behavior main/marshal_generated.c:28726:12 in
main/marshal_generated.c:28726:12: runtime error: store to misaligned address 0x6310003fc87c for type
'GLint' (aka 'int'), which requires 8 byte alignment
0x6310003fc87c: note: pointer points here
  00 00 00 00 be be be be  be be be be be be be be  be be be be be be be be  be be be be be be be be

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-04-04 09:39:03 +10:00
Bartosz Tomczyk
bcb63ee63e glsl: Fix blob memory leak
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-04-04 09:22:29 +10:00
Bas Nieuwenhuizen
a4c4efad89 radv: Rework guard band calculation.
We want the guardband_x/y to be the largerst scalars such that each
viewport scaled by that amount is still a subrange of [-32767, 32767].

The old code has a couple of issues:
1) It used scissor instead of viewport_scissor, potentially taking into
   account a viewport that is too small and therefore selecting a scale
   that is too large.
2) Merging the viewports isn't ideal, as for example viewports with
   boundaries [0,1] and [1000, 1001] would allow a guardband scale of ~30k,
   while their union [0, 1001] only allows a scale of ~32.

The new code just determines the guardband per viewport and takes the minimum.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2017-04-03 23:03:46 +02:00
Bas Nieuwenhuizen
d64f689f61 radv: Enable VK_KHR_incremental_present.
Just enabling the driver-independent implementation that Jason did.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-03 23:00:07 +02:00
Jason Ekstrand
0817110969 anv: Implement VK_KHR_incremental_present
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2017-04-03 13:51:08 -07:00
Jason Ekstrand
be1ecd8c6e vulkan/wsi/wayland: Pass damage through to the compositor
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2017-04-03 13:51:08 -07:00
Jason Ekstrand
f82b6c6272 vulkan/wsi: Plumb present regions through the common code
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2017-04-03 13:51:08 -07:00
Jason Ekstrand
3598a2907c vulkan/wsi: Fix some line wrapping
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2017-04-03 13:51:08 -07:00
Dave Airlie
22b116171f radv: fix interp at sample code.
Interp at sample needs to use the center, since the sample
positions it retrieves are relative to the center.

This fixes a bunch of CTS tests with multisample_interpolation.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-04 05:55:21 +10:00
Dave Airlie
1171b304f3 radv: overhaul fragment shader sample positions.
The current code was broken, and I decided to redesign it instead.

This puts the sample positions for all samples into the queue
constant descriptor buffer after all the spill/ring descriptors.

It then uses a single offset register to point how far into the
samples the samples for num_samples are. This saves one user sgpr
and means we only generate the sample position data in the rare
single case where we need it currently.

This doesn't fix the failing CTS tests without the followup
fix.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-04 05:55:15 +10:00
Lionel Landwerlin
471c1bc7cc aubinator/gen_decoder/i965: decode instructions from dword 0
Some packets like 3DSTATE_VF_STATISTICS, 3DSTATE_DRAWING_RECTANGLE,
3DPRIMITIVE, PIPELINE_SELECT, etc... have configurable fields in
dword0, we probably want to print those.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-04-03 20:45:34 +01:00
Lionel Landwerlin
04f2e80257 intel: gen_decoder: store pointer to current decoded field in iterator
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-04-03 20:45:34 +01:00
Dave Airlie
1e9e747d00 radv/ac: fix texture derivative ordering
The ordering NIR gives us is correct for the hw, this fixes:
dEQP-VK.glsl.texture_functions.texturegrad.* (mainly trigged
on isampler/usampler 3d textures.).

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-04 05:39:10 +10:00
Dave Airlie
303d22f319 radv/ac: round cube array coordinate before fixup.
This fixes:
dEQP-VK.glsl.texture_functions.texture.samplercubearray*

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-04 05:39:07 +10:00
Dave Airlie
5821f676ee radv: move to using common buffer load format.
Get rid of usage of SI.vs.load.input.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-04 05:37:52 +10:00
Brian Paul
b98ec1e920 util: fix MSVC warning in u_align_u32()
To silence
C:\Users\Brian\projects\mesa\src\util/u_vector.h(41) : warning C4146: unary
minus operator applied to unsigned type, result still unsigned

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2017-04-03 13:09:05 -06:00
Brian Paul
960f640c7a util: #include "c99_compat.h" to fix Windows build
Otherwise, we were getting the definition for 'inline' by chance from
some other preceeding #include.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2017-04-03 13:09:05 -06:00
Brian Paul
0fb2c16b3b util: s/SHA1_H/MESA_SHA1_H/
To follow the convention of other header include guards.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2017-04-03 13:09:05 -06:00
Brian Paul
7348df81b8 svga: add comment on svga_buffer_hw_storage_map()
Trivial.
2017-04-03 13:09:05 -06:00
Rhys Kidd
1572d11d89 travis: Support LLVM 3.8+ on Trusty-based Travis-CI via apt-get not apt addon
Per comments by Travis-CI, the apt addon is only really needed for the
container-based Precise builds, as they don't yet support Trusty on that platform.

Mesa currently uses Trusty fully-virtualized environment (due to sudo: required).

See further:
https://docs.travis-ci.com/user/trusty-ci-environment/#Fully-virtualized-via-sudo%3A-required
https://github.com/travis-ci/apt-source-whitelist/pull/205#issuecomment-216054237

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2017-04-03 19:43:12 +01:00
Grazvydas Ignotas
a6a38a038b util/u_atomic: provide 64bit atomics where they're missing
There are still some distributions trying to support unfortunate people
with old or exotic CPUs that don't have 64bit atomic operations. When
compiling for such a machine, gcc conveniently inserts a library call to
a helper, but it's implementation is missing and we get a linker error.
This allows us to provide our own implementation, which is marked weak
to prefer a better implementation, should one exist.

v2: changed copyright, some style adjustments
v3: [mattst88] Print results with AC_MSG_CHECKING/AC_MSG_RESULT

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93089
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-04-03 10:52:41 -07:00
Rob Clark
70c272004f gallium/util: libunwind support
It's kinda sad that (a) we don't have debug_backtrace support on !X86
and that (b) we re-invent our own crude backtrace support in the first
place.  If available, use libunwind instead.  The backtrace format is
based on what xserver and weston use, since it is nice not to have to
figure out a different format.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-03 11:32:17 -04:00
Rob Clark
c3c884c49c gallium/util: clean up stack frame printing
Prep work for next patch.

Ideally 'struct debug_stack_frame' would be opaque, but it is embedded
in a bunch of places.  But at least we can treat it opaquely.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-03 11:32:17 -04:00
Samuel Pitoiset
0c0b29591c st/mesa: add st_convert_image()
Should be used by the state tracker when glGetImageHandleARB()
is called in order to create a pipe_image_view template.

v3: - move the comment to *.c
v2: - make 'st' const
    - describe the function

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-03 11:29:31 +02:00
Samuel Pitoiset
90534e9dba st/mesa: make 'st' const in st_mesa_format_to_pipe_format()
This avoids a compilation warning since st_convert_image()
requires 'st' to be const.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-03 11:29:18 +02:00
Bartosz Tomczyk
8d919ba384 mesa/glthread: Call unmarshal_batch directly in glthread_finish
Call it directly when batch queue is empty. This avoids costly thread
synchronisation. This commit improves performance of games that have
previously regressed with mesa_glthread=true.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-03 10:33:31 +10:00
Timothy Arceri
dbdd7231c2 mesa: disable glthread when DEBUG_OUTPUT_SYNCHRONOUS is enabled
We could re-enable it also but I haven't tested that yet, and I'm
not sure we care much anyway.

V2: don't disable it from with the call itself. We need a custom
    marshalling function or we get stuck waiting for thread to
    finish.

V3: tidy up redundant code copied from generated version.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-03 09:31:11 +10:00
Grazvydas Ignotas
a0f0f3958e amd/addrlib: fix optimized build warnings
All the -Wunused-but-set-variable ones.
Found a way to do it with a oneliner.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-04-03 00:48:26 +02:00
Grazvydas Ignotas
8e42038d87 radeonsi: use unreachable to fix a warning
si_state.c: In function ‘si_make_texture_descriptor’:
si_state.c:3240:25: warning: ‘num_format’ may be used uninitialized
si_state.c:3240:12: warning: ‘data_format’ may be used uninitialized

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-04-03 00:46:35 +02:00
Constantine Kharlamov
dc6b3c031e r600g: Add more (un)likely functions
1-st is obvious because of assert, 2-nd stolen frmo si_draw_vbo(),
and 3-rd is just a small refactoring.

Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-04-03 00:36:25 +02:00
Constantine Kharlamov
807de52054 r600g: Remove intermediate assignment of pipe_draw_info
It removes a need to copy whole struct every call for no reason.  Comparing
objdump -d output for original and this patch compiled with -O2, shows reduce
of the function by 16 bytes.

Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-04-03 00:36:25 +02:00
Constantine Kharlamov
4408e1ca53 r600g: Use separate index_bias variable
Needed to get rid of a separate struct allocation in the next patch, because
the one in argument is a constant, and don't allow changing its fields.

Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-04-03 00:36:25 +02:00
Ilia Mirkin
cb518f2fb2 nv30: fp/rast may be null when validating fb/scissor due to clear
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-04-02 11:03:00 -04:00
Ilia Mirkin
1184fba86e nvc0: fragprog may not be set when e.g. clearing
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-04-02 10:58:32 -04:00
Ilia Mirkin
7a0c1eee0c nv50: don't assume a rast is set when validating for clears
Clears can happen before a rast is set, which can in turn cause scissors
and fragprog to be validated. Make sure that we handle this case.

Reported-by: Andrew Randrianasulu <randrianasulu@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-04-02 10:58:32 -04:00
Dave Airlie
03a67fbbf7 radv: fix order of the guardband register emission.
y is vert, x is horiz.

Noticed in visual inspection compared to radeonsi.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-02 20:17:30 +10:00
Edward O'Callaghan
f9387a223d mesa/main: Fix memset in formatquery.c
v2: We explicitly set each member to -1 over using a confusing
memset().

Signed-off-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-04-02 15:18:38 +10:00
Samuel Pitoiset
515165ff0e radeonsi: add load_image_desc()
Similar to load_sampler_desc(). Same deal for bindless.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-01 18:07:49 +02:00
Samuel Pitoiset
2f44402386 radeonsi: rework the load_sampler_desc() helpers
Will be more convenient for bindless because the 64bit handle is
actually the base_ptr of the descriptor (ie. 'list' will be fetched
from TGSI_FILE_CONSTANT/TGSI_FILE_TEMPORARY instead).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-01 18:07:49 +02:00
Samuel Pitoiset
8a3ef8c65d gallivm: add lp_build_emit_fetch_src() helper
lp_build_emit_fetch() is useful when the source type can be
infered from the instruction opcode.

However, for bindless samplers/images we can't do that easily
because tgsi_opcode_infer_src_type() returns TGSI_TYPE_FLOAT for
TEX instructions, while we need TGSI_TYPE_UNSIGNED64 if the
resource register is bindless.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-01 18:07:49 +02:00
Andres Gomez
8b10bf273d docs: add news item and link release notes for 17.0.1
Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-04-01 18:51:40 +03:00
Andres Gomez
f4d2f3aa30 docs: add sha256 checksums for 17.0.3
Signed-off-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit 71d2f05a9e)
2017-04-01 18:50:08 +03:00
Andres Gomez
5fa3f63036 docs: add release notes for 17.0.3
Signed-off-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit 7f34ecae7f)
2017-04-01 18:50:06 +03:00
Erik Faye-Lund
86a9ddfef7 glsl: ir_explog_to_explog2 is no more
Since 63684a9a ("glsl: Combine many instruction lowering passes
into one.", Thu Nov 18 2010), we no longer have anything called
ir_explog_to_explog2. So it's only confusing to have those
references there.

Update with the appropriate method, so people can grep for it in
the current tree if they encounter it.

Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-01 13:39:52 +02:00
Erik Faye-Lund
99d8b933fd gallium/docs: remove documentation of removed arg
geom was removed in e968975 ("gallium: remove the geom_flags param
from is_format_supported", Tue Mar 8 00:01:58 2011 +0100), but the
documentation of it was left over. Let's bring the documentation up
to date.

Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-01 13:39:52 +02:00
Erik Faye-Lund
c33807463e st/mesa: avoid aliasing violation in st_cb_perfmon.c
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-01 13:39:52 +02:00
Michal Srb
52f9ccefcb st: Add cubeMapFace parameter to st_finalize_texture.
st_finalize_texture always accesses image at face 0, but it may not be
set if we are working with cubemap that had other face set.

This fixes crash in piglit
same-attachment-glFramebufferTexture2D-GL_DEPTH_STENCIL_ATTACHMENT.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-04-01 09:03:23 +02:00
Jason Ekstrand
d6fccb4c09 vulkan: Bump the header and XML to the latest public version 2017-03-31 22:41:43 -07:00
Karol Herbst
baaae8cb81 nv50/ir: also do PostRaLoadPropagation for FMA
Helps Feral-ported games, due to their use of fma()

shader-db changes:
total instructions in shared programs : 3934925 -> 3934327 (-0.02%)
total gprs used in shared programs    : 481563 -> 481563 (0.00%)
total local used in shared programs   : 27469 -> 27469 (0.00%)
total bytes used in shared programs   : 36061888 -> 36056504 (-0.01%)

                local        gpr       inst      bytes
    helped           0           0         228         228
      hurt           0           0           0           0

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-03-31 23:57:16 -04:00
Karol Herbst
7d007824a3 gm107/ir: add LIMM form of mad
v2: renamed commit
    reordered modifiers
    add assert(dst == src2)
v3: reordered modifiers again
v5: no rounding bit for limms

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-03-31 23:57:15 -04:00
Karol Herbst
ad638514e3 gk110/ir: add LIMM form of mad
v2: renamed commit
    reordered modifiers
    add assert(dst == src2)
v3: removed wrong neg mod emission

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-03-31 23:57:14 -04:00
Karol Herbst
d346b8588c nv50/ir: implement mad post ra folding for nvc0+
changes for GpuTest /test=pixmark_piano /benchmark /no_scorebox /msaa=0
/benchmark_duration_ms=60000 /width=1024 /height=640:

score: 1026 -> 1045

changes for shader-db:
total instructions in shared programs : 3943335 -> 3934925 (-0.21%)
total gprs used in shared programs    : 481563 -> 481563 (0.00%)
total local used in shared programs   : 27469 -> 27469 (0.00%)
total bytes used in shared programs   : 36139384 -> 36061888 (-0.21%)

                local        gpr       inst      bytes
    helped           0           0        3587        3587
      hurt           0           0           0           0

v2: removed TODO
    reorderd to show changes without RA modification
    removed stale debugging print() call
v3: remove predicate checks
    enable only for gf100 ISA

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-03-31 23:57:13 -04:00
Karol Herbst
d6ce325147 nv50/ir: restructure and rename postraconstantfolding pass
we might want to add more folding passes here, so make it a bit more generic

v2: leave the comment and reword commit message
v4: rename it to PostRaLoadPropagation

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-03-31 23:57:12 -04:00
Karol Herbst
f2a4d881fe nvc0/ir: also do ConstantFolding for FMA
Helps mainly Feral-ported games, due to their use of fma()

shader-db changes:
total instructions in shared programs : 3941587 -> 3940749 (-0.02%)
total gprs used in shared programs    : 481511 -> 481460 (-0.01%)
total local used in shared programs   : 27469 -> 27481 (0.04%)
total bytes used in shared programs   : 36123344 -> 36115776 (-0.02%)

                local        gpr       inst      bytes
    helped           2          48         243         243
      hurt           2           3          32          32

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-03-31 23:57:10 -04:00
Karol Herbst
fac921db63 nvc0/ir: disable support for LIMMs on MAD/FMA
I hit an assert in the emiter while toying around with optimizations, because
ConstantFolding immediated a big int into a mad.

There is special handling for FMA/MAD in insnCanLoad, which is broken. With
this patch the special path should be not hit anymore. Anyway, the constraints
for the LIMMS can't be guarenteed in SSA form and I have patches pending to
use it via a post-SSA optimization pass.

As a result, immediates get immediated for int mad/fmas as well.

changes in shader-db:
total instructions in shared programs : 3943335 -> 3941587 (-0.04%)
total gprs used in shared programs    : 481563 -> 481511 (-0.01%)
total local used in shared programs   : 27469 -> 27469 (0.00%)
total bytes used in shared programs   : 36139384 -> 36123344 (-0.04%)

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
[imirkin: remove extra bit from insnCanLoad as well]
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-03-31 23:57:08 -04:00
Lyude
31970ab9a6 nvc0: Add support for NV_fill_rectangle for the GM200+
This enables support for the GL_NV_fill_rectangle extension on the
GM200+ for Desktop OpenGL.

Signed-off-by: Lyude <lyude@redhat.com>

Changes since v1:
- Fix commit message
- Add note to reldocs
Changes since v2:
- Remove unnessecary parens in nvc0_screen_get_param()
- Fix sorting in release notes
- Don't execute FILL_RECTANGLE method on pre-GM200+ GPUs

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-03-31 21:41:36 -04:00
Lyude
82e0c5f484 st/mesa: Add support for NV_fill_rectangle
Signed-off-by: Lyude <lyude@redhat.com>

Changes since v1:
- Fix commit name

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-03-31 21:41:32 -04:00
Lyude
1cc7352c4c gallium: Add NV_fill_rectangle to pipe state
Signed-off-by: Lyude <lyude@redhat.com>

Changes since v1:
- Fix accidental widening of bitfields

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-03-31 21:41:29 -04:00
Lyude
ffe2bd676f gallium: Add a cap to check if the driver supports fill_rectangle
Changes since v1:
- Add pipe caps for etnaviv, freedreno, swr and virgl

Signed-off-by: Lyude <lyude@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-03-31 21:41:24 -04:00
Lyude
54af467334 mesa: Add support for GL_NV_fill_rectangle
Since we don't have the bits required to support this in OpenGLES yet,
this only enables support for Desktop OpenGL

Signed-off-by: Lyude <lyude@redhat.com>

Changes since v1:
- Simply _mesa_PolygonMode() a little bit
- Fix formatting in OpenGL spec excerpts
- Move polygon mode checking into _mesa_valid_to_render()
Changes since v3:
- Improve error message for invalid drawings with GL_FILL_RECTANGLE_NV

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-03-31 21:41:20 -04:00
Lyude
a7cb2b40ed glapi: Add GL_NV_fill_rectangle
Signed-off-by: Lyude <lyude@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-03-31 21:41:08 -04:00
Marek Olšák
150736b5c3 gallium: remove support for predicates from TGSI (v2)
Neved used.

v2: gallivm: rename "pred" -> "exec_mask"
    etnaviv: remove the cap
    gallium: fix tgsi_instruction::Padding

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2017-04-01 00:06:41 +02:00
Dave Airlie
c011fe7452 radv: enable tessellation shaders.
This enables tessellation shaders and sets some values for
the maximums.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:17:25 +10:00
Dave Airlie
cb1518e96b radv/ac: setup lds for tessellation
This seems to get lost in the rebases, should fix
the tessellation demos, crash in llvm.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:17:15 +10:00
Dave Airlie
3f0d69af20 radv: add ia_multi_vgt_param tessellation support.
This just ports the relevant radeonsi pieces.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:17:08 +10:00
Dave Airlie
b4495b71c6 radv/cmd: emit tessellation state.
This emits the tessellation shaders and state to the command stream.

It contains the logic to emit the LS/HS shaders.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:16:57 +10:00
Dave Airlie
60fc0544e0 radv/pipeline: handle tessellation shader compilation
So tess shaders have some circular dependencies,

TCS needs the TES primitive mode
TES needs the TCS vertices out

This builds the nir for each shader first to get the
info, executes a tes specific nir pass, then builds
the LLVM shaders.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:16:51 +10:00
Dave Airlie
aaabdd6bc6 radv/ac: handle writing out tess factors.
This ports the code from radeonsi to build the if/endif,
and ports the tess factor emission code. This code has
an optimisation TODO that we can deal with later.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:16:47 +10:00
Dave Airlie
94f9591995 radv/ac: add support for TCS/TES inputs/outputs.
This adds support for the tessellation inputs/outputs to the
shader compiler, this is one of the main pieces of the patch.

It is very similiar to the radeonsi code (post merge we should
consider if there are better sharing opportunities). The main
differences from radeonsi, is that we can have "compact" varyings
for clip/cull/tess factors, and we have to add special handling
for these.

This consists of treating the const index from the deref different
depending on the compactness.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:16:42 +10:00
Dave Airlie
5ab1289b48 radv/ac: add clip support for tess eval shader.
As this may be the last shader to emit clip distances.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:16:37 +10:00
Dave Airlie
326b9bc6dc radv/ac: hook up tessellation intrinsics.
This just adds support for the nir intrinsics that tessellation uses.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:16:32 +10:00
Dave Airlie
d8ab71b207 radv/ac: hook up shader information handling for tessellation
This hooks up the tessellation shader info to the nir values
and ctx generated ones.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:16:27 +10:00
Dave Airlie
4c60c68bd1 radv/pipeline: start calculating tess stage.
This calculates the pipeline state for tessellation.

It moves the gs ring calculation down to below
where the tessellation shaders will be compiled,
as it needs the info from those shaders.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:16:19 +10:00
Dave Airlie
823b55a8a9 radv: add tessellation support to variant code.
This just fills out the rsrc registers for tess shaders.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:16:14 +10:00
Dave Airlie
f239f59778 radv: add tessellation support to shader naming
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:16:08 +10:00
Dave Airlie
5b40eab00a radv: add tess ctrl stage barrier workaround for SI.
This just ports the workaround from radeonsi.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:16:04 +10:00
Dave Airlie
3a633cc2cb radv/ac: add support for patch inputs to unique index code.
This add support for tessellation patch inputs to the code
that finds the unique parameter index.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:15:57 +10:00
Dave Airlie
aeb49bc2b9 radv: port polaris vgt vertex reuse workaround.
This ports the VGT_VERTEX_REUSE register settings
for Polaris GPUs from radeonsi.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:15:51 +10:00
Dave Airlie
46a820b383 radv: configure tessellation distribution register.
This just takes the radeonsi values.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:15:45 +10:00
Dave Airlie
60326a7afc radv/ac: setup tessellation shader inputs.
This just configures all the register inputs for the tessellation
related stages.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:15:41 +10:00
Dave Airlie
3968162751 radv/ac: setup tess rings on compiler side.
This just sets up the necessary pointers on the compiler
side for the rings needed for tessellation.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:15:35 +10:00
Dave Airlie
46e52df34d radv: add tessellation ring allocation support. (v2)
This patch adds support for the offchip rings for storing
tessellation factors and attribute data.

It includes the register setup for the TF ring

v2: always do tess ring size calcs (Bas)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:15:30 +10:00
Dave Airlie
bbfb62df16 radv: add support for some device specific tess information.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:15:26 +10:00
Dave Airlie
2b3c4bcc1f radv/ac: add tess changes to shader keys/info
This adds the tess pieces for shader keys and shader info,
it adds the necessary bits to the vertex key/info as well.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:15:22 +10:00
Dave Airlie
a4b039db04 radv: add tess shader stage user data support.
This just adds support for tess to the shader stage conversion
and emits the per-stage descriptors/constants for tess stages.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:15:15 +10:00
Dave Airlie
a5136a97f7 radv: use defines for ring descriptor offsets.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:15:12 +10:00
Dave Airlie
0604284e3f radv: add helper function to denote if tess is enabled on a pipeline.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:14:59 +10:00
Dave Airlie
97e0ff30c0 radv: handle clip dist in es outputs.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:14:53 +10:00
Dave Airlie
6279646306 radv: drop unneeded start
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:14:39 +10:00
Dave Airlie
a58d03a5a2 radv: fixup geometry clip emission since using the geom pass
Fixes: 2b35b60d: radv: move to using nir clip/cull merge pass.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-04-01 07:14:38 +10:00
Marek Olšák
744317c9d2 radeonsi/gfx9: allow CMASK fast clear with RB+
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-31 21:41:57 +02:00
Marek Olšák
ea59521475 radeonsi/gfx9: don't compare src_va w/ dst_va for CP_DMA_CLEAR
src_va contains the clear value in this case.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-31 21:41:57 +02:00
Marek Olšák
e3cb67dc6b radeonsi/gfx9: fix 1D array fetches with derivs, bias, or Z compare value
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-31 21:41:57 +02:00
Marek Olšák
6ab2042761 radeonsi/gfx9: fix and enable single-sample CMASK fast clear
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-31 21:41:57 +02:00
Marek Olšák
d4bb4583b0 radeonsi/gfx9: fix and enable MSAA compression
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-31 21:41:57 +02:00
Marek Olšák
06d725ab2f radeonsi/gfx9: disable CE
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-31 21:41:57 +02:00
Marek Olšák
35aaccaf81 radeonsi/gfx9: fix linear mipmap CPU access
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-31 21:41:57 +02:00
Marek Olšák
322eb13f09 radeonsi: add tests verifying that VM faults don't hang
GFX9 hangs instead of writing VM faults to dmesg.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-31 21:41:57 +02:00
Marek Olšák
283c31afa1 radeonsi: unify HS max_offchip_buffers workarounds
Vulkan doesn't set more than 508.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-31 21:41:57 +02:00
Marek Olšák
829bd77235 radeonsi: adjust checking for SC bug workarounds
no change in behavior, just making sure that no later chips will use
the workarounds

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-31 21:41:56 +02:00
Brian Paul
2936f5c37e glsl: use -O1 optimization for builtin_functions.cpp with MinGW
Some versions of MinGW-w64 such as 5.3.1 and 6.2.0 produce bad code
with -O2 or -O3 causing a random driver crash when running programs
that use GLSL.  Most Mesa demos in the glsl/ directory trigger the
bug, but not the fragcoord.c test.

Use a #pragma to force -O1 for this file for later MinGW versions.
Luckily, this is basically one-time setup code.  I suspect the bug
is related to the sheer size of this file.

This should let us move to newer versions of MinGW-w64 for Mesa.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-31 13:36:25 -06:00
Brian Paul
15bb0511d6 tnl: remove unused var to silence warning
Trivial.
2017-03-31 13:30:54 -06:00
Neha Bhende
2e24a11f1d st/wgl: Replace variable name hdc with hDrawDC
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-03-31 13:30:54 -06:00
Brian Paul
7d0aac2392 st/wgl: add support for WGL_ARB_make_current_read
This adds the wglMakeContextCurrentARB() and wglGetCurrentReadDCARB()
functions.

Signed-off-by: Brian Paul <brianp@vmware.com>
2017-03-31 13:30:54 -06:00
Brian Paul
7753f040fa stw/wgl: add null context check in wglBindTexImageARB()
To avoid dereferencing a null pointer in case wglMakeCurrent() wasn't
called.  Found while debugging SWKOTOR game.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-03-31 13:30:53 -06:00
Marek Olšák
7d2fa8dc10 radeonsi: decompress DCC in set_sampler_view instead of create_sampler_view (v2)
v2: don't add a new decompress helper function

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-31 20:57:53 +02:00
Marek Olšák
8c7d1ded19 radeonsi: decompress DCC in set_framebuffer_state instead of create_surface (v2)
for threaded gallium, which can't use pipe_context in create_surface

v2: don't add a new decompress helper function

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-31 20:57:53 +02:00
Nicolai Hähnle
d10fbe5159 st/glsl_to_tgsi: fix 64-bit integer bit shifts
Fix a bug that was caused by a type mismatch in the shift count between
GLSL and TGSI. I briefly considered adjusting the TGSI semantics, but
since both LLVM and AMD GCN require both arguments to be of the same type,
it makes more sense to keep TGSI as-is -- it reflects the underlying
implementation better.

I'm also sending out piglit tests that expose this error.

v2: use the right number of components for the temporary register

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-31 18:15:50 +02:00
Nicolai Hähnle
c22841d8d2 tgsi: fix printing of 64-bit integer immediates
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-31 18:15:47 +02:00
Lionel Landwerlin
74a80d579d intel: genxml: fix out of tree builds
v2: use Emil's recommendation
    change rule to closer to genxml/genX_bits.h

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-31 15:29:57 +01:00
Thomas Hellstrom
18e2aa063c gbm/dri: Check dri extension version before flush after unmap
The commit mentioned below required the __DRI2FlushExtension to have
version 4 or above, for GBM functionality. That broke GBM with some
classic dri drivers. Relax that requirement so that we only flush
after unmap if we have version 4 or above. Drivers that require the flush
for correct functionality should implement the desired version.

Fixes: ba8df228 ("gbm/dri: Flush after unmap")
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Dylan Baker <dylan@pnwbakers.com>
2017-03-31 10:25:46 +02:00
Nicolai Hähnle
02112c3ef7 radeonsi: implement ARB_shader_group_vote
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-31 07:56:27 +02:00
Nicolai Hähnle
cd3f386069 radeonsi: enable ARB_shader_clock
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-31 07:56:27 +02:00
Nicolai Hähnle
2290535d62 radeonsi: emit TGSI_OPCODE_CLOCK
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-31 07:56:26 +02:00
Nicolai Hähnle
65b542a7cc st/mesa: implement ARB_shader_clock
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-31 07:56:26 +02:00
Ilia Mirkin
94ec847cb0 tgsi: add CLOCK opcode
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-31 07:56:26 +02:00
Nicolai Hähnle
d0c7f924a3 gallium: add PIPE_CAP_TGSI CLOCK
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-31 07:56:25 +02:00
Nicolai Hähnle
44125b29d1 glsl: fix clockARB builtin function
The underlying intrinsic is defined to always have a uvec2 return type.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-31 07:56:25 +02:00
Tapani Pälli
3535b87a1a anv: change BLOCK_POOL_MEMFD_SIZE to 1GB
This allows us to run 32bit Vulkan apps on Android, ftruncate
call would fail on 2GB (max size being 2GB - 1).

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-31 08:43:28 +03:00
Tapani Pälli
2398770c87 android: add libmesa_genxml as dep to libmesa_isl
This is to fix following compile error with libmesa_isl:
   mesa/src/intel/isl/isl.c:28:10: fatal error: 'genxml/genX_bits.h' file not found

Fixes: f0eaf38 ("genxml: New generated header genX_bits.h (v6)")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emli Velikov <emil.velikov@collabora.com>
2017-03-31 08:42:54 +03:00
Timothy Arceri
3e524cfa47 mesa: remove MESA_GLSL=opt
This is unused.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emli.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-31 13:43:38 +11:00
Timothy Arceri
2caa3aa1f4 mesa: remove MESA_GLSL=no_opts env option
This is confusing because is only applys to GL_ARB_vertex/fragment_program,
and because of that its also not very useful.

If someone requires this for debugging they can just make an ad-hoc
code change.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-31 13:43:38 +11:00
Timothy Arceri
94224950dd mesa: move FLUSH_VERTICES() call to meta
There is no need for this to be in the common code.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-31 13:43:38 +11:00
Timothy Arceri
2e70de7d2f mesa/vbo: remove redundant _mesa_is_bufferobj() calls
This is already called inside the vbo_exec_vtx_{unmap,map}()
functions.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-31 11:54:37 +11:00
Timothy Arceri
3ef1ff6270 mesa/glthread: add async support to ARB_gpu_shader_int64 uniform functions
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-31 11:54:36 +11:00
Timothy Arceri
eb3df0e838 mesa/glthread: add async support to ARB_gpu_shader_fp64 uniform functions
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-31 11:54:35 +11:00
Lionel Landwerlin
469da094e1 aubinator: enable snb/ilk through --gen
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-03-31 01:25:33 +01:00
Lionel Landwerlin
0f83c05149 intel: genxml: compress all gen files into one
Combining all the files into a single string didn't make any
difference in the size of the aubinator binary.

With this change we now also embed gen4/4.5/5 descriptions, which
increases the aubinator size by ~16Kb.

v2 (Lionel): rebase makefiles

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-03-31 01:24:56 +01:00
Bas Nieuwenhuizen
0f3de89a56 radv: Use the guard band.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-30 22:21:14 +02:00
Bas Nieuwenhuizen
8a53e6e4c5 radv: Prepare for not using the guard band for lines & points.
Vulkan Clipping is defined in terms of vertices, the scissor based
clipping happens on pixels. There is a difference with points and
lines, as a vertex can be outside the viewport while some pixels are in.
On Vulkan thoise pixels shouldn't be drawn, while they would be with
the guardband.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-30 22:21:14 +02:00
Bas Nieuwenhuizen
76603aa90b radv: Drop the default viewport when 0 viewports are given.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-30 22:21:14 +02:00
Bas Nieuwenhuizen
4083a2ddcb radv: Set proper viewport & scissor for meta draws.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-30 22:21:14 +02:00
Lyude
42f2bccd11 mesa: Fix trailing whitespace in polygon.c
Signed-off-by: Lyude <lyude@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-03-30 11:59:51 -07:00
Lyude
043ee96059 mesa: Fix gross indenting in _mesa_PolygonMode()
Signed-off-by: Lyude <lyude@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-03-30 11:59:51 -07:00
Lyude
a1ce8a3fe2 r300: Fix indenting in r300_get_param()
Signed-off-by: Lyude <lyude@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-03-30 11:59:51 -07:00
Lyude
e5c6c421c4 vc4: Fix indenting in vc4_screen_get_param()
Signed-off-by: Lyude <lyude@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-03-30 11:59:51 -07:00
Kenneth Graunke
e113dfabad intel: Add INTEL_CFLAGS to aubinator CFLAGS.
It still needs intel_aub.h.  Fixes the build.
2017-03-30 11:58:00 -07:00
Jason Ekstrand
fbcf92a278 nir: Add support for 8 and 16-bit types
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2017-03-30 11:34:45 -07:00
Jason Ekstrand
28e41506a6 nir/constant_expressions: Don't switch on bit size when not needed
For opcodes such as the nir_op_pack_64_2x32 for which all sources and
destinations have explicit sizes, the bit_size parameter to the evaluate
function is pointless and *should* do nothing.  Previously, we were
always switching on the bit_size and asserting if it isn't one of the
sizes in the list.  This generates way more code than needed and is a
bit cruel because it doesn't let us have a bit_size of zero on an ALU op
which shouldn't need a bit_size.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2017-03-30 11:34:45 -07:00
Jason Ekstrand
b69b44d222 nir/constant_expressions: Pull the guts out into a helper block
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2017-03-30 11:34:45 -07:00
Kenneth Graunke
f5e5c0c101 i965: Stop using legacy dri_bufmgr_* and intel_* names.
Eric renamed these from dri_bufmgr_* and intel_bufmgr_* to drm_intel_*
in libdrm commit 4b9826408f65976a1a13387beda748b65e03ec52, circa 2008,
but we've been using the legacy names this whole time.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-30 11:16:34 -07:00
Emil Velikov
3df993e1a2 intel: automake: move INTEL_CFLAGS as applicable
Only common/decoder.[ch] requires it [for intel_aub.h].

v2: The code was moved to from intel/tools to intel/common,
update accordingly.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-03-30 19:07:28 +01:00
Emil Velikov
4ffb394961 intel: android: remove libdrm_intel requirement
The only part which requires libdrm_intel tools/aubinator is not built
on Android.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-03-30 19:07:23 +01:00
Marek Olšák
331714d72e Partially revert "amd/addrlib: silence warnings" to fix builds with DEBUG
This partially reverts commit 8a74140a21.
2017-03-30 19:17:39 +02:00
Marek Olšák
681adbc18c ddebug: implement clear_texture
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 18:53:42 +02:00
Marek Olšák
83d3e6fbff radeonsi: fix an unused-variable warning in a release build 2017-03-30 17:22:25 +02:00
Marek Olšák
bb2e05885d vdpau: fix a maybe-uninitialized warning 2017-03-30 17:14:47 +02:00
Marek Olšák
65732a8ff6 softpipe: fix a maybe-uninitialized warning
/home/marek/dev/mesa-main/src/gallium/drivers/softpipe/sp_compute.c:178:
 warning: 'grid_size' may be used uninitialized in this function
 [-Wmaybe-uninitialized]
2017-03-30 17:14:47 +02:00
Marek Olšák
9f5dbbe030 gallivm: fix a maybe-uninitialized warning
/home/marek/dev/mesa-main/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c:3598:
 warning: 'level' may be used uninitialized in this function [-Wmaybe-uninitialized]
       out1 = lp_build_cmp(&leveli_bld, PIPE_FUNC_GREATER, level, last_level);
            ^
2017-03-30 17:14:47 +02:00
Marek Olšák
3b1934d9b6 gallium/radeon: s/dcc_disable/disable_dcc/
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-03-30 16:09:39 +02:00
Marek Olšák
45a71d5de5 radeonsi: handle incompatible DCC formats in resource_copy_region
Required because of later commits.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
2017-03-30 16:09:39 +02:00
Marek Olšák
b05b8587ae radeonsi: remove a workaround for inexact *8_SNORM blits
All tests pass on Fiji now. This prevents DCC disablement due to
incompatible DCC formats due to the fallback.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
2017-03-30 16:09:39 +02:00
Marek Olšák
a955ee788f gallium/radeon: add and use a new helper vi_dcc_enabled
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-03-30 16:09:37 +02:00
Marek Olšák
f7bd51626e gallium/radeon: formalize that r600_query_hw_add_result doesn't need a context
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-03-30 16:09:36 +02:00
Marek Olšák
d76c306162 radeonsi: don't make a copy of pipe_index_buffer in draw_vbo
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-03-30 16:09:32 +02:00
Marek Olšák
abb25fb18e gallium/util: use const in u_index_modify helpers
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-03-30 16:09:29 +02:00
Samuel Pitoiset
7d99f48b5e winsys/amdgpu: remove AMDGPU_INFO_NUM_EVICTIONS
This is now exposed with libdrm_amdgpu 2.4.76.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 15:27:13 +02:00
Marek Olšák
675af982e1 radeonsi: add Vega10 PCI IDs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Boyuan Zhang
cb8b84e3d0 radeon/uvd: set correct vega10 db pitch alignment
Create new function to get correct alignment based on Asics, and change
the corresponding decode message buffer and dpb buffer size calculations

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-03-30 14:44:33 +02:00
Leo Liu
5eba761fee radeon/vce: add vce support for firmware 53.19.4
v2: squashed with other similar commits

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-03-30 14:44:33 +02:00
Leo Liu
ed48b399f1 radeon/vce: adapt gfx9 surface to vce
Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-03-30 14:44:33 +02:00
Leo Liu
6c7870fee8 winsys/surface: add height pitch for gfx9
Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2017-03-30 14:44:33 +02:00
Leo Liu
c89e771c9c radeon/uvd: clear message buffer when reuse
As required by firmware

Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-03-30 14:44:33 +02:00
Leo Liu
c836f2ce28 radeon/uvd: adapt gfx9 surface to uvd
Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-03-30 14:44:33 +02:00
Leo Liu
9d5db4e8f4 radeon/uvd: add uvd soc15 register
Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
474468fbf9 radeonsi/gfx9: disable features that don't work
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
8ea3da0706 radeonsi/gfx9: only allow GL 3.1
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
7695ea0c02 radeonsi/gfx9: add linear address computations for texture transfers
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
172b05a37e radeonsi/gfx9: don't generate LS and ES states
these shaders don't exist on GFX9

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
eb22f5bf6f radeonsi/gfx9: SPI_SHADER_USER_DATA changes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
f4ab7a5415 winsys/amdgpu: set/get BO tiling flags for GFX9
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
7d88233f84 radeonsi/gfx9: handle pitch and offset overrides for texture_from_handle
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
de55e57e29 radeonsi/gfx9: set/validate GFX9 BO metadata
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
bd1da6b339 radeonsi/gfx9: add radeon_surf.gfx9.surf_offset
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
3685a12bad radeonsi/gfx9: don't write mipmap level offsets to BO metadata
GFX9 doesn't have (usable) mipmap offsets.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
9c100bd693 radeonsi/gfx9: flush CB & DB caches with an EOP TS event
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
6e0d64712a radeonsi/gfx9: use ACQUIRE_MEM
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
81aa21d732 radeonsi/gfx9: only use CE RAM for most-used descriptors
because the CE RAM size decreased to 4 KB.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
86f13c7363 radeonsi/gfx9: emit FLUSH_DFSM where required
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
ad93d72c34 radeonsi/gfx9: emit BREAK_BATCH in emit_framebuffer_state
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
405bacd820 radeonsi/gfx9: fix MIP0_WIDTH & MIP0_HEIGHT for compressed texture blits
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
354285afa0 radeonsi/gfx9: fix textureSize/imageSize for 1D textures
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
566defad13 radeonsi/gfx9: add a workaround for 1D depth textures
The same workaround is used by Vulkan.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
fc3c503b5d radeonsi/gfx9: enable clamping for Z UNORM formats promoted to Z32F
so that shaders don't have to do it.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
5abf60076c radeonsi/gfx9: image descriptor changes in mutable fields
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
c8ffec4f4b radeonsi/gfx9: FMASK image descriptor changes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
d60f72a9f0 radeonsi/gfx9: image descriptor changes in immutable fields
The border color swizzle logic was copied from Vulkan. It doesn't make any
sense to me, but it passes all piglits except the stencil ones.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
dfd2b54948 radeonsi/gfx9: DB changes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
94819a3e6c radeonsi/gfx9: CB changes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
272b50a6f4 radeonsi/gfx9: do DCC clears on non-mipmapped textures only
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
aba8e0ea68 radeonsi/gfx9: update can_sample_z/s flags
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
054dcbe42c radeonsi/gfx9: pass correct parameters to buffer_get_handle
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
710aaed52b radeonsi/gfx9: update si_set_optimal_micro_tile_mode
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
7fcad40ca5 radeonsi/gfx9: don't check array_mode for allowing TC-compatible HTILE
GFX9 supports this with all modes except linear.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
6f09b0d076 radeonsi/gfx9: update HTILE/CMASK/FMASK allocators
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
281542c690 radeonsi/gfx9: stub testdma - array_mode_to_string
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
a0e8b73594 radeonsi/gfx9: update r600_print_texture_info
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
b25d7c2cbf gallium/radeon: move pre-GFX9 radeon_bo_metadata.* to u.legacy.*
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
9b365d497a winsys/amdgpu: set num_tile_pipes, pipe_interleave_bytes for GFX9
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
493de7f935 winsys/amdgpu: wire up new addrlib for GFX9
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
e572835fea winsys/amdgpu: update amdgpu_addr_create for GFX9
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
a71139470c winsys/amdgpu: rename GFX6 surface functions
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
9ca33ab78e gallium/radeon: add GFX9 surface info to radeon_surf
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
ba2e7c68ce gallium/radeon: move pre-GFX9 radeon_surf.* members to radeon_surf.u.legacy.*
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
641b79774a radeonsi/gfx9: allow Z16_UNORM for TC-compatible HTILE
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
a4f0a1099f radeonsi/gfx9: draw changes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
b39fade67c radeonsi/gfx9: pad shader binaries by 128 bytes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
5271d12a6e radeonsi/gfx9: trivial shader and ring changes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
0aae4f4764 radeonsi/gfx9: sampler state changes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
71eca0780a radeonsi/gfx9: add a scissor bug workaround
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
b576df4017 radeonsi/gfx9: rasterizer changes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
be8eba0625 radeonsi/gfx9: disable the 2-bit format fetch fix
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
31b1042276 radeonsi/gfx9: set NUM_RECORDS correctly
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
5f4659260e radeonsi/gfx9: ELEMENT_SIZE change
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
d214b95e9a radeonsi/gfx9: enable ETC2
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
6d21fd51b6 radeonsi/gfx9: disable RB+ on Vega10
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
2862300d9e radeonsi/gfx9: init_config changes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
b054718218 radeonsi/gfx9: don't set PA_SC_RASTER_CONFIG*
The registers don't exist on GFX9.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
de7967a27a radeonsi/gfx9: Gather4 no longer needs the workaround
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
71ad666414 radeonsi/gfx9: CP DMA changes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
7690196135 radeonsi/gfx9: query changes - EVENT_WRITE and SET_PREDICATION
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
ea9cf0a322 radeonsi/gfx9: EVENT_WRITE_EOP -> RELEASE_MEM
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
3e3d4f5e1d radeonsi/gfx9: INDIRECT_BUFFER change
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
9680a75489 radeonsi/gfx9: enable SDMA buffer copying & clearing
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
c9b004af58 radeonsi/gfx9: handle GFX9 in a few places
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
92112ec296 radeonsi/gfx9: don't read back non-existent SRBM registers
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
ef97cc0cae radeonsi/gfx9: add IB parser support
Both GFX6 and GFX9 fields are printed next to each other in parsed IBs.

The Python script parses both headers like one stream and tries to merge
all definitions.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
9338ab0afd radeonsi/gfx9: set the LLVM processor, require LLVM 5.0
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
68d6d097f1 radeonsi/gfx9: add GFX9 and VEGA10 enums
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
5691e14735 amd: GFX9 packet changes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
ecbdfbeb05 amd: define event types for GFX9
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
00e777b61c amd: add texture format definitions for GFX9
the DATA_FORMAT and NUM_FORMAT fields are the same, but some of the enums
differ, thus add GFX6 and GFX9 suffixes, so that the IB parser can show
enums for both.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
e6c520362d amd: resolve remaining definition conflicts with gfx9d.h
Add _GFX6 and _GFX9 suffixes to conflicting definitions.

sid.h and gfx9d.h can now be included in the same file.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
7e7043c31c amd: normalize register definition formatting
This resolves trivial conflicts with gfx9d.h caused by different formatting.
Some fields are also renamed.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
db04d4ccaa amd: import GFX9 register definitions
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
a3556c0f06 radeonsi: code shuffling in si_init_depth_surface
use fewer local variables, re-order the assignments, so that the GFX9 diff
is smaller here.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
8a74140a21 amd/addrlib: silence warnings 2017-03-30 14:44:33 +02:00
Nicolai Hähnle
7f160efcde amd/addrlib: import gfx9 support 2017-03-30 14:44:33 +02:00
Kevin Furrow
047d6daf10 amd/addrlib: Not all ETC2 formats are 128bpp... add new ETC2 formats to differentiate between 64 and 128bpp formats. 2017-03-30 14:44:33 +02:00
Kevin Furrow
1360018c1c amd/addrlib: Fix selection of swizzle modes for 3D compressed images. 2017-03-30 14:44:33 +02:00
Kevin Furrow
9705e3b72c amd/addrlib: Add support for ETC2 and ASTC formats. 2017-03-30 14:44:33 +02:00
Joe Ma
a489cdb20f amd/addrlib: Bump version to 6.02 2017-03-30 14:44:33 +02:00
Frans Gu
e736edf63d amd/addrlib: Adjust slie size after pitch and actual height adjustment 2017-03-30 14:44:33 +02:00
Frans Gu
588e5bbf3d amd/addrlib: Apply input pitch after internal pitch aligning 2017-03-30 14:44:33 +02:00
Nicolai Hähnle
11f1306207 amdgpu/addrlib: Bump version to 6.01
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Nicolai Hähnle
a136926eef amdgpu/addrlib: Seperate 2 dcc related workarounds by different flags
1) dccCompatible for padding MSAA surface to support fast clear
2) dccPipeWorkaround for padding surface to support dcc
2017-03-30 14:44:33 +02:00
Nicolai Hähnle
48bf5d0800 amdgpu/addrlib: Fix the issue that tcCompatible HTILE slice size is not calculated correctly 2017-03-30 14:44:33 +02:00
Nicolai Hähnle
33c25655c1 amdgpu/addrlib: Add a new output flag to notify client that the returned tile index is for PRT on SI
If this flag is set for mip0, client should set prt flag for sub mips,
so that address lib can select the correct tile index for sub mips.
2017-03-30 14:44:33 +02:00
Xavi Zhang
fa906a888b amdgpu/addrlib: add matchStencilTileCfg and tcCompatible fixes
The usage should be client first call AddrComputeSurfaceInfo() on
depth surface with flag "matchStencilTilecfg", AddrLib will use
2DThin1 tile index for depth as much as possible and do not down grade
unless alignment requirement cannot be met.

1. If there is a matched 2DThin1 tile index for stencil which make
sure they will share same tile config parameters, then return the
stencil 2DThin1 tile index as well.
2. If using 2DThin1 tile mode cannot make sure such thing happen, and
TcCompatible flag was set, then ignore this flag then try 2DThin1 tile
mode for depth and stencil again.
3. If 2DThin1 tile mode cannot make sure depth and stencil to have
same tile config parameters, then down grade depth surface tile mode
to 1DThin1.
4. If depth surface's tile mode was 1DThin1, then return 1DThin1 tile
index for stencil.
5. If depth surface's tile mode is PRT, then return invalid tile index
to stencil since their tile config parameters will never be met.

Client driver then check the returned tile index of stencil -- if it
is not invalid tile index, then call AddrComputeSurfaceInfo() on
stencil surface with the returned stencil tile index to get full
output information. Please note, client needs to set flag
"useTileIndex" when AddrLib get created.
2017-03-30 14:44:33 +02:00
Frans Gu
6764d96eaa amdgpu/addrlib: Adjust bank equation bit order based on macro tile aspect ratio settings
By this way, we can have valid equation for 2D_THIN1 tile mode.
Add flag "preferEquation" to return equation index without adjusting
input tile mode.
2017-03-30 14:44:33 +02:00
Frans Gu
ed1aca8e8f amdgpu/addrlib: do some tile mode conversions to display surface 2017-03-30 14:44:33 +02:00
Xavi Zhang
cb8844392c amdgpu/addrlib: Check prt flag for PRT_THIN1 extra padding for DCC. 2017-03-30 14:44:33 +02:00
Frans Gu
fe216415c6 amdgpu/addrlib: Add new flags minimizePadding and maxBaseAlign
1) minimizePadding - Use 1D tile mode if padded size of 2D is bigger
than 1D
2) maxBaseAlign - Force PRT tile mode if macro block size is bigger than
requested alignment.

Also, related changes to tile mode optimization for needEquation.
2017-03-30 14:44:33 +02:00
Xavi Zhang
4dd4700612 amdgpu/addrlib: Always returns pixelPitch in original pixels 2017-03-30 14:44:33 +02:00
Sabre Shao
eb3036ed46 amdgpu/addrlib: fix crash on allocation failure 2017-03-30 14:44:33 +02:00
Frans Gu
680f91e5d4 amdgpu/addrlib: Add flag to report if a surface can have dcc ram 2017-03-30 14:44:33 +02:00
Roy Zhan
ca88f83222 amdgpu/addrlib: support non-power2 height alignment (for linear surface) 2017-03-30 14:44:33 +02:00
Frans Gu
c867a2b222 amdgpu/addrlib: Fix family setting for VI and CZ ASICs 2017-03-30 14:44:33 +02:00
Nicolai Hähnle
b328e47d3d amdgpu/addrlib: style cleanup
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Nicolai Hähnle
fbc9ba7559 amdgpu/addrlib: Pad pitch to multiples of 256 for DCC surface on Fiji
The change also modifies function CiLib::HwlPadDimensions to report
adjusted pitch alignment.
2017-03-30 14:44:33 +02:00
Xavi Zhang
145750efba amdgpu/addrlib: Fix number of //
Find ^/{80,99}$  and replace them to 100 "/"

Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Nicolai Hähnle
4e2668ecd1 amdgpu/addrlib: Cleanup.
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Xavi Zhang
d1ecb70ba3 amdgpu/addrlib: Use namespaces
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Kevin Zhao
8912862a40 amdgpu/addrlib: Adjust 99 "*" to 100 "*" alignment
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Frans Gu
acaeae2861 amdgpu/addrlib: Add a new tile mode ADDR_TM_UNKNOWN
This can be used by address lib client to ask address lib to select
tile mode.
2017-03-30 14:44:33 +02:00
Xavi Zhang
90029b958e amdgpu/addrlib: Stylish cleanup.
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Roy Zhan
554c1b9f2d amdgpu/addrlib: Disable tcComaptible when depth surface is not macro tiled
Experiment show 1D tiling + TcCompatible cannot work together.
2017-03-30 14:44:33 +02:00
Xavi Zhang
120a5d0e42 amdgpu/addrlib: fix pixel index calculation of thick micro tiling 2017-03-30 14:44:33 +02:00
Xavi Zhang
199912a9bc amdgpu/addrlib: Add a flag to skip calculate indices
This is useful for debugging and special cases for stencil surfaces
do not require texture fetch compatible.
2017-03-30 14:44:33 +02:00
Nicolai Hähnle
10f7d1cb03 amdgpu/addrlib: add equation generation
1. Add new surface flags needEquation for client driver use to force
the surface tile setting equation compatible. Override 2D/3D macro
tile mode to PRT_* tile mode if this flag is TRUE and num slice > 1.
2. Add numEquations and pEquationTable in ADDR_CREATE_OUTPUT structure
to return number of equations and the equation table to client driver
3. Add equationIndex in ADDR_COMPUTE_SURFACE_INFO_OUTPUT structure to
return the equation index to client driver

Please note the use of address equation has following restrictions:
1) The surface can't be splitable
2) The surface can't have non zero tile swizzle value
3) Surface with > 1 slices must have PRT tile mode, which disable
slice rotation
2017-03-30 14:44:33 +02:00
Nicolai Hähnle
3e44337bd6 amdgpu/addrlib: rename ComputeSurfaceThickness to Thickness 2017-03-30 14:44:33 +02:00
Xavi Zhang
79dcda5116 amdgpu/addrlib: add define HAVE_TSERVER 2017-03-30 14:44:33 +02:00
Frans Gu
7293a020bd amdgpu/addrlib: Add new interface to support macro mode index query 2017-03-30 14:44:33 +02:00
Roy Zhan
c16e1e2041 amdgpu/addrlib: add explicit Log2NonPow2 function 2017-03-30 14:44:33 +02:00
Nicolai Hähnle
4a4b7da141 amdgpu/addrlib: Fix invalid access to m_tileTable
Sometimes client driver passes valid tile info into address library,
in this case, the tile index is computed in function
HwlPostCheckTileIndex instead of CiAddrLib::HwlSetupTileCfg.
We need to call HwlPostCheckTileIndex to calculate the correct tile
index to get tile split bytes for this case.
2017-03-30 14:44:33 +02:00
Nicolai Hähnle
9e40e09089 amdgpu/addrlib: add ADDR_ANALYSIS_ASSUME
It helps fix analysis warnings in MSC.
2017-03-30 14:44:33 +02:00
XiaoYuan Zheng
6164f23a91 amdgpu/addrlib: add tcCompatible htile addr from coordinate support. 2017-03-30 14:44:33 +02:00
Carlos Xiong
3bd1380ab2 amdgpu/addrlib: force all zero tile info for linear general. 2017-03-30 14:44:33 +02:00
Nicolai Hähnle
8b110f0319 amdgpu/addrlib: Add a member "bpp" for input of method AddrConvertTileIndex and AddrConvertTileInfoToHW
When clients queries tile Info from tile index and expects accurate
tileSplit info,  bits per pixel info is required to be provided since
this is necessary for computing tileSplitBytes; otherwise Addrlib will
return value of "tileBytes" instead if bpp is 0 - which is also
current logic. If clients don't need tileSplit info, it's OK to pass
bpp with value 0.
2017-03-30 14:44:33 +02:00
Frans Gu
ca6a38fd6a amdgpu/addrlib: Refine the PRT tile mode selection
Switch the tile index based on logic instead of hardcoded threshold
for different ASIC.
2017-03-30 14:44:33 +02:00
Xavi Zhang
2bf243f7c6 amdgpu/addrlib: add dccRamSizeAligned output flag
This flag indicates to the client if this level's DCC memory is aligned
or not. No aligned means there are padding to the end.
2017-03-30 14:44:33 +02:00
Nicolai Hähnle
e443b48966 amdgpu/addrlib: Change comment alignment
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Nicolai Hähnle
e06aeaf19f amdgpu/addrlib: style changes and minor cleanups
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Nicolai Hähnle
cb5d22a3f3 amdgpu/addrlib: AddrLib inheritance refactor
Add one more abstraction layer into inheritance system.

Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Nicolai Hähnle
52a1288a15 amdgpu/addrlib: rearrange code in preparation of refactoring
No code changes.

Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Xavi Zhang
f12d430c59 amdgpu/addrlib: add disableLinearOpt flag 2017-03-30 14:44:33 +02:00
Xavi Zhang
b5d8120a07 amdgpu/addrlib: Add GetMaxAlignments 2017-03-30 14:44:33 +02:00
Xavi Zhang
3c3d620cf3 amdgpu/addrlib: Let Kaveri go general stereo right eye offset padding path
Kaveri (2-pipe) macro tiling mode table was initially set to all
4-aspect-ratio so the swizzling path did not work for it and then we
chose to pad the offset. We now discover the root cause is that if
ratio > 2, the swizzling path does not work. So we can safely use the
same path for Kaveri.
2017-03-30 14:44:33 +02:00
Xavi Zhang
3614999878 amdgpu/addrlib: Rewrite tile mode optmization code
Note: remove reference to degrade4Space and use opt4Space instead.
2017-03-30 14:44:33 +02:00
Carlos Xiong
c12e35065a amdgpu/addrlib: Add a flag "tcCompatible" to surface info output structure.
Even if surface info input flag "tcComaptible" is enabled, tc
compatible may be not supported if tile split happens for depth
surfaces. Add a new flag in output structure to notify client to
disable tc compatible in this case.
2017-03-30 14:44:33 +02:00
Xavi Zhang
2ffb30c2af amdgpu/addrlib: Make comments shorter
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
XiaoYuan Zheng
3c7bd4e013 amdgpu/addrlib: add new flag nonSplit
Flag tcCompatible has different usage in CI and VI. Add a new flag
"nonSplit" for CI.
2017-03-30 14:44:33 +02:00
Xiao-Tao Zai
47de94a794 amdgpu/addrlib: allow tileSplitBytes greater than row size
Carrizo row size is 1K, while tileSplitBytes is 2K for a 4xAA 32bpp
depth surface. Remove the sanity check that tileSplitBytes must be
greater than row size. There could be performance loss but may be
covered by non-split depth which enables tc-compatible read.
2017-03-30 14:44:33 +02:00
Carlos Xiong
d52e0bbfe6 amdgpu/addrlib: Change to compute TC compatible stencil info
Change the logic to compute tc compatible stencil info via depth's
tileIndex instead of using depth's tileInfo. So the clients can get
the stencil's tileInfo computed from macroModeTable. If the stencil
tileInfo is same as depth tileInfo, then stencil is tc compatible;
otherwise, stencil is not tc compatible. The current suggestion is to
create another stencil buffer with the tc compatible tileInfo, use
depth-to-color copy to decompress and tile convert the rendered
stencil to tc compoatible stencil (And use the new stencil buffer to
program TC).
2017-03-30 14:44:33 +02:00
Nicolai Hähnle
6c65f256e2 amdgpu/addrlib: rename SiAddrLib/CiAddrLib to match internal spelling
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:44:33 +02:00
Marek Olšák
6e44087e77 configure.ac: require libdrm_amdgpu 2.4.76 for Vega
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 14:42:06 +02:00
Samuel Pitoiset
e7850bb7f0 st/glsl_to_tgsi: use glsl_type::sampler_index()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 13:15:34 +02:00
Samuel Pitoiset
784d3a7066 glsl: allow glsl_type::sampler_index() with images
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 13:15:16 +02:00
Nicolai Hähnle
257ee3f7ef st/mesa: improve error messages and fix security warning
Debian, Ubuntu set default build flag: -Werror=format-security

  CC       state_tracker/st_cb_texturebarrier.lo
state_tracker/st_cb_eglimage.c: In function ‘st_egl_image_get_surface’:
state_tracker/st_cb_eglimage.c:64:7: error: format not a string literal and no format arguments [-Werror=format-security]
       _mesa_error(ctx, GL_INVALID_VALUE, error);
       ^~~~~~~~~~~
state_tracker/st_cb_eglimage.c:71:7: error: format not a string literal and no format arguments [-Werror=format-security]
       _mesa_error(ctx, GL_INVALID_OPERATION, error);
       ^~~~~~~~~~~

Reported-by: Krzysztof Kolasa <kkolasa@winsoft.pl>
Fixes: 83e9de25f3 ("st/mesa: EGLImageTarget* error handling")
2017-03-30 11:24:36 +02:00
Kenneth Graunke
e4dc005bce i965: Combine intel_batchbuffer_reloc and intel_batchbuffer_reloc64
These two functions do the exact same thing.  One returns a uint64_t,
and the other takes the same uint64_t and truncates it to a uint32_t.

We only need the uint64_t variant - the caller can truncate if it wants.
This patch gives us one function, intel_batchbuffer_reloc, that does
the 64-bit thing.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-30 00:15:28 -07:00
Kenneth Graunke
5177231670 i965: Use WARN_ONCE instead of open coding it.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-30 00:15:09 -07:00
Harish Krupo
36cb2003f1 android: pass sse4.1 flag as appropriate
We have functions which depend on sse4.1 support but we didnt pass
the right compile flag for it. This patch fixes it.

Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Signed-off-by: Harish Krupo <harish.krupo.kps@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2017-03-30 08:02:49 +03:00
Dave Airlie
a930c2c612 radv: fix mask attribs properly.
some days it just doesn't pay to get out of bed.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-30 13:09:30 +10:00
Dave Airlie
aa27a9f687 radv: fix regression with mask attrib setting code.
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-30 12:07:32 +10:00
Dave Airlie
2b35b60df1 radv: move to using nir clip/cull merge pass.
Doing this before tessellation makes doing some bits of
tessellation a bit cleaner. It also cleans up a bit of the
llvm generator code.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-30 11:04:56 +10:00
George Kyriazis
5079c277b5 swr: [scons] Fix windows build
Fix codegen build break that was introduced earlier

v2: update rules for gen_knobs.cpp and gen_knobs.h

v3: Introduce bldroot and revert generator file changes, making patch simpler.

Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2017-03-29 18:52:07 -05:00
Craig Stout
1da7a11de8 anv/cmd_buffer: fix host memory leak
push_constants must be free'd.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100452
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
2017-03-29 14:32:32 -07:00
Timothy Arceri
16debc652a mesa/glthread: fallback to sync if count validation fails
The old code would sync and then throw a cryptic error message.
There is no need for a custom error, we can just fallback to
the real function and have it do proper validation.

Fixes piglit test:
glsl-uniform-out-of-bounds

Which was returning the wrong error code.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 08:23:00 +11:00
Timothy Arceri
18f4c93b02 mesa/glthread: add async support to glProgramUniform*() functions
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 08:22:51 +11:00
Timothy Arceri
1ea73b9c61 mesa/glthread: print out syncs when MARSHAL_MAX_CMD_SIZE is exceeded
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-30 08:19:07 +11:00
Jason Ekstrand
9aba81b160 anv/batch_chain: Handle another OOM in cmd_buffer_execbuf
Found by inspection while rebasing other patches.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-29 09:39:49 -07:00
Philipp Zabel
83e9de25f3 st/mesa: EGLImageTarget* error handling
Stop trying to specify texture or renderbuffer objects for unsupported
EGL images. Generate the error codes specified in the OES_EGL_image
extension.

EGLImageTargetTexture2D and EGLImageTargetRenderbuffer would call
the pipe driver's create_surface callback without ever checking that
the given EGL image is actually compatible with the chosen target
texture or renderbuffer. This patch adds a call to the pipe driver's
is_format_supported callback and generates an INVALID_OPERATION error
for unsupported EGL images. If the EGL image handle does not describe
a valid EGL image, an INVALID_VALUE error is generated.

v2: fixed get_surface to actually use the usage and error parameters

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-29 18:04:42 +02:00
Philipp Zabel
d10172d527 st/mesa: move st_manager_get_egl_image_surface into st_cb_eglimage.c
The only callers are here, and we will add generation of GL errors in
the following patch.  Rename the function to st_egl_image_get_surface,
pass the gl_context instead of st_context, and move the cast from
GLeglImageOES to void* into st_egl_image_get_surface.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-29 18:04:12 +02:00
Alejandro Piñeiro
2f8d6bd578 i965: expose BRW_OPCODE_[F32TO16/F16TO32] name on gen8+
Technically those hw operations are only available on gen7, as gen8+
support the conversion on the MOV. But, when using the builder to
implement nir operations (example: nir_op_fquantize2f16), it is not
needed to do the gen check. This check is done later, on the final
emission at brw_F32TO16 (brw_eu_emit), choosing between the MOV or the
specific operation accordingly.

So in the middle, during optimization phases those hw operations can
be around for gen8+ too.

Without this patch, several (at least 95) vulkan-cts quantize tests
crashes when using INTEL_DEBUG=optimizer. For example:
dEQP-VK.spirv_assembly.instruction.graphics.opquantize.too_small_vert

v2: simplify the code using GEN_GE (Ilia Mirkin)
v3: tweak brw_instruction_name instead of changing opcode_descs
    table, that is used for validation (Matt Turner)

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-03-29 17:34:15 +02:00
Marek Olšák
a2db9f9ff4 mesa: remove dd_function_table::BindProgram
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-29 15:44:00 +02:00
Marek Olšák
e81ee82119 r200: remove BindProgram
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-29 15:44:00 +02:00
Marek Olšák
bbb5561007 i915: remove BindProgram
The same thing is done in i915_update_program called by i915InvalidateState.
Why do it twice.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-29 15:44:00 +02:00
Marek Olšák
96a1c2406d mesa: don't use _NEW_TEXTURE mainly in mesa/main
v2: add missing %s

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-29 15:44:00 +02:00
Marek Olšák
d68150f15d mesa: split _NEW_TEXTURE into _NEW_TEXTURE_OBJECT & _NEW_TEXTURE_STATE
No performance testing has been done, because it makes sense to make this
change regardless of that. Also, _NEW_TEXTURE is still used in many places,
but the obvious occurences are replaced here.

It's now possible to split _NEW_TEXTURE_OBJECT further.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-29 15:44:00 +02:00
Marek Olšák
226ff6aa30 mesa: inline _mesa_update_texture
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-29 15:44:00 +02:00
Jose Fonseca
bb9faba172 appveyor: Update dependencies.
- Use explicit versions everywhere.
- Avoid deprecate `--egg` pip option.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2017-03-29 11:53:03 +01:00
Jose Fonseca
ecfafdcbf5 c11/threads: Include thr/xtimec.h for xtime definition when building with MSVC.
MSVC has been including a xtime definition in thr/xtimec.h ever since
MSVC 2013 (which is the minimum we require for building Mesa), and
including it prevents duplicate definitions when it gets included by
LLVM.

In fact, it looks that MSVC has been including a partial C11 threads
implementation too for some time, which we should consider migrating to
once we eliminate the use of _MTX_INITIALIZER_NP in our tree.

Thanks to the anonymous helper from
https://bugs.freedesktop.org/show_bug.cgi?id=100201#c4 for spotting
this.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100201
CC: "17.0" <mesa-stable@lists.freedesktop.org>
2017-03-29 11:53:03 +01:00
Timothy Arceri
e44cba540e mesa: update lower_jumps tests after bug fix
This change updates the tests to reflect the IR after
the following bug fix.

Fixes: c1096b7f1d ("glsl: fix lower jumps for returns when loop is
                      inside an if")

Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Bugzilla: https://bugs.freedesktop.org/100441
2017-03-29 20:53:06 +11:00
Thomas Hellstrom
ba8df2286a gbm/dri: Flush after unmap
Drivers may queue dma operations on the context at unmap time so we need
to flush to make sure the data gets to the bo. Ideally the application
would take care of this, but since there appears to be no exported gbm
flush functionality we need to explicitly flush at unmap time.

This fixes a problem where kmscube on vmwgfx in rgba textured mode would
render using an uninitialized texture rather than the intended
rgba pattern.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-03-29 09:17:21 +02:00
Bas Nieuwenhuizen
3df410069a radv: Enable sparseBinding feature.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-29 08:50:55 +02:00
Bas Nieuwenhuizen
b20af5c8d7 radv/amdgpu: Use reference counting for bos.
Per the Vulkan spec, memory objects may be deleted before the buffers
and images using them are deleted, although those resources then
cannot be used except for deletion themselves.

For the virtual buffers, we need to access them on resource destruction
to unmap the regions, so this results in a use-after-free. Implement
reference counting to avoid this.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-29 08:50:48 +02:00
Bas Nieuwenhuizen
e527e62e75 radv: Implement sparse memory binding.
v2: Only submit when semaphores are specified.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-29 08:50:41 +02:00
Bas Nieuwenhuizen
6154efc193 radv: Implement sparse image creation.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-29 08:50:37 +02:00
Bas Nieuwenhuizen
ef0e505d02 radv: Implement sparse buffer creation.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-29 08:50:33 +02:00
Bas Nieuwenhuizen
715df30a4e radv/amdgpu: Add winsys implementation of virtual buffers.
v2: - Added comments.
    - Fixed a double unmap bug.
    - Actually unmap the non-edge old ranges.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-29 08:50:17 +02:00
Bas Nieuwenhuizen
78ee8b3f84 radv: Assert when setting 0 registers in a sequence.
To catch more of those hangs early.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2017-03-29 01:58:16 +02:00
Jason Ekstrand
f3673db3d6 anv/cmd_buffer: Refactor flush_pipeline_select_*
While having the _3d and _gpgpu versions is nice, there's no reason why
we need to have duplicated logic for tracking the current pipeline.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-03-28 14:57:09 -07:00
Jason Ekstrand
6baae9625d anv: Flush caches prior to PIPELINE_SELECT on all gens
The programming note that says we need to do this still exists in the
SkyLake PRM and, from looking at the bspec, seems like it may apply to
all hardware generations SNB+.  Unfortunately, this isn't particularly
clear cut since there is also language in the bspec that says you can
skip the flushing and stall to get better throughput.  Experimentation
with the "Car Chase" benchmark in GL seems to indicate that some form of
flushing is still needed.  This commit makes us do the full set of
flushes regardless of hardware generation.  We can always reduce the
flushing later.

Reported-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
2017-03-28 14:57:08 -07:00
Jason Ekstrand
0fe3dcce4c anv/cmd_buffer: Fix bad indentation
A bunch of code was indented in such a way that it looked like it went
with the if statement above but it definitely didn't.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
2017-03-28 14:57:06 -07:00
Jason Ekstrand
01a65dc43b anv/cmd_buffer: Apply flush operations prior to executing secondaries
This fixes rendering issues in the Vulkan port of skia on some hardware.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-03-28 14:56:55 -07:00
Jason Ekstrand
9319ef96fd anv/blorp: Use anv_get_layerCount everywhere
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-03-28 14:41:48 -07:00
Jason Ekstrand
1b8fa8dd79 anv: Make anv_get_layerCount a macro
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-03-28 14:41:47 -07:00
Dave Airlie
93d61e4945 radv: only emit ps_input_cntl is we have any to output
Otherwise we get GPU hangs.

Reported-by: Alex Smith <asmith@feralinteractive.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-28 20:12:10 +01:00
Adam Jackson
f208bdc0d2 glx: Remove #include <GL/glxint.h>
We're not using anything in it, and we don't want to inherit struct
definitions from some other package anyway.

Signed-off-by: Adam Jackson <ajax@redhat.com>
2017-03-28 14:48:12 -04:00
Julien Isorce
7ee91af300 r600g: check NULL return from r600_aligned_buffer_create
Signed-off-by: Julien Isorce <jisorce@oblong.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-28 18:27:55 +01:00
Julien Isorce
699cce3493 st_cb_bitmap: check NULL return from u_upload_alloc
Signed-off-by: Julien Isorce <jisorce@oblong.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-28 18:27:55 +01:00
Julien Isorce
4a5e779b5f si_compute: check NULL return from u_upload_alloc
Signed-off-by: Julien Isorce <jisorce@oblong.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-28 18:27:53 +01:00
Julien Isorce
c5fe99eec2 r600g: check NULL return from u_upload_alloc
Like done in si_state_draw.c::si_draw_vbo

u_upload_alloc can fail, i.e. set output param *ptr to NULL, for 2 reasons:
alloc fails or map fails. For both there is already a fprintf/stderr in
radeon_create_bo and radeon_bo_do_map.

In src/gallium/drivers/ it is a common usage to just avoid to crash by doing
a silent check. But defer fprintf where the error comes from, libdrm calls.

Signed-off-by: Julien Isorce <jisorce@oblong.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-28 17:54:15 +01:00
Tim Rowley
749cf3be6e swr: fix llvm-5.0.0 build bustage
Handle rename of llvm AttributeSet to AttributeList in the same
fashion as ac_llvm_helper.cpp.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-28 11:46:58 -05:00
Tim Rowley
79d92a72d5 swr: [rasterizer jitter] fix llvm-5.0.0 build bustage
Add CreateAlignmentAssumptionHelper to gen_llvm_ir_macros.py ignore list.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-28 11:46:58 -05:00
Chad Versace
d1032a047b isl: Drop unused isl_surf_init_info::min_pitch
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-28 09:44:44 -07:00
Chad Versace
6cbc13d94c intel: Fix requests for exact surface row pitch (v2)
All callers of isl_surf_init() that set 'min_row_pitch' wanted to
request an *exact* row pitch, as evidenced by nearby asserts, but isl
lacked API for doing so. Now that isl has an API for that, update the
code to use it.

v2: Assert that isl_surf_init() succeeds because the callers assume
    it.  [for jekstrand]

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> (v1)
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
2017-03-28 09:44:44 -07:00
Chad Versace
e9017d58dc isl: Let isl_surf_init's caller set the exact row pitch (v2)
The caller does so by setting the new field
isl_surf_init_info::row_pitch.

v2: Validate the requested row_pitch.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
2017-03-28 09:44:44 -07:00
Chad Versace
23802dafc2 isl: Validate the calculated row pitch (v45)
Validate that isl_surf::row_pitch fits in the below bitfields,
if applicable based on isl_surf::usage.

    RENDER_SURFACE_STATE::SurfacePitch
    RENDER_SURFACE_STATE::AuxiliarySurfacePitch
    3DSTATE_DEPTH_BUFFER::SurfacePitch
    3DSTATE_HIER_DEPTH_BUFFER::SurfacePitch

v2:
  -Add a Makefile dependency on generated header genX_bits.h.
v3:
  - Test ISL_SURF_USAGE_STORAGE_BIT too. [for jekstrand]
  - Drop explicity dependency on generated header. [for emil]
v4:
  - Rebase for new gen_bits_header.py script.
  - Replace gen_10x with gen_device_info*.
v5:
  - Drop FINISHME for validation of GEN9 1D row pitch. [for jekstrand]
  - Reformat bit tests. [for jekstrand]

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v4)
2017-03-28 09:44:44 -07:00
Chad Versace
f0eaf38db2 genxml: New generated header genX_bits.h (v6)
genX_bits.h contains the sizes of bitfields in genxml instructions,
structures, and registers. It also defines some functions to query those
sizes.

isl_surf_init() will use the new header to validate that requested
pitches fit in their destination bitfields.

What's currently in genX_bits.h:

  - Each CONTAINER::Field from gen*.xml that has a bitsize has a macro
    in genX_bits.h:

        #define GEN{N}_CONTAINER_Field_bits {bitsize}

  - For each set of macros whose name, after stripping the GEN prefix,
    is the same, genX_bits.h contains a query function:

      static inline uint32_t __attribute__((pure))
      CONTAINER_Field_bits(const struct gen_device_info *devinfo);

v2 (Chad Versace):
  - Parse the XML instead of scraping the generated gen*_pack.h headers.

v3 (Dylan Baker):
  - Port to Mako.

v4 (Jason Ekstrand):
  - Make the _bits functions take a gen_device_info.

v5 (Chad Versace):
  - Fix autotools out-of-tree build.
  - Fix Android build. Tested with git://github.com/android-ia/manifest.
  - Fix macro names. They were all missing the "_bits" suffix.
  - Fix macros names more. Remove all double-underscores.
  - Unindent all generated code. (It was floating in a sea of whitespace).
  - Reformat header to appear human-written not machine-generated.
  - Sort gens from high to low. Newest gens should come first because,
    when we read code, we likely want to read the gen8/9 code and ignore
    the gen4 code. So put the gen4 code at the bottom.
  - Replace 'const' attributes with 'pure', because the functions now
    have a pointer parameter.
  - Add --cpp-guard flag. Used by Android.
  - Kill class FieldCollection. After Jason's rewrite, it was just
    a dict.

v6 (Chad Versace):
  - Replace `key not in d.keys()` with `key not in d`. [for dylan]

Co-authored-by: Dylan Baker <dylan@pnwbakers.com>
Co-authored-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v5)
Reviewed-by: Dylan Baker <dylan@pnwbakers.com> (v6)
2017-03-28 09:44:44 -07:00
Tim Rowley
3974cfea25 swr: [rasterizer core] Disable inline function expansion
Disable expansion in windows Debug builds.

Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
2017-03-28 11:24:44 -05:00
Tim Rowley
1c7224c85f swr: [rasterizer common] Use C++ thread_local keyword
Allows use of thread_local objects with constructors.

Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
2017-03-28 11:24:39 -05:00
Tim Rowley
aee5276375 swr: [rasterizer core] SIMD16 Frontend WIP
Implement widened clipper and binner interfaces for SIMD16.

Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
2017-03-28 11:24:33 -05:00
Tim Rowley
aea737e12e swr: [rasterizer core] Don't bind single-threaded contexts
Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
2017-03-28 11:24:27 -05:00
Tim Rowley
4cd0b1bb2c swr: [rasterizer core] Enable SIMD16
Make the AVX512 insert/extract intrinsics KNL-compatible

Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
2017-03-28 11:24:21 -05:00
Tim Rowley
ec51e8ecfe swr: [rasterizer jitter] Clean up EngineBuilder construction
Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
2017-03-28 11:24:14 -05:00
Tim Rowley
89b83f4b1e swr: [rasterizer codegen] add cmdline to archrast gen files
Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
2017-03-28 11:24:09 -05:00
Tim Rowley
549b9d2e9f swr: [rasterizer core] SIMD16 Frontend WIP
Fix GS and streamout.

Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
2017-03-28 11:23:45 -05:00
Tim Rowley
fee3fc018b swr: [rasterizer codegen] Refactor codegen
Move common codegen functions into gen_common.py.

v2: change gen_knobs.py to find the template file internally, like
the rest of the gen scripts.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-28 11:23:04 -05:00
Juan A. Suarez Romero
caa616ccc4 tests/cache_test: allow crossing mount points
When using an overlayfs system (like a Docker container), rmrf_local()
fails because part of the files to be removed are in different mount
points (layouts). And thus cache-test fails.

Letting crossing mount points is not a big problem, specially because
this is just for a test, not to be used in real code.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-28 18:00:39 +02:00
Emil Velikov
0f9a0cb5f5 glcpp/tests/glcpp-test-cr-lf: error out if we cannot find any tests
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-28 15:31:24 +01:00
Emil Velikov
d8096b75aa glcpp/tests/glcpp-test-cr-lf: correctly set/use srcdir/abs_builddir
Otherwise manual invokation of the script from elsewhere than
`dirname $0` will fail.

With these all the artefacts should be created in the correct location,
and thus we can remove the old (and slighly strange) clean-local line.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-28 15:31:24 +01:00
Emil Velikov
cf77cdce83 glcpp/tests: update testname in help string
Rather than hardcoding glcpp/other use `basename "$0"` which expands
appropriatelly.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-28 15:31:24 +01:00
Emil Velikov
4ea4fbf93a glcpp/tests/glcpp-test: error out if we cannot find any tests
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-28 15:31:24 +01:00
Emil Velikov
182d48ceb9 glcpp/tests/glcpp-test: print only the test basename
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-28 15:31:24 +01:00
Emil Velikov
addf62946d glcpp/tests/glcpp-test: set srcdir/abs_builddir variables
Current definitions work fine for the manual invokation of the script,
although the whole script does not consider that one can run it OOT.

The latter will be handled with latter patches, although it will be
extensively using the two variables.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-28 15:31:24 +01:00
Emil Velikov
ee8aea3572 glsl/tests/optimization-test: 'echo' only folders which has generators
The current "let's print any folder which exists" is simply confusing.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-28 15:31:24 +01:00
Emil Velikov
79a95f19e6 glsl/tests/optimization-test: print only the test basedir/name
The relative/absolute path brings little to no benefit in being printed
as testname. Trim it out.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-28 15:31:24 +01:00
Emil Velikov
33cd136fa2 glsl/tests/optimization-test: error if zero tests were executed
We don't want to lie ourselves that 'everything is fine' when no tests
were found/ran.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-28 15:31:23 +01:00
Emil Velikov
421115a729 glsl/tests/optimization-test: pass glsl_test as argument
Rather than hardcoding the binary location (which ends up wrong in a
number of occasions) in the python script, pass it as argument.

This allows us to remove a couple of dirname/basename workarounds that
aimed to keep this working, and succeeded in the odd occasion.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-28 15:31:23 +01:00
Emil Velikov
7d2a1394bb glsl/tests/optimization-test: error out if we fail to generate any tests
v2: use -eq over a string comparison (Eric)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-28 15:31:23 +01:00
Emil Velikov
86a937d264 glsl/tests/optimization-test: correctly manage srcdir/builddir
At the moment we look for generator script(s) in builddir while they
are in srcdir, and we proceed to generate the tests and expected output
in srcdir, which is not allowed.

To untangle:
 - look for the generator script in the correct place
 - generate the files in builddir, by extending create_test_cases.py to
use --outdir

With this in place the test passes `make check' for OOT builds - would
that be as standalone or part of `make distcheck'

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-28 15:31:23 +01:00
Emil Velikov
a7d9f0a361 glsl/tests/optimisation-test: ensure that compare_ir is available
Bail out early if the script is not where we expect it to be.

v2: use -f instead of -e. latter returns true on folder(s)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-28 15:31:23 +01:00
Emil Velikov
9083c625f5 glsl/tests/optimization-test: correctly set compare_ir
Now that we have srcdir we can use it to correctly manage/point to the
script. Effectively fixing OOT invokation of `make check'.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-28 15:31:23 +01:00
Emil Velikov
44b6422258 glsl/tests/optimization-test: add fallback srcdir/abs_builddir defines
There is no robust way to detect either one, so simply hope for the best
and warn just in case.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-28 15:31:23 +01:00
Emil Velikov
05bc5b35a7 glsl/tests/optimisation-test: make sure that $PYTHON2 is set/available
Otherwise we'll fail when invoking the script outside of "make check"

v2: use -ne over a string comparison (Eric)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-28 15:31:23 +01:00
Emil Velikov
bd4be79fc5 glsl/tests/warnings-test: print only the test basename
Spamming the log with the (in some cases extremely long) test location
is of limited use.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-28 15:31:23 +01:00
Emil Velikov
1c58d08bd9 glsl/tests/warnings-test: error if zero tests were executed
We don't want to lie ourselves that 'everything is fine' when no tests
were found/ran.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-28 15:31:22 +01:00
Emil Velikov
493fa69e37 glsl/tests/warnings-test: correctly manage srcdir/builddir
Before this commit, we would effectively fail to run any of the test in
a OOT builds.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-28 15:31:22 +01:00
Emil Velikov
81ccc7a484 glsl/tests/warnings-test: add fallback srcdir/abs_builddir defines
There is no robust way to detect either one, so simply hope for the best
and warn just in case.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-28 15:31:22 +01:00
Emil Velikov
4b366b171d glsl/tests/warnings-test: error out if glsl_compiler is missing
... or non-executable, in particular.

v2: use test -x (Eric)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-28 15:31:22 +01:00
Emil Velikov
1d93fa7be4 glsl: automake: export abs_builddir for the tests
We're going to use them with the next commits to determine where to put
the generated tests and/or built binaries.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-28 15:31:22 +01:00
Emil Velikov
841f0d2c58 glsl/tests: automake: cleanup all artefacts during clean-local
With later commits we'll fix the generators to produce the files in the
correct location. That in itself will cause an issue since the files
will be left dangling and make distcheck will fail.

v2: Use -r only as needed (Eric)

Cc: Matt Turner <mattst88@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-28 15:31:22 +01:00
Nayan Deshmukh
3472be2bfd st/va: remove assert for single slice
we anyway allow for multiple slices

v2: do not remove assert to check for buf->size

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-03-28 12:08:54 +02:00
Nicolai Hähnle
21ba6543be radeonsi: use DMA for clears with unaligned size
Only a small tail needs to be uploaded manually.

This is only partly a performance measure (apps are expected to use
aligned access). Mostly it is preparation for sparse buffers, which the
old code would incorrectly have attempted to map directly.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-28 10:22:14 +02:00
Nicolai Hähnle
f0d9af772e radeonsi: CP DMA clear supports unaligned destination addresses
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-28 10:22:12 +02:00
Nicolai Hähnle
d9014952f5 radeonsi: remove the early-out for SDMA in si_clear_buffer
This allows the next patches to be simple while still being able
to make use of SDMA even in some unusual cases.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-28 10:22:01 +02:00
Dave Airlie
239a9224a3 radv: move shader stages calculation to pipeline.
With tess this becomes a bit more complex. so move to pipeline
for now.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-28 17:40:33 +10:00
Dave Airlie
0232ea8025 radv: move pa_cl_vs_out_cntl calculation to pipeline
This also takes the side band setting code from radeonsi.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-28 17:40:29 +10:00
Dave Airlie
92e9c14a6a radv: move calculating fragment shader i/os to pipeline.
There is no need to calculate this on each command submit.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-28 17:40:20 +10:00
Dave Airlie
4b467c759e radv: move shader_z_format calculation to pipeline.
No need to recalculate this every time.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-28 17:40:17 +10:00
Dave Airlie
8996fdbf61 radv: move db_shader_control calculation to pipeline.
There is no need to recalculate this every time.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-28 17:40:14 +10:00
Dave Airlie
cd33a5c1cb radv: move vgt_gs_mode value to pipeline.
No need to recalculate this everytime.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-28 17:40:08 +10:00
Dave Airlie
d43691ce77 radv: add parameter to emit_waitcnt.
This is just a precursor for tess support, which needs to
pass different values here.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-28 17:40:03 +10:00
Dave Airlie
931a8d0c9a radv: rework vertex/export shader output handling
In order to faciliate adding tess support, split the vs/es
output info into a separate block, so we make it easier to
have the tess shaders export the same info.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-28 17:39:59 +10:00
Dave Airlie
ae0551b4b3 radv: fix ia_multi_vgt_param for instanced vs indirect draw.
The logic was different than radeonsi, fix it up before adding
tess support.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-28 17:39:55 +10:00
Dave Airlie
a8b8e542c2 radv: handle NULL multisample state.
If rasterization is disabled, we can get a NULL multisample
state.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-28 17:39:38 +10:00
Bas Nieuwenhuizen
a8c51b1cd9 radv: flush DB cache before and after HTILE decompress.
It reads @ writes the DB cache, and we haven't flushed dst caches yet,
so DB cache may be stale. Also the user might be shader read (and probably is),
so also flush after.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
CC: <mesa-stable@lists.freedesktop.org>
Fixes: f4e499ec79 ("radv: add initial non-conformant radv vulkan driver")
2017-03-28 02:51:40 +02:00
Anuj Phogat
f5c32b0762 i965: Delete tile resource mode code
Yf/Ys tiling never got used in i965 due to not delivering
the expected performance benefits. So, this patch is deleting
this dead code in favor of adding it later in ISL when we
actually find it useful. ISL can then share this code between
vulkan and GL.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-27 16:17:18 -07:00
Anuj Phogat
bcee124ef7 i965: Delete fast copy blit code
Fast copy blit was primarily added to support Yf/Ys detiling.
But, Yf/Ys tiling never got used in i965 due to not delivering
the expected performance benefits. Also, replacing legacy blits
with fast copy blit didn't help the benchmarking numbers. This
is probably due to a h/w restriction that says "start pixel for
Fast Copy blit should be on an OWord boundary". This restriction
causes many blit operations to skip fast copy blit and use legacy
blits. So, this patch is deleting this dead code in favor of
adding it later when we actually find it useful.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-27 16:17:18 -07:00
Kenneth Graunke
088449487e i965: Require Kernel 3.6 for Gen4-5 platforms.
We've already required Kernel 3.6 on Gen6+ since Mesa 9.2 (May 2013,
commit 92d2f5acfa).  It seems reasonable
to require it for Gen4-5 as well, bumping the requirement from 2.6.39.

This is necessary for glClientWaitSync with a timeout to work, which
is a feature we expose on Gen4-5.  Without it, we would fall back to an
infinite wait, which is pretty bad.

See kernel commit 172cf15d18889313bf2c3bfb81fcea08369274ef in 3.6+.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-03-27 15:57:50 -07:00
Timothy Arceri
99dd3d1c3b glsl: fix spelling of embedded in comment 2017-03-28 09:56:27 +11:00
Timothy Arceri
c1096b7f1d glsl: fix lower jumps for returns when loop is inside an if
Previously we would just escape the loop and move everything
following the loop inside the if to the else branch of a new if
with a return flag conditional. However everything outside the
if the loop was nested in would still get executed.

Adding a new return to the then branch of the new if fixes this
and we just let a follow pass clean it up if needed.

Fixes:
tests/spec/glsl-1.10/execution/vs-nested-return-sibling-loop.shader_test
tests/spec/glsl-1.10/execution/vs-nested-return-sibling-loop2.shader_test

Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2017-03-28 09:54:31 +11:00
Dave Airlie
b640dfcd05 radv: don't emit no color formats. (v3)
If we had no rasterization, we'd emit SPI color
format as all 0's the hw dislikes this, add the workaround
from radeonsi.

Found while debugging tessellation

v2: handle at pipeline stage, we have to handle
it after we process the fragment shader. (Bas)
v3: simplify even further, remove old fallback.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-28 08:39:14 +10:00
Vinson Lee
f1f1cb41d0 mesa/tests: Link main-test with CLOCK_LIB.
Fix 'make check' linking error with glibc < 2.17.

  CXXLD  main-test
../../../../src/mesa/.libs/libmesa.a(libmesautil_la-u_queue.o): In function `u_thread_get_time_nano':
src/util/../../src/util/u_thread.h:84: undefined reference to `clock_gettime'

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
2017-03-27 14:36:34 -07:00
Matt Turner
7dccd38b40 i965/fs: Don't emit SEL instructions for type-converting MOVs.
SEL can only convert between a few integer types, which we basically
never do.

Fixes fs/vs-double-uniform-array-direct-indirect-non-uniform-control-flow
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
2017-03-27 10:59:42 -07:00
Xu Randy
004468de14 anv/blorp: Fix a crash in CmdClearColorImage
We should use anv_get_layerCount() to access layerCount of VkImageSub-
resourceRange in anv_CmdClearColorImage and anv_CmdClearDepthStencil-
Image, which handles the VK_REMAINING_ARRAY_LAYERS (~0) case.

Test: Sample multithreadcmdbuf from LunarG can run without crash

Signed-off-by: Xu Randy <randy.xu@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-03-27 07:43:17 -07:00
Brian Paul
804676f384 mesa: simplify code around 'variable_data' in marshal.c
Remove needless pointer increments, unneeded vars, etc.  Untested.
Plus, fix a couple comments.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-27 08:30:43 -06:00
Brian Paul
b71ef173a5 st/mesa: move duplicated st_ws_framebuffer() function into header file
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-27 08:30:43 -06:00
Andres Gomez
6255cc654d glsl: Interface Block instances don't need linking validation
From page 45 (page 52 of the PDF) of the GLSL ES 3.00 v.6 spec:

  " When instance names are present on matched block names, it is
    allowed for the instance names to differ; they need not match for
    the blocks to match.

From page 51 (page 57 of the PDF) of the GLSL 4.30 v.8 spec:

  " When instance names are present on matched block names, it is
    allowed for the instance names to differ; they need not match for
    the blocks to match."

Therefore, no cross linking validation is needed for the instance name
of an Interface Block.

This patch will make that no link error will be reported on a program
like this:

    "# VS

    layout(binding = 1) Block1 {
      vec4 color;
    } uni_block;

    ...

    # FS

    layout(binding = 2) Block2 {
      vec4 color;
    } uni_block;

    ..."

Fixes GL45-CTS.enhanced_layouts.ssb_layout_qualifier_conflict

Signed-off-by: Andres Gomez <agomez@igalia.com>
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-27 12:47:21 +03:00
Andres Gomez
40b09ed15c glsl: UBOs and SSBOs must match the binding qualifier too
From page 140 (page 147 of the PDF) of the GLSL ES 3.10 v.4 spec:

  " 9.2 Matching of Qualifiers

    The following tables summarize the requirements for matching of
    qualifiers.  It applies whenever there are two or more matching
    variables in a shader interface.

    Notes:

    1. Yes means the qualifiers must match.

    ...

    9.2.1 Linked Shaders

    | Qualifier | Qualifier | in/out | Default  | uniform | buffer|
    |   Class   |           |        | Uniforms |  Block  | Block |

    ...

    |  Layout   |  binding  |  N/A   |   Yes    |   Yes   |  Yes  |"

From page 93 (page 110 of the PDF) of the GL 4.2 (Core Profile) spec:

  " 2.11.7 Uniform Variables

    ...

    Uniform Blocks

    ...

    When a named uniform block is declared by multiple shaders in a
    program, it must be declared identically in each shader. The
    uniforms within the block must be declared with the same names and
    types, and in the same order. If a program contains multiple
    shaders with different declarations for the same named uniform
    block differs between shader, the program will fail to link."

From page 129 (page 150 of the PDF) of the GL 4.3 (Core Profile) spec:

  " 7.8 Shader Buffer Variables and Shader Storage Blocks

    ...

    When a named shader storage block is declared by multiple shaders
    in a program, it must be declared identically in each shader. The
    buffer variables within the block must be declared with the same
    names, types, qualification, and declaration order. If a program
    contains multiple shaders with different declarations for the same
    named shader storage block, the program will fail to link."

Therefore, if the binding qualifier differs between two linked Uniform
or Shader Storage Blocks of the same name, a link error should happen.

This patch will make that a link error will be reported on a program
like this:

    "# VS

    layout(binding = 1) Block {
      vec4 color;
    } uni_block1;

    ...

    # FS

    layout(binding = 2) Block {
      vec4 color;
    } uni_block2;

    ..."

Signed-off-by: Andres Gomez <agomez@igalia.com>
Cc: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-27 12:47:00 +03:00
Andres Gomez
bf15b2b515 glsl: on UBO/SSBOs link error reset the number of active blocks to 0
While it's legal to have an active blocks count > 0 on link failure.
Unless we actually assign memory for the blocks array we can end up
segfaulting in calls such as glUniformBlockBinding().

To avoid having to NULL check these api calls we simply reset the
block count to 0 if the array was not created.

Signed-off-by: Andres Gomez <agomez@igalia.com>
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-27 12:45:59 +03:00
Samuel Iglesias Gonsálvez
c4c02471f4 anv: enable sampling from fast-cleared images on SKL
A resolve is not needed on Skylake in this case. We were forcing
a resolve because we set the input_aux_usage to ISL_AUX_USAGE_NONE.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-03-27 06:32:24 +02:00
Grazvydas Ignotas
b97faea162 glsl, st/shader_cache: check the whole sha1 for zero
The checks were only looking at the first byte, while the intention
seems to be to check if the whole sha1 is zero. This prevented all
shaders with first byte zero in their sha1 from being saved.

This shaves around a second from Deus Ex load time on a hot cache.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-27 15:05:10 +11:00
Grazvydas Ignotas
f2d4d11611 glsl/shader_cache: restore evicted shader keys
Even though the programs themselves stay in cache and are loaded, the
shader keys can be evicted separately. If that happens, unnecessary
compiles are caused that waste time, and no matter how many times the
program is re-run, performance never recovers to the levels of first
hot cache run. To deal with this, we need to refresh the shader keys
of shaders that were recompiled.

An easy way to currently observe this is running Deux Ex, then piglit
and Deux Ex again, or deleting just the cache index. The later is
causing over a minute of lost time on all later Deux Ex runs, with this
patch it returns to normal after 1 run.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-27 09:10:16 +11:00
Axel Davy
bdf035ea6f st/nine: Use atomics for available_texture_mem
Resource dtor can be executed in the worker thread.
Use atomic to avoid threading safety issues.

CC: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Tested-by: James Harvey <lothmordor@gmail.com>
2017-03-26 23:10:38 +02:00
Axel Davy
bd85bb51c7 st/nine: Resolve deadlock in surface/volume dtors when using csmt
Surfaces and Volumes can be freed in the worker thread.

Without this patch, pending_uploads_counter could be non-zero
in the Surfaces or Volumes dtor, leading to deadlock.
Instead decrease properly the counter before releasing the
item.

Also avoid another potential deadlock if the item is not
properly unlocked: Do not call UnlockRect which will cause deadlock,
but free directly using the deadlock safe
nine_context_get_pipe_multithread.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99246

CC: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Tested-by: James Harvey <lothmordor@gmail.com>
2017-03-26 23:10:38 +02:00
Axel Davy
31f8b3babb st/nine: Fix user vertex data uploader with csmt
Fix regression caused by
abb1c645c4

The patch made csmt use context.pipe instead of
secondary_pipe, leading to thread safety issues.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2017-03-26 23:10:38 +02:00
Jose Fonseca
2ba991cbcd scons: Fix dependencies of marshal_generated.[ch].
These generated source files depend not only upon gl_and_es_API.xml, but
all other XML files that are included by it.

This change updates the generation rules to depend on all gen/*.xml
files, like done for other SCons generation rules, and should fix
incremental broken SCons builds due to missing dependencies.

Trivial.
2017-03-26 21:30:34 +01:00
Vinson Lee
641f629536 glsl: Link tests with CLOCK_LIB.
Fix 'make check' linking errors with glibc < 2.17.

  CXXLD  glsl/glsl_test
glsl/.libs/libglsl.a(libmesautil_la-u_queue.o): In function `u_thread_get_time_nano':
src/util/../../src/util/u_thread.h:84: undefined reference to `clock_gettime'

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
2017-03-25 01:23:04 -07:00
Timothy Arceri
425671f616 mesa/glthread: add custom marshalling for ClearBufferfv()
This is one of the main causes of syncs in Civ6.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-25 13:39:12 +11:00
Grazvydas Ignotas
b9e92334f7 util/disk_cache: don't deadlock on premature EOF
If we get EOF earlier than expected, the current read loops will
deadlock. This may easily happen if the disk cache gets corrupted.
Fix it by using a helper function that handles EOF.

Steps to reproduce (on a build with asserts disabled):
$ glxgears
$ find ~/.cache/mesa/ -type f -exec truncate -s 0 '{}' \;
$ glxgears # deadlock

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-25 13:08:37 +11:00
Chad Versace
7414326164 genxml: Add 3DSTATE_DEPTH_BUFFER to gen5.xml
isl will use this for validating the depth buffer pitch.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-24 19:07:05 -07:00
Grazvydas Ignotas
7d8ee4b4d0 tests/cache_test: mark arguments const
While at it, also fix up a failure message to not reference timestamp
and gpu dirs as those are no longer being made.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-25 12:46:18 +11:00
Rob Clark
d87ef8f77c freedreno: free compiler when screen is destroyed
Drop ir3_compiler_destroy(), since it is only ralloc_free() and we
shouldn't really have an ir3 dependency in core.  If some future hw
has a new compiler, as long as all it's resources are ralloc()d then
things will all just work.

(In practice, I suppose you never really see this leak, but removing
it at least cleans up some noise in valgrind.)

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-03-24 18:01:47 -04:00
Jason Ekstrand
e6621746dc genxml: Whitespace fixes
Some field names had extra spaces and some had places where we should
have had a space but didn't.

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-03-24 15:00:37 -07:00
Jason Ekstrand
34c3f6a27f genxml: Replace "[N]" with "N"
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-03-24 15:00:37 -07:00
Jason Ekstrand
c2af555d6e genxml/gen6: Remove a couple of bogus values
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-03-24 15:00:37 -07:00
Jason Ekstrand
ec27402a8f genxml/gen8: Remove BLACK_LEVEL_CORRECTION_STATE
We've never used it, it only exists on gen8, and the name of the struct
contains piles of bad characters.

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-03-24 15:00:37 -07:00
Jason Ekstrand
a6df637d26 genxml: Rename two MCS fields to Auxiliary Surface on gen7
This makes gen7 more consistent with gen8+

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-03-24 15:00:37 -07:00
Rob Clark
c03f6f12bb freedreno: fix memory leak
Otherwise blitter would still hold a ref to, for example, sampler-
views.

To reproduce:

   glmark2 -b desktop:duration=2 --run-forever

Fixes: a8e6734 ("freedreno: support for using generic clear path")
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-03-24 17:49:00 -04:00
Chad Versace
b3f81e06d4 genxml: Fix gen_zipped_file.py dependency
The gen*_xml.h files depend on gen_zipped_file.py, not the gen*_pack.h
files.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-24 14:38:22 -07:00
Chad Versace
c7c6c53adb genxml: Define GENXML_XML_FILES in Makefile.sources
The future header genX_bits.h will depend on GENXML_XML_FILES.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-24 14:38:15 -07:00
Jan Vesely
14b543bdc9 clover: use pipe_resource references
v2: buffers are created with one reference.
v3: add pipe_resource reference to mapping object
v4: rename to pres and drop inline initializers

CC: "17.0 13.0" <mesa-stable@lists.freedesktop.org>

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-03-24 15:57:47 -04:00
Kenneth Graunke
0a60ff4d8c i965: Fix symbolic size of next_offset[] array.
It's indexed by buffer, not stream.  BRW_MAX_SOL_BUFFERS and
MAX_VERTEX_STREAMS happen to both be 4, so there's no actual bug.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-03-24 12:21:50 -07:00
Kenneth Graunke
652d521408 i965: Remove pointless NULL check from Gen6 primitive counting code.
We create the BO when creating a transform feedback object, and only
destroy it when deleting that object.  So it won't be NULL.

CID: 1401410

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-03-24 12:21:06 -07:00
Marek Olšák
61926733f9 radeonsi: don't crash on compute shader compile failure
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-24 18:25:05 +01:00
Marek Olšák
518d834162 radeonsi: don't hang on shader compile failure
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-24 18:25:05 +01:00
Nicolai Hähnle
eebd0cd560 radeonsi: fix dvec[34] attributes sourced from current attribute state
The state tracker no longer uploads those attributes for us,
so we must conservatively upload the size of the largest
attribute, which is a dvec4.

Fixes a regression of GL45-CTS.gpu_shader_fp64.varyings and
GL45-CTS.vertex_attrib_64bit.limits_test.

Fixes: 9b91e0b54c ("radeonsi: allow unaligned vertex buffer offsets and strides on CIK-VI")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-24 17:35:21 +01:00
Emil Velikov
15603055fb anv: automake: ensure that the destination directory is created
Earlier commit unintentionally dropped the mkdir, as it was rebased.

Some versions of autotools will not create the output directory for
generated sources. Thus the issue went unnoticed by the original author.

Cc: Dylan Baker <dylan@pnwbakers.com>
Cc: Steven Newbury <steve@snewbury.org.uk>
Reported-by: Steven Newbury <steve@snewbury.org.uk> Fixes:
Fixes: 1610b3dede ("anv: don't pass xmlfile via stdin anv_entrypoints_gen.py")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-24 12:02:04 +00:00
Samuel Pitoiset
43f5a2c915 glsl_to_tgsi: don't rely on glsl types when visiting tex instructions
Instead add is_cube_shadow like is_cube_array.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-24 11:12:27 +01:00
Iago Toral Quiroga
129fd58131 anv/query: handle out of host memory without crashing in compute_query_result()
We don't need to make the caller (CmdCopyQueryPoolResults) aware of the
problem since compute_query_result() only emits state. The caller is also
expected to hit OOM in this scenario right after calling this function, but
it is already handling it safely.

Fixes:
dEQP-VK.api.out_of_host_memory.cmd_copy_query_pool_results

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-24 09:39:44 +01:00
Iago Toral Quiroga
ddb2bb3ed4 anv/pipeline: make FragCoord include sample positions when sample shading
We need to know if sample shading has been requested during shader
compilation since that affects the way fragment coordinates are
computed.

Notice that the semantics of fragment coordinates only depend on
whether sample shading has been requested, not on whether more
than one sample will actually be produced (that is,
minSampleShading and rasterizationSamples do not affect this
behavior).

Because this setting affects the code we generate for the shader, we also
need to include it in the WM prog key. Notice we don't need to alter the
OpenGL code because it doesn't ever use this behavior, so they key's
value is always false (the default).

Fixes:
dEQP-VK.glsl.builtin_var.fragcoord_msaa.*

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-24 08:11:53 +01:00
Iago Toral Quiroga
023ea3772d nir/lower_wpos_center: support adding sample position to fragment coordinate
According to section 14.6 of the Vulkan specification:

   "When sample shading is enabled, the x and y components of FragCoord
    reflect the location of the sample corresponding to the shader
    invocation."

So add a boolean parameter to the lowering pass to select this behavior
when we need it.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-24 08:11:53 +01:00
Iago Toral Quiroga
4da1832c00 anv: return VK_ERROR_DEVICE_LOST immeditely when device is known to be lost
If we know the device has been lost we should return this error code for
any command that can report it before we attempt to do anything with the
device.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-24 08:11:53 +01:00
Iago Toral Quiroga
50c8d2c1f7 anv/device: keep track of 'device lost' state
The Vulkan specs say:

   "A logical device may become lost because of hardware errors, execution
    timeouts, power management events and/or platform-specific events. This
    may cause pending and future command execution to fail and cause hardware
    resources to be corrupted. When this happens, certain commands will
    return VK_ERROR_DEVICE_LOST (see Error Codes for a list of such commands).
    After any such event, the logical device is considered lost. It is not
    possible to reset the logical device to a non-lost state, however the lost
    state is specific to a logical device (VkDevice), and the corresponding
    physical device (VkPhysicalDevice) may be otherwise unaffected. In some
    cases, the physical device may also be lost, and attempting to create a
    new logical device will fail, returning VK_ERROR_DEVICE_LOST."

This means that we need to track if a logical device has been lost so we can
have the commands referenced by the spec return VK_ERROR_DEVICE_LOST
immediately.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-24 08:11:53 +01:00
Iago Toral Quiroga
70194c9f1a anv/device: return VK_ERROR_DEVICE_LOST for errors during queue submissions
So that we don't have to do things like rolling back address relocations in
case that we ran into OOM after computing them, etc

Also, make sure that if the queue submission comes with a fence, we set it up
correctly so it behaves according to the spec after returning
VK_ERROR_DEVICE_LOST.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-24 08:11:53 +01:00
Timothy Arceri
adced4a2f9 mesa/marshal: add custom BufferData/BufferSubData marshalling
GL_AMD_pinned_memory requires memory to be aligned correctly, so
we skip marshalling in this case. Also copying the data defeats
the purpose of EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD.

Fixes GL_AMD_pinned_memory piglit tests when glthread is enabled.

Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-24 11:23:06 +11:00
Timothy Arceri
0a32b52a27 util/disk_cache: write cache entry keys to file header
This can be used to deal with key hash collisions from different
versions (should we find that to actually happen) and to find
which mesa version produced the cache entry.

V2: use blob created at cache creation.

v3: remove left over var from v1.

Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
2017-03-24 11:20:09 +11:00
Grazvydas Ignotas
5136c09e70 util/disk_cache: hash pointer size and gpu name into cache keys
This allows to get rid of the arch and gpu name directories.

v2: (Timothy Arceri) don't use an opaque data type to store
    pointer size and gpu name.

v3: (Timothy Arceri) use blob to store driver keys just make sure
    to store null terminator for strings, and make sure blob is
    defined by disk_cache and not it's users.

v4: (Timothy Arceri) fix typo, and make ptr_size a uint8_t.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-24 11:20:09 +11:00
Grazvydas Ignotas
feb716239e util/disk_cache: hash timestamps into the cache keys
Instead of using a directory, hash the timestamps into the cache keys
themselves. Since there is no more timestamp directory, there is no more
need for deleting the cache of other mesa versions and we rely on
eviction to clean up the old cache entries. This solves the problem of
using several incarnations of disk_cache at the same time, where one
deletes a directory belonging to the other, like when both OpenGL and
gallium nine are used simultaneously (or several different mesa
installations).

v2: using additional blob instead of trying to clone sha1 state

v3: (Timothy Arceri) don't use an opaque data type to store
    timestamp.

V4: (Timothy Arceri) use blob to store driver keys just make sure
    to store null terminator for strings, and make sure blob is
    defined by disk_cache and not it's users.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100091
2017-03-24 11:20:09 +11:00
Miklós Máté
7ceb1a4fa8 mesa: set thread name for glthread
Signed-off-by: Miklós Máté <mtmkls@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-24 10:00:19 +11:00
Matt Turner
7499bc7fd7 i965: Replace OPT_V() with OPT().
We want to be able to check the progress of each pass and dump the NIR
for debugging purposes if it changed.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-23 14:34:44 -07:00
Matt Turner
1be91bd9d8 i965/fs: Return progress from demote_sample_qualifiers().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-23 14:34:44 -07:00
Matt Turner
fd3351246c i965/fs: Return progress from move_interpolation_to_top().
And mark as static at the same time.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-23 14:34:44 -07:00
Matt Turner
e0f8daeb86 i965: Return progress from brw_nir_lower_uniforms().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-23 14:34:44 -07:00
Matt Turner
ef71af7356 nir: Return progress from nir_convert_from_ssa().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-23 14:34:44 -07:00
Matt Turner
abc8a702d0 nir: Return progress from nir_lower_io().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-23 14:34:44 -07:00
Matt Turner
a934b00222 nir: Return progress from nir_lower_regs_to_ssa().
And from nir_lower_regs_to_ssa_impl() as well.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-23 14:34:44 -07:00
Matt Turner
b0e72defc2 nir: Return progress from nir_lower_samplers().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-23 14:34:44 -07:00
Matt Turner
01548f9f01 nir: Return progress from nir_lower_atomics().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-23 14:34:44 -07:00
Matt Turner
0bd615d961 nir: Return progress from nir_lower_clamp_color_outputs().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-23 14:34:44 -07:00
Matt Turner
9dbf91f5c0 nir: Return progress from nir_lower_clip_fs().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-23 14:34:44 -07:00
Matt Turner
4e4927cd95 nir: Return progress from nir_lower_clip_vs().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-23 14:34:44 -07:00
Matt Turner
6077cc75aa nir: Return progress from nir_move_vec_src_uses_to_dest().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-23 14:34:43 -07:00
Matt Turner
a539e05d00 nir: Return progress from nir_lower_to_source_mods().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-23 14:34:43 -07:00
Matt Turner
5a7e4ae23d nir: Return progress from nir_lower_clip_cull_distance_arrays().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-23 14:34:43 -07:00
Matt Turner
19345fc160 nir: Return progress from nir_lower_var_copies().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-23 14:34:43 -07:00
Matt Turner
b831b8d2e1 nir: Return progress from nir_lower_load_const_to_scalar().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-23 14:34:43 -07:00
Matt Turner
adb157ddfd nir: Return progress from nir_lower_64bit_pack().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-23 14:34:43 -07:00
Matt Turner
0012a6144a nir: Return progress from nir_lower_doubles().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-23 14:34:43 -07:00
Matt Turner
c597f87739 nir: Return progress from nir_lower_vars_to_ssa().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-23 14:34:43 -07:00
Matt Turner
7d41bf8d7b nir: Fix syntax.
et is not an abbreviation.

Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-23 14:34:43 -07:00
Matt Turner
70c0455974 nir: Fix misspellings.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-23 14:34:43 -07:00
Matt Turner
d6e2bdfed3 nir: Stop using apostrophes to pluralize.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-23 14:34:43 -07:00
Leo Liu
54f9f34181 st/omx/enc: use PIPE_USAGE_STAGING for output buffer
Workaround an unknown bug with inside the transfer_map for certain
ASIC, also tested with un-affected ASICs, the performance actually
improved slightly.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-23 14:43:42 -04:00
Daniel Stone
378025ca8b gbm: Use unsigned for BO offset getter
The actual offset returned is uint32_t, however int64_t was used as the
return type from gbm_bo_get_offset to allow negative returns to signal
errors to the caller.

In case of an error getting the offset, the user will also be unable to
get the handle/FD, and thus have nothing to offset into. This means that
returning 0 as an error value is harmless, allowing us to change the
return type to uint32_t in order to avoid signed/unsigned confusion in
callers.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Jason Ekstrand <jason@jlekstrand.net>
2017-03-23 15:28:41 +00:00
Eric Engestrom
ec0313fd58 REVIEWERS: add autogen.sh to the autoconf group
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>
2017-03-23 14:50:51 +00:00
Eric Engestrom
0adc9832f5 docs/submittingpatches: add mention about legal disclaimers
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-23 14:21:48 +00:00
Julien Isorce
48b5f1cca7 r600_shader.c: fix indentation
Introduced by ad13bd2e51

Signed-off-by: Julien Isorce <jisorce@oblong.com>
Reviewed-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-23 13:21:37 +00:00
Topi Pohjolainen
90633079ec glx: Prefer library path given by pkgconfig over the system
Recent change to use drmGetDevices2() made me realize that
build configured using

PKG_CONFIG_PATH=my_drm_lib_path/pkgconfig ./autogen.sh

considers the libdrm path gotten from pkgconfig only during
make. When invoking "make install" the relink command puts
system library ahead of the path gotten from pkgconfig
(and starts to fail as system libdrm isn't new enough).

This change forces the relink command to respect pkgconfig
settings.

It looks to me that in

https://bugs.freedesktop.org/show_bug.cgi?id=100259

with Emil et al considering it a libtool bug.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
[Emil Velikov: add inline comment]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-23 12:30:19 +00:00
Tapani Pälli
4f69573178 intel: move gen_decoder.* to DECODER_FILES
patch adds DECODER_FILES for libintel_common, this is so that platforms
such as Android not currently using this functionality can opt out.

Fixes: 7d84bb3 ("intel: Move tools/decoder.[ch] to common/gen_decoder.[ch].")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-23 14:05:19 +02:00
Tapani Pälli
bcae4eb502 android: fix vulkan build issues with anv_entrypoints
Patch fixes entrypoint generation for libmesa_anv_entrypoints that
still used old style of calling generator script.

Also small fixes to libmesa_vulkan_common where there was a typo
in target name (vulknan) and files were generated to wrong folder.

Fixes: 8211e3e6 ("anv: Generate anv_entrypoints header and code in one command")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-23 14:04:44 +02:00
Mauro Rossi
0ff8ac1b55 android: i965: generate code for OA counter queries
Automake generation rules are replicated for android.
$* macro was expected to return "hsw" but instead gives "hsw.{h,c}"
so $(basename $*) is used as a workaround
to set the correct --chipset option for brw_oa.py script.

Build tested with nougat-x86

Fixes: e565505 "i965: Add script to gen code for OA counter queries"
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Robert Bragg <robert@sixbynine.org>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-23 08:20:18 +02:00
Tapani Pälli
dc9ebc6ef1 android: rename Intel Vulkan library to match desktop one
Original naming was following Vulkan HAL naming scheme for no good
purpose and we need same binary name for build-id code.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-23 08:19:16 +02:00
Boyan Ding
51b7fae1ae nouveau: enable glsl/tgsi on-disk cache
v2: Fix argument to nouveau_screen_get_name()

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-03-22 22:51:35 -04:00
Eric Engestrom
e8875c7a87 REVIEWERS: add myself as a reviewer for EGL and docs
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-23 00:35:45 +00:00
Dylan Baker
4ee675d537 anv: Remove dead prototype from entrypoints
Spotted by Emil.

v2: - Add this patch

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
2017-03-22 16:22:00 -07:00
Dylan Baker
860beb99a6 anv: use cElementTree in anv_entrypoints_gen.py
It's written in C rather than pure python and is strictly faster, the
only reason not to use it that it's classes cannot be subclassed.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
2017-03-22 16:22:00 -07:00
Dylan Baker
9050138af7 anv: don't use Element.get in anv_entrypoints_gen.py
This has the potential to mask errors, since Element.get works like
dict.get, returning None if the element isn't found. I think the reason
that Element.get was used is that vulkan has one extension that isn't
really an extension, and thus is missing the 'protect' field.

This patch changes the behavior slightly by replacing get with explicit
lookup in the Element.attrib dictionary, and using xpath to only iterate
over extensions with a "protect" attribute.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
2017-03-22 16:22:00 -07:00
Dylan Baker
4d4697f868 anv: use dict.get in anv_entrypoints_gen.py
Instead of using an if and a check, use dict.get, which does the same
thing, but more succinctly.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
2017-03-22 16:22:00 -07:00
Dylan Baker
96a5f2a5ac anv: anv_entrypoints_gen.py: use reduce function.
Reduce is it's own reward.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
2017-03-22 16:22:00 -07:00
Dylan Baker
dd3830d11b anv: anv-entrypoints_gen.py: rename hash to cal_hash.
hash is reserved name in python, it's the interface to access an
object's hash protocol.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
2017-03-22 16:22:00 -07:00
Dylan Baker
8211e3e60d anv: Generate anv_entrypoints header and code in one command
This produces the header and the code in one command, saving the need to
call the same script twice, which parses the same XML file.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
2017-03-22 16:22:00 -07:00
Dylan Baker
383032c700 anv: anv_entrypoints_gen.py: directly write files instead of piping
This changes the output to be written as a file rather than being piped.
This had one critical advantage, it encapsulates the encoding. This
prevents bugs where a symbol (generally unicode like © [copyright]) is
printed and the system being built on doesn't have a unicode locale.

v2: - Update Android.mk
v3: - Don't generate both files at once
    - Fix Android.mk
    - drop --outdir, since the filename is passed in as an argument

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
2017-03-22 16:22:00 -07:00
Dylan Baker
a2a2bad2e2 anv: convert C generation to template in anv_entrypoints_gen.py
This produces a file that is identical except for whitespace, there is a
table that has 8 columns in the original and is easy to do with prints,
but is ugly using mako, so it doesn't have columns; the data is not
inherently tabular.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
2017-03-22 16:22:00 -07:00
Dylan Baker
0d8e22c5e4 anv: convert header generation in anv_entrypoints_gen.py to mako
This produces an identical file except for whitespace.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
2017-03-22 16:22:00 -07:00
Dylan Baker
ba1085c694 anv: Update "do not edit" comments with proper filename
This does two things, first it updates both the .h and the .c file to
have the same do not edit string. Second, it uses __file__ to ensure
that even if the file is moved or renamed that the name will be correct.

One thing to note is the use of '{{' and '}}' in the C template. This is
to instruct python to print a literal '{' and '}' respectively, rather
than treating the contents as a formatter specifier.

v3: - add this patch

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
2017-03-22 16:22:00 -07:00
Dylan Baker
ed9339bf26 anv: split main into two functions in anv_entrypoints_gen.py
This is groundwork for the next patches, it will allows porting the
header and the code to mako separately, and will also allow both to be
run simultaneously.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
2017-03-22 16:22:00 -07:00
Dylan Baker
1610b3dede anv: don't pass xmlfile via stdin anv_entrypoints_gen.py
It's slow, and has the potential for encoding issues.

v2: - pass xml file location via argument
    - update Android.mk

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
2017-03-22 16:22:00 -07:00
Dylan Baker
8017da8dd2 anv: make constants capitals in anv_entrypoints_gen.py
Again, it's standard python style.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
2017-03-22 16:22:00 -07:00
Dylan Baker
08a6d3b4ba anv: Use python style in anv_entrypoints_gen.py
These are all fairly small cleanups/tweaks that don't really deserve
their own patch.

- Prefer comprehensions to map() and filter(), since they're faster
- replace unused variables with _
- Use 4 spaces of indent
- drop semicolons from the end of lines
- Don't use parens around if conditions
- don't put spaces around brackets
- don't import modules as caps (ET -> et)
- Use docstrings instead of comments

v2: - Replace comprehensions with multiplication

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
2017-03-22 16:22:00 -07:00
Dylan Baker
abd72f2e35 anv: anv_entrypoints_gen.py: use a main function
This is just good practice.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
2017-03-22 16:22:00 -07:00
Alex Smith
bc5d587a80 radv: Invalidate L2 for TRANSFER_WRITE barriers
CP DMA and PKT3_WRITE_DATA (in CmdUpdateBuffer) don't (currently) write
through L2. Therefore, to make these writes visible to later accesses
we must invalidate L2 rather than just writing it back, to avoid the
possibility that stale data is read through L2.

Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-23 09:20:31 +10:00
Vinson Lee
bb32ea4fc6 glsl: Link glsl_compiler with CLOCK_LIB.
Fix linking error on CentOS 6.

  CXXLD  glsl_compiler
glsl/.libs/libstandalone.a(lt16-libmesautil_la-u_queue.o): In function `u_thread_get_time_nano':
src/util/../../src/util/u_thread.h:84: undefined reference to `clock_gettime'

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-22 14:58:18 -07:00
Timothy Arceri
6a9020f8dc util/disk_cache: use rand_xorshift128plus() to get our random int
Otherwise for apps that don't seed the regular rand() we will always
remove old cache entries from the same dirs.

V2: assume bits returned by rand are independent uniformly distributed
    bits and grab our hex value without taking the modulus of the whole
    value, this also fixes a bug where 'f' was always missing.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-23 08:16:29 +11:00
Timothy Arceri
dd00a3c923 util/rand_xor: add function to seed rand
V2: pass the seed to the seed function so that we can isolate
    its uses. Stop leaking fd when urandom couldn't be read.

Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-23 08:16:29 +11:00
Timothy Arceri
53660c2366 util: move rand_xorshift128plus() to utils
V2: pass the seed to rand_xorshift128plus() so that we can isolate
    its uses.

Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-03-23 08:16:29 +11:00
Samuel Pitoiset
e11049f2c3 drirc: add force_glsl_abs_sqrt() for "Spec Ops: The Line"
Game ported from D3D9 which expects sqrt() to compute the absolute
value as explained in the spec.

This gets rid of the NaN values as well as the black squares
with RadeonSI.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97338
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-22 22:02:20 +01:00
Samuel Pitoiset
7a0ecbfffd st/glsl_to_tgsi: enable lower_sqrt() conditionally
It relies on the force_glsl_abs_sqrt driconf option.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-22 22:02:20 +01:00
Samuel Pitoiset
737c734cd4 glsl: lower sqrt(abs()) and inversesqrt(abs()) if requested
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-03-22 22:02:12 +01:00
Samuel Pitoiset
448f4c0c89 driconf: add force_glsl_abs_sqrt option
This will allow to force computing the absolute value for sqrt()
and inversesqrt() in order to follow D3D9 behaviour for buggy
apps that rely on it.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-22 22:01:01 +01:00
Tim Rowley
08f864abd9 swr: [rasterizer jitter] fix llvm >= 5.0 build break
Function::getArgumentList() doesn't exist anymore, switch to using
arg_begin() (existed back to at least llvm-3.6.0).

Reviewed-by: Vedran Miletić <vedran@miletic.net>
CC: <mesa-stable@lists.freedesktop.org>
2017-03-22 13:45:35 -05:00
Rob Herring
7a5b5f5226 Android: drop Android 4.4 (KitKat) support
Any users of KitKat are likely using an older version of Mesa and
KitKat support adds complexity to the make files. Dropping support
allows removing the MESA_LOLLIPOP_BUILD make variable in various make
files.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-22 17:53:31 +00:00
Rob Herring
0e1ff22d55 Android: kill off {MESA_}ANDROID_VERSION defines aka Android 4.1 and older
The Android version defines are only needed for versions less than 4.2
which aren't really supported or tested.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-22 17:52:57 +00:00
Rob Herring
6bfad7c659 Android: fix libz dependency for host targets
Commit 6facb0c08f ("android: fix libz dynamic library dependencies")
added libz as a dependency, but this breaks host targets as the host
dependency is libz-host. As no host lib needs libz, just remove the
dependency for them.

Fixes: 6facb0c08f "android: fix libz dynamic library dependencies"
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-22 17:52:35 +00:00
Rob Herring
6f8f97a9b2 Android: remove host libmesa_util
The host libmesa_util is never used for Android builds, so remove it.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-22 17:52:23 +00:00
Rob Herring
5410c60112 Android: clean-up trailing '\' in make variables
Fixed with the following command:

perl -pe 'BEGIN{undef $/;} s/ \\\n\n/\n\n/smg' $(find . -name 'Android.*')

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-22 17:52:06 +00:00
Emil Velikov
50a9b0cb43 mesa/main: remove unused strndup.h include
No longer needed as of commit ac257f1070 ("glsl: calculate
TOP_LEVEL_ARRAY_SIZE and STRIDE when adding resources")

Reported-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-22 17:51:07 +00:00
Emil Velikov
68b545fa27 util: automake: beautify sources list
Remove trailing tabs and sort alphabetically.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Vedran Miletić <vedran@miletic.net>
Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-22 16:55:23 +00:00
Emil Velikov
e0129f3142 util/strndup: move header inclusion as applicable
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Vedran Miletić <vedran@miletic.net>
Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-22 16:55:23 +00:00
Emil Velikov
e325fc12db util: inline strndup implementation in the header
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Vedran Miletić <vedran@miletic.net>
Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-22 16:55:23 +00:00
Emil Velikov
d542d2fc13 util: consistently use ifndef guards over pragma once
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Vedran Miletić <vedran@miletic.net>
Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-22 16:55:22 +00:00
Emil Velikov
43a9ca8eb4 mesa/program: consistently use ifndef guards over pragma once
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Vedran Miletić <vedran@miletic.net>
Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-22 16:55:22 +00:00
Emil Velikov
f66fe28d9f mesa/main: consistently use ifndef guards over pragma once
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Vedran Miletić <vedran@miletic.net>
Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-22 16:55:22 +00:00
Emil Velikov
2438c0a236 intel/compiler: consistently use ifndef guards over pragma once
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Vedran Miletić <vedran@miletic.net>
Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-22 16:55:22 +00:00
Emil Velikov
868324419e intel/common: consistently use ifndef guards over pragma once
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Vedran Miletić <vedran@miletic.net>
Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-22 16:55:22 +00:00
Emil Velikov
8c8761b237 i965: consistently use ifndef guards over pragma once
The only remaining case is the brw_oa.py generator which pipes the
generated file to stdout. That will be resolved with later commits.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Vedran Miletić <vedran@miletic.net>
Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-22 16:55:22 +00:00
Emil Velikov
b04916285e st/wgl: consistently use ifndef guards over pragma once
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Vedran Miletić <vedran@miletic.net>
Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-22 16:55:22 +00:00
Emil Velikov
1385e58805 egl/dri2: consistently use ifndef guards over pragma once
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Vedran Miletić <vedran@miletic.net>
Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-22 16:55:22 +00:00
Emil Velikov
b27a883205 spirv: consistently use ifndef guards over pragma once
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Vedran Miletić <vedran@miletic.net>
Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-22 16:55:22 +00:00
Emil Velikov
e3de145fa2 nir: consistently use ifndef guards over pragma once
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Vedran Miletić <vedran@miletic.net>
Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-22 16:55:22 +00:00
Emil Velikov
b08aee305e glsl: consistently use ifndef guards over pragma once
Through the glsl headers we had an odd mix of guards be that
"ifndef", "pragma once" neither or both.

Simplify things by using the more common ones (ifndef) and annotating
all the sources, barring the generated builting header -
builtin_int64.h.

The final header - udivmod64.h - is [seemingly] unused and on its way
out (patch purge it is on the mailing list).

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Vedran Miletić <vedran@miletic.net>
Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-22 16:55:22 +00:00
Emil Velikov
b0bfb5f89c compiler: consistently use ifndef guards over pragma once
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Vedran Miletić <vedran@miletic.net>
Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-22 16:55:21 +00:00
Emil Velikov
b9d035e75b radv: consistently use ifndef guards over pragma once
Namely: annotate the single file which is not using a ifndef guard -
vk_format.h

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Vedran Miletić <vedran@miletic.net>
Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-22 16:55:21 +00:00
Emil Velikov
95ab07c586 ac: consistently use ifndef guards over pragma once
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Vedran Miletić <vedran@miletic.net>
Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-22 16:55:21 +00:00
Emil Velikov
3b277bae66 i965: make brw_setup_image_uniform_values static
Used only internally.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Vedran Miletić <vedran@miletic.net>
Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-22 16:55:21 +00:00
Emil Velikov
7e79e895a6 docs/releasing: do not pass any arguments to autogen.sh
This should just work (tm) with the default options. Plus the one we
pass is already the default, so just drop it.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2017-03-22 16:55:21 +00:00
Emil Velikov
559ca99ce1 mesa: more unused linux/version.h include
The header provides the LINUX_VERSION_CODE and KERNEL_VERSION macros.
With neither of which being used by any part of mesa.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-22 16:55:21 +00:00
Marek Olšák
84012262ea ac: fix build with LLVM 5.0svn
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-22 17:54:42 +01:00
Marek Olšák
6e2b9fd071 gallivm: remove lp_add_attr_dereferenceable in favor of amd/common
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-22 17:54:40 +01:00
Jason Ekstrand
7ab03ba725 anv/device: Move push descriptor query handling
The query is a properties query so it needs to be handled in
GetPhysicalDeviceProperties2, not GetPhysicalDeviceFeatures2.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-22 09:44:54 -07:00
Jason Ekstrand
c942faf8f3 anv/image: Return early when unbinding an image
Found by inspection.

Reviewed-by: Chad Versace <chadversary@chromium.org>
 Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
2017-03-22 09:44:54 -07:00
Grazvydas Ignotas
10d3702a36 util/sha1: harmonize _mesa_sha1_* wrappers
Rather than using 3 different ways to wrap _mesa_sha1_*() to SHA1*()
functions (a macro, prototype with implementation in .c and an inline
function), make all 3 inline functions.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-22 11:33:51 +00:00
Emil Velikov
64b9a37c3b anv: android: remove unused include/vulkan include
Spotted while skimming through similar hunks for the Autotools build.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2017-03-22 11:33:40 +00:00
Emil Velikov
1fa6a33e4d anv: automake: use the local headers over any system provided ones
At the moment, we would honour any system headers - vulkan_intel.h in
particular over the ones in-tree.

Thus, if one does incremental build of mesa, without the vulkan.h
already installed (or at least not in the same directory as
vulkan_intel.h) the build will fail.

In the future we might want to upstream the vulkan_intel.h within
vulkan.h or use other ways to make vulkan_intel.h obsolete. In either
case, the more robust thing is to rely on our own copy.

v2: Move AM_CPPFLAGS just above LIBDRM_CFLAGS (Grazvydas, Jason)

Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Fixes: ee8044fd "intel/vulkan: Get rid of recursive make"
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reported-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-22 11:32:54 +00:00
Nicolai Hähnle
c11dcfb5e9 mesa/main: fix MultiDrawElements[BaseVertex] validation of primcount
primcount must be a GLsizei as in the signature for MultiDrawElements
or bad things can happen.

Furthermore, an error should be flagged when primcount is negative.

Curiously, this code used to work somewhat correctly even when primcount
was negative, because the loop that checks count[i] would iterate out of
bounds and almost certainly hit a negative value at some point.

Found by an ASAN error in
GL45-CTS.gtf32.GL3Tests.draw_elements_base_vertex.draw_elements_base_vertex_primcount

Note that the OpenGL spec seems to have s/primcount/drawcount/ at some
point, and the code still reflects the old language.

v2: provide the correct spec quotes (pointed out by Ian)

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-03-22 12:12:29 +01:00
Nicolai Hähnle
c2dfff280b mesa: Avoid out-of-bounds stack read via _mesa_Materiali
MATERIALFV may end up reading up to 4 floats from the passed parameter.

This should really set a GL_INVALID_ENUM error in the cases where it
matters, but does anybody really care?

Found by ASAN in piglit gl-1.0-beginend-coverage.

v2: fix a trivial compiler warning

Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
2017-03-22 12:12:11 +01:00
Vinson Lee
bd6f0dcafc configure.ac: Do not strip away space after regex word match.
Fixes: 62c48ccb41 ("configure.ac: Use POSIX compatible regex for word boundary.")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
2017-03-22 00:30:22 -07:00
Vinson Lee
62c48ccb41 configure.ac: Use POSIX compatible regex for word boundary.
Fixes build error on Mac OS X.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100236
Suggested-by: Jan Beich <jbeich@freebsd.org>
Suggested-by: Michel Dänzer <michel@daenzer.net>
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-21 23:56:17 -07:00
Chad Versace
44ac618a41 isl: Refactor row pitch calculation (v2)
The calculations of row_pitch, the row pitch's alignment, surface size,
and base_alignment were mixed together. This patch moves the calculation
of row_pitch and its alignment to occur before the calculation of
surface_size and base_alignment.

This simplifies a follow-on patch that adds a new member, 'row_pitch',
to struct isl_surf_init_info.

v2:
  - Also extract the row pitch alignment.
  - More helper functions that will later help validate the row pitch.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> (v2)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
2017-03-21 15:56:16 -07:00
Chad Versace
c2b706f8af isl: Drop misplaced comment about padding
isl has a giant comment that explains the hardware's padding
requirements. (Hint: Cache lines and page faults). But the comment is in
the wrong place, in isl_calc_linear_row_pitch(), which is unrelated to
padding.

The important parts of that comment were copied to
isl_apply_surface_padding() long ago. So drop the misplaced comment.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-21 15:56:13 -07:00
Ben Widawsky
0e55e46540 i965/dri: Turn on support for image modifiers
All the plumbing is in place so the extension just needs to be
advertised.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-21 14:48:12 -07:00
Ben Widawsky
cd6bd7f123 i965/dri: Handle X-tiled modifier
This doesn't really "do" anything because the default tiling for the
winsys buffer is X tiled. We do however want the X tiled modifier to
work correctly from the API perspective, which would imply that if you
set this modifier, and later do a get_modifier, you get back at least X
tiled.

Running with a modified kmscube, here are the bandwidth measurements.

Linear:
Read bandwidth: 1039.31 MiB/s
Write bandwidth: 1453.56 MiB/s

Y-tiled:
Read bandwidth: 458.29 MiB/s
Write bandwidth: 542.12 MiB/s

X-tiled:
Read bandwidth: 575.01 MiB/s
Write bandwidth: 606.25 MiB/s

Cc: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-21 14:48:12 -07:00
Ben Widawsky
7ce0405826 i965/dri: Handle Y-tiled modifier
This patch begins introducing how we'll actually handle the potentially
many modifiers coming in from the API, how we'll store them, and the
structure in the code to support it.

Prior to this patch, the Y-tiled modifier would be entirely ignored. It
shouldn't actually be used until this point because we've not bumped the
DRIimage extension version (which is a requirement to use modifiers).

Measuring later in the series with kmscube:
Linear:
Read bandwidth: 1048.44 MiB/s
Write bandwidth: 1483.17 MiB/s

Y-tiled:
Read bandwidth: 471.13 MiB/s
Write bandwidth: 589.10 MiB/s

Similar functionality was introduced and then reverted here:

commit 6a0d036483
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Thu Apr 21 20:14:58 2016 -0700

    i965: Always use Y-tiled buffers on SKL+

v2: Use last set bit instead of first set bit in modifiers to address
bug found by Daniel Stone.

v3: Use the new priority modifier selection thing. This nullifies the
bug fixed by v2 also.

v4: Get rid of modifier compaction which originally served another
purpose and now serves none (Jason)

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-21 14:48:12 -07:00
Ben Widawsky
d78a36ea62 i965/dri: Handle the linear fb modifier
At image creation create a path for dealing with the linear modifier.
This works exactly like the old usage flags where __DRI_IMAGE_USE_LINEAR
was specified.

During development of this patch series, it was decided that a lack of
modifier was an insufficient way to express the required modifiers. As a
result, 0 was repurposed to mean a modifier for a LINEAR layout.

NOTE: This patch was added for v3 of the patch series.

v2: Rework the algorithm for modifier selection to go from a bitmask
based selection to this priority value.

v3: Make DRM_FORMAT_MOD_INVALID allowed at selection as a way of
identifying no modifiers found (because 0 is LINEAR) (Jason)

v4: Remove the logic to prune unknown modifiers (like those from other
vendors) and simply handle is in select_best_modifier (Jason)

Requested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-21 14:48:12 -07:00
Ben Widawsky
79f619ca70 i965/dri: Enable modifier queries
New to the patch series after reordering things for landing smaller
chunks.

This will essentially enable modifiers from clients that were just
enabled in previous patches. A client could use the modifiers by
setting all of them at create, but had no way to actually query them
after creating the surface (ie. stupid clients could be broken before
this patch, but in more ways than this).

Obviously, there are no modifiers being actually stored yet - so this
patch shouldn't do anything other than allow the API to get back 0 (or
the LINEAR modifier).

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-21 14:48:12 -07:00
Ben Widawsky
fc1e9f0cb2 i965/dri: Store the screen associated with the image
I intend to need to get to the devinfo structure, and storing the screen
is an easy way to do that.

It seems to be the consensus that you cannot share an image between
multiple screens.

Scape-goat: Rob Clark <robdclark@gmail.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-21 14:48:11 -07:00
Ben Widawsky
2a16de9e4b gbm: Disallow INVALID modifiers returned upon image creation
v2: Add a TODO about modifier validation (Jason)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-21 14:48:11 -07:00
Ben Widawsky
962b31da95 i965/dri: Disallow image with INVALID modifier
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-21 14:48:03 -07:00
Kenneth Graunke
b15038a289 i965: Shut up major()/minor() warnings.
Recent glibc generates this warning:

brw_performance_query.c:1648:13: warning: In the GNU C Library, "minor" is defined
 by <sys/sysmacros.h>. For historical compatibility, it is
 currently defined by <sys/types.h> as well, but we plan to
 remove this soon. To use "minor", include <sys/sysmacros.h>
 directly. If you did not intend to use a system-defined macro
 "minor", you should undefine it after including <sys/types.h>.

    min = minor(sb.st_rdev);

So, include sys/sysmacros.h to shut up the warning.

v2: Use the AC_HEADER_MAJOR defines to figure out the right header
    (thanks to Jonathan Gray for helping me not break non-glibc systems)

Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Emil Velikov <emli.velikov@collabora.com>
2017-03-21 14:10:17 -07:00
Kenneth Graunke
0c3fbf8028 i965: Drop AUB_TRACE_* stuff.
This was used for aubdumping (deleted a while ago) and INTEL_DEBUG=bat
decoding (deleted recently).

While we're changing parameters, delete the wrapper macro and make the
actual function brw_state_batch instead of __brw_state_batch.

This subsumes a patch by Emil Velikov to drop this from BLORP.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-21 13:49:18 -07:00
Kenneth Graunke
705c38e96f i965: Use aubinator/genxml for INTEL_DEBUG=bat state decoding.
This deletes all of our handwritten code in favor of autogenerated
genxml-based decoding.  This should be much more usable, as the old
code isn't entirely accurate - we updated some things for new
generations, but not everything.

Aubinator has one annoying limitation: it has no idea how many entries
to print when encountering e.g. 3DSTATE_BINDING_TABLE_POINTERS_VS.  It
picks an arbitrary number, which may skip decoding valid data, and may
print extra garbage entries.

We do a better job here by making brw_state_batch track the size of the
data stored at a particular batchbuffer offset.  Then, we can divide by
the structure size to obtain the exact number of entries.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-21 13:49:15 -07:00
Kenneth Graunke
5fab46572f i965: Use aubinator/genxml for INTEL_DEBUG=bat commands.
This should give substantially better decoding, as the public libdrm
decoder hasn't been properly maintained in years.

For now, we reuse the existing state dumping mechanism.  We'll improve
that in the next patch.

To avoid increasing the size of the driver, we restrict this feature
to debug builds of Mesa.  There's probably very little use for it in
release builds anyway.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-21 13:49:13 -07:00
Kenneth Graunke
7d84bb32aa intel: Move tools/decoder.[ch] to common/gen_decoder.[ch].
This way they become part of libintel_common.la so I can use them in
the i965 driver.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-21 13:49:10 -07:00
Kenneth Graunke
2b074bb7e5 intel: Add a INTEL_DEBUG=color option.
This will be used for color output in debug messages.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-21 13:48:53 -07:00
Vinson Lee
1fa432741c nir: Add positional argument specifiers.
Fix build with Python < 2.7.

  File "src/compiler/nir/nir_builder_opcodes_h.py", line 46, in <module>
    from nir_opcodes import opcodes
  File "src/compiler/nir/nir_opcodes.py", line 178, in <module>
    unop_convert("{}2{}{}".format(src_t[0], dst_t[0], bit_size),
ValueError: zero length field name in format

Fixes: 762a6333f2 ("nir: Rework conversion opcodes")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
2017-03-21 13:38:00 -07:00
Julien Isorce
ad13bd2e51 r600_shader.c: check returned value of eg_get_interpolator_index
Like done in another place in that same file.

CID 1250588

Signed-off-by: Julien Isorce <jisorce@oblong.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-21 18:14:26 +00:00
Timothy Arceri
020b3f0c46 util/disk_cache: fix build on platforms where shader cache is disabled 2017-03-21 11:51:03 +11:00
Grazvydas Ignotas
b9a370f2b4 util/disk_cache: add a write helper
Simplifies the write code a bit and handles EINTR.

V2: (Timothy Arceri) Drop EINTR handling. To do it
    properly we would need a retry limit but it's
    probably best to just avoid trying to write if
    we hit EINTR and try again next time we see
    the program.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-21 11:51:03 +11:00
Grazvydas Ignotas
af73acca2b tests/cache_test: use the blob key's actual first byte
There is no need to hardcode it, we can just use blob_key[0].
This is needed because the next patches are going to change how cache
keys are computed.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-21 11:15:52 +11:00
Grazvydas Ignotas
529a767041 util/disk_cache: use a helper to compute cache keys
This will allow to hash additional data into the cache keys or even
change the hashing algorithm easily, should we decide to do so.

v2: don't try to compute key (and crash) if cache is disabled

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-21 11:15:52 +11:00
Dave Airlie
021c87fa24 radv: move KHR_get_physical_device_properties2 to instance props.
This is an instance property not a device one.

Fixes:
dEQP-VK.api.info.device.extensions

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-21 10:05:49 +10:00
Dave Airlie
93e62898cc radv: drop illegal DB format error.
We'll get this if we have a stencil only setup.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-21 10:05:49 +10:00
Kenneth Graunke
72c89522c2 i965: Add autogenerated OA files to .gitignore. 2017-03-20 16:28:04 -07:00
Tim Rowley
fe325e6423 swr: [rasterizer] Cleanup naming of codegen files
All template files and generated files are prefixed with gen_.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:05:54 -05:00
Tim Rowley
cf8fa67364 swr: [rasterizer codegen] Remove BOM from knob_defs.py
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:05:54 -05:00
Tim Rowley
8a5069e81f swr: [rasterizer codegen] Rewrite gen_llvm_types.py to use mako
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:05:54 -05:00
Tim Rowley
5d0b3b05a2 swr: [rasterizer codegen] Fix generation of knobs
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:05:54 -05:00
Tim Rowley
4ed72758db swr: [rasterizer codegen] Change backend template comment style
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:05:54 -05:00
Tim Rowley
2776d94545 swr: [rasterizer codegen] Rewrite gen_llvm_ir_macros.py to use mako
Don't create/use cpp files, header only now.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:05:54 -05:00
Tim Rowley
9538ba9bd1 swr: [rasterizer codegen] Quiet gen_backends.py execution
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:05:54 -05:00
Tim Rowley
97cbabc8fb swr: [rasterizer scripts] Put codegen scripts into a separate directory
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:05:54 -05:00
Tim Rowley
7046695a0e swr: [rasterizer core] Fix trifan regression from 9d3442575f
Fixes piglit triangle-rasterization-overdraw.

SIMD16 path not working.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:05:22 -05:00
Tim Rowley
4cb69e817c swr: [rasterizer core] SIMD16 Frontend WIP - fix tesselation crashes
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:04:53 -05:00
Tim Rowley
ab3f4449c3 swr: [rasterizer jitter] Fix LogicOp blend jit after assert changes
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:04:53 -05:00
Tim Rowley
8cd8240cfc swr: [rasterizer] Convert more SWR_ASSERT(false, ...) to SWR_INVALID(...)
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:04:53 -05:00
Tim Rowley
ab032fb436 swr: [rasterizer core] Fix typo in SIMD16 code path
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:04:53 -05:00
Tim Rowley
d011ba74ee swr: [rasterizer core/common] Fix the native AVX512 build under ICC
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:04:53 -05:00
Tim Rowley
2f513d8d83 swr: [rasterizer core] Allow no arguments to SWR_INVALID macro
Turns out this is somewhat tricky with gcc/g++.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:04:53 -05:00
Tim Rowley
0b066b2bf3 swr: [rasterizer] Slight assert refactoring
Make asserts more robust.

Add SWR_INVALID(...) as a replacement for SWR_ASSERT(0, ...)

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:04:53 -05:00
Tim Rowley
f445b6de9c swr: [rasterizer] Backend code adjustments
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:04:53 -05:00
Tim Rowley
e4d1294afb swr: [rasterizer archrast] Fix the early and late depthstencil events
The coverage and stencil mask arguments were reversed.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:04:53 -05:00
Tim Rowley
a508c2c2ac swr: [rasterizer core] Implement double pumped SIMD16 TESS
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:04:53 -05:00
Tim Rowley
2cbac00221 swr: [rasterizer archrast/core/scripts] Fix archrast multithreading issue
Per pixel stats are cached but were not always being flushed as threads
moved from one draw context to the next.  Added an explicit flush to allow
all archrast objects to flush any cached events.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:04:53 -05:00
Tim Rowley
0a36a7cf04 swr: [rasterizer archrast] Remove redundant data from archrast files
If count can be derived from other counts then this can be done in
post processing scripts.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:04:53 -05:00
Tim Rowley
1cc885d1d1 swr: [rasterizer archrast/scripts] Further archrast cleanups
Removed redundant data being written out to file

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:04:53 -05:00
Tim Rowley
1399fbd6fd swr: [rasterizer core] Fix RECT_LIST primitive assembly
The bug would make the 3rd component of attributes on the second
triangle of a RECT be invalid.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:04:53 -05:00
Tim Rowley
ade5351900 swr: [rasterizer common] Add InterpolateComponentFlat utility
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:04:53 -05:00
Tim Rowley
ab04221bf1 swr: [rasterizer archrast] Fix performance issue with archrast stats
Performance is now 50x faster with archrast now that we're properly
filtering out all of the rdtsc begin/end.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:04:53 -05:00
Tim Rowley
b228d2db18 swr: [rasterizer core] Implement SIMD16 GS and STREAMOUT
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:04:53 -05:00
Tim Rowley
5830a0a6f8 swr: [rasterizer archrast] Add additional API events
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:04:53 -05:00
Tim Rowley
d2759c1eb3 swr: [rasterizer core/scripts] Autogen backend initialization function(s)
Autogen functions that instantiates different BackendPixelRate templates.
Functions get split into separate files after reaching a user defined
threshold (currently 512 per file) to speed up compilation.

This change will enable the addition of more template flags in the pixel
back end.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:04:53 -05:00
Tim Rowley
2c820d22cf swr: [rasterizer core] backend.h declares gBackendPixelRateTable
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:04:53 -05:00
Tim Rowley
50d491e22d swr: [rasterizer core] Finish SIMD16 PA OPT including tesselation
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:04:53 -05:00
Tim Rowley
9d3442575f swr: [rasterizer core] Finish SIMD16 PA OPT except tesselation
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:04:53 -05:00
Tim Rowley
7b94e5e1fa swr: [rasterizer core] Support sparse numa id values on all OSes
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:04:53 -05:00
Kenneth Graunke
5e29af5f77 i965: Skip register write detection when possible.
Detecting register write support by trial and error introduces a
stall at screen creation time, which it would be nice to avoid.
Certain command parser versions guarantee this will work (see the
giant comment in intelInitScreen2 below, or a few commits ago):

- Ivybridge: version >= 1 (kernel v3.16)
- Baytrail:  version >= 2 (kernel v3.19)
- Haswell:   version >= 7 (kernel v4.8)

For simplicity, we don't bother with version 1 in this patch.

This assumes that the user hasn't disabled aliasing PPGTT via a kernel
command line parameter.  Don't do that - you're only breaking things.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-03-20 15:58:05 -07:00
Kenneth Graunke
31693a13f8 i965: Set screen->cmd_parser_version to 0 if we can't write registers.
If we can't write registers, then the effective command parser version
is 0 - it may exist, but it's not usefully enabling anything.

See kernel commit 1ca3712ca3429a617ed6c5f87718e4f6fe4ae0c6 (in v4.8)
where the kernel starts doing this for us.  This makes us do more or
less the same thing on older kernels.

This should preserve a bit of sanity by allowing us to perform a
screen->cmd_parser_version > N check to determine that we really can
use the features promised by command parser version N.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-03-20 15:58:05 -07:00
Kenneth Graunke
4a2ad6b145 i965: Document the sad story of the kernel command parser.
This should help us figure out the complexities of which kernel
versions we need to get various features on various platforms.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-03-20 15:58:05 -07:00
Kenneth Graunke
9b324e4dca i965: Fall back to GL 4.2/4.3 on Haswell if the kernel isn't new enough.
In commit d2590eb65f I enabled GL 4.5
on Haswell...but failed to check if we could do indirect compute
shader dispatch...and query buffer objects.

Indirect compute shader dispatch requires command parser version 5
(kernel commit 7b9748cb513a6bef4af87b79f0da3ff7e8b56cd8, which is in
Linux v4.4).  On earlier kernels we would have disabled
ARB_compute_shader, which is a mandatory part of OpenGL 4.3+.

Query buffer objects currently require MI_MATH and MI_LOAD_REGISTER_REG,
which mean command parser version 7 (Linux v4.8).  On earlier kernels
we would have disabled ARB_query_buffer_object, which is a mandatory
part of OpenGL 4.4+.

The new version support looks like:

- Kernel 4.1 and older => OpenGL 3.3
- Kernel 4.2-4.3       => OpenGL 4.2
- Kernel 4.4-4.7       => OpenGL 4.3
- Kernel 4.8+          => OpenGL 4.5

Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-03-20 15:58:05 -07:00
Constantine Kharlamov
99d400b78f r600g/sb: Fix memory leak by reworking uses list (rebased)
The author is Heiko Przybyl(CC'ing), the patch is rebased on top of Bartosz Tomczyk's one per Dieter Nützel's comment.
Tested-by: Constantine Charlamov <Hi-Angel@yandex.ru>

v2: Resend the patch again through git-email. The prev. rebase was sent
through Thunderbird, which screwed up tab characters, making the patch
not apply.

--------------
When fixing the stalls on evergreen I introduced leaking of the useinfo
structure(s). Sorry. Instead of allocating a new object to hold 3 values
where only one is actually used, rework the list to just store the node
pointer. Thus no allocating and deallocation is needed. Since use_info
and use_kind aren't used anywhere, drop them and reduce code complexity.
This might also save some small amount of cycles.

Thanks to Bartosz Tomczyk for finding the bug.

Reported-by: Bartosz Tomczyk <bartosz.tomczyk86 at gmail.com <https://lists.freedesktop.org/mailman/listinfo/mesa-dev>>
Signed-off-by: Heiko Przybyl <lil_tux at web.de <https://lists.freedesktop.org/mailman/listinfo/mesa-dev>>
Supersedes: https://patchwork.freedesktop.org/patch/135852
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-03-20 23:23:50 +01:00
Marek Olšák
827ae79b2c radeonsi: check the IR type before waiting for a compute compilation fence
This should fix OpenCL getting stuck.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100288
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-03-20 23:17:14 +01:00
Kenneth Graunke
4084083124 aubinator: Move the guts of decode_group() to decoder.c.
This lets us use it outside of the aubinator binary itself.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-20 11:20:51 -07:00
Kenneth Graunke
aa1ef0b984 aubinator: Drop spec parameter to decode_group().
No longer necessary - the iterator gets it from the group.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-20 11:20:51 -07:00
Kenneth Graunke
b2c0c1d9a5 aubinator: Make the iterator store a pointer to structure descriptions.
When the iterator encounters a structure field, it now looks up the
gen_group for that structure definition and saves a pointer to it.

This lets us drop a lot of ridiculous code in the caller, which looked
at item->value (<struct NAME dword>), strtok'd the structure name back
out, and looked it up itself.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-20 11:20:51 -07:00
Kenneth Graunke
a1aa78cb45 aubinator: Track the current field's starting dword offset.
The iterator code already computed this value, then we stored it in
the structure name, strtok'd it back out, and also manually computed
it when printing dword headers.

Just put the value in the struct and use it.  Way simpler.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-20 11:20:51 -07:00
Kenneth Graunke
e6f7357cab aubinator: Drop decode_structure() helper.
It made more sense when decode_group() took a bunch of extra options,
but now that there's only one...we may as well pass 0 and call it a day.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-20 11:20:51 -07:00
Kenneth Graunke
a8d4184b00 aubinator: Drop unused print_dword_headers flag.
I added this flag in 65a9d5eabb but
it was completely unused.  Both callers appear to have printed dword
headers, so we can just drop the flag and continue doing it
unconditionally.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-20 11:20:51 -07:00
Kenneth Graunke
7f21cb56b8 aubinator: Store a pointer from gen_group back to gen_spec.
When decoding a structure field within a group, we may want to look up
that structure type.  Having a gen_spec pointer makes it easy to do so.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-20 11:20:51 -07:00
Kenneth Graunke
2c6c760a4b aubinator: Store enum textual name in iter->value.
gen_field_iterator_next() produces a string representing the value of
the field.  For enum values, it also produced a separate "description"
string containing the textual name of the enum.

The only caller of this function combines the two, printing enums as
"<numeric value> (<texture enum name>)".  We may as well just store
that in item->value directly, eliminating the description field, and
a layer of wrapping.

v2: Use non-overlapping source and destination strings in snprintf.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-20 11:20:51 -07:00
Julien Isorce
a6e2124402 si_descriptor: move velems nullity check before dereference
CID 1399479: Dereference before null check (REVERSE_INULL)
check_after_deref: Null-checking velems suggests that it may be null,
but it has already been dereferenced on all paths leading to the check.

Signed-off-by: Julien Isorce <jisorce@oblong.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-20 18:01:51 +00:00
Julien Isorce
521860b2a9 radeon_drm_bo: explicitly check return value of drmCommandWriteRead
CID 1313492

Signed-off-by: Julien Isorce <jisorce@oblong.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-20 18:01:51 +00:00
Julien Isorce
dac124466a si_pipe: remove nullity check after dereference
sscreen cannot be NULL

CID 1354483

Signed-off-by: Julien Isorce <jisorce@oblong.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-20 18:01:41 +00:00
Julien Isorce
ce27b27c38 radeon: initialize hole variable before calling container_of
Like in a few other places in that radeon_drm_bo.c file.

CID 715739.

Signed-off-by: Julien Isorce <jisorce@oblong.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-20 16:47:31 +00:00
Nanley Chery
7c50f9903f intel: Correct the BDW surface state size
The PRMs state that this packet is 16 DWORDS long. Ensure that the last
three DWORDS are zeroed as required by the hardware when allocating a
null surface state.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-03-20 09:43:44 -07:00
Bartosz Tomczyk
f4b23589da r600g: Fix out of bounds access
fc_sp variable should indicate number of elements in
fc_stack array, but fc_sp was increased at beginning of fc_pushlevel
function. It leads to situation where idx=0 was never used, and last
32 element was stored outside fs_stack array.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-03-20 17:32:53 +01:00
Constantine Kharlamov
f9190f3e65 r600g: update sb documentation
v2: s/r600/r600g in the title

Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-03-20 17:11:15 +01:00
Constantine Kharlamov
64cbbd2888 r600g: make condition clearer
The second check in the old code looked pretty much unreachable, esp.
because it's not obvious that "max_entries" could be zero. To find out
that it was intentional I had to run some checks, and to dig into
the old versions of the file.

So, rewrite the check to make the intention clear.

v2: s/r600/r600g in the title, and per Dieter Nützel's comment wrap
lines of condition.

Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-03-20 17:11:15 +01:00
Emil Velikov
36e029d356 docs: add news item and link release notes for 13.0.6/17.0.2
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-20 14:25:18 +00:00
Emil Velikov
54fd78f637 docs: add sha256 checksums for 17.0.2
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 9b66351f5b)
2017-03-20 14:20:32 +00:00
Emil Velikov
887ad468b5 docs: add release notes for 17.0.2
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 373d88a711)
2017-03-20 14:20:31 +00:00
Emil Velikov
9bad99742f docs: add sha256 checksums for 13.0.6
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 879d24c497)
2017-03-20 14:20:26 +00:00
Emil Velikov
0babb9e091 docs: add release notes for 13.0.6
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit fcef88d13a)
2017-03-20 14:20:25 +00:00
Xu,Randy
57595cb073 anv/genX: Solve the vkCreateGraphicsPipelines crash
The crash is due to NULL pColorBlendState, which is legal if the
pipeline has rasterization disabled or if the subpass of the render pass
the pipeline is created against does not use any color attachments.

Test: Sample subpasses from LunarG can run without crash

Signed-off-by: Xu,Randy <randy.xu@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
2017-03-20 08:31:18 +02:00
Dave Airlie
e70e7cc7ff radv: fix logic for when to flush on multiple CS emission
The current code evaluated to always true, we only want to flush
on the first submit. Rename the variable to do_flush, and only
emit on the first iteration.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-20 14:17:43 +10:00
Jason Ekstrand
fcca6a83cd spirv: Implement IsInf using an integer comparison
Since we already do fabs on the one source, we're guaranteed to get
positive infinity if we get any infinity at all.  Since +inf only has
one IEEE 754 representation, we can use an integer comparison and avoid
all of the ordered/unordered issues.

Cc: Dave Airlie <airlied@redhat.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-20 14:08:19 +10:00
Dave Airlie
e0208949d1 radv/meta: fix image clears for r4g4 format.
This just uses an 8-bit clear and packs the values.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-20 13:41:31 +10:00
Dave Airlie
10c2b588c4 Revert "radv: fallback to an in-memory cache when no pipline cache is provided"
This reverts commit 2845a108a9.

This break VK-GL-CTS randomly.
./deqp-vk --deqp-case=dEQP-VK.texture.filtering.3d.formats.r4g4b4a4*

bounces around here from 6/6 to 3/6 or 4/6 to hanging.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-20 13:41:31 +10:00
Timothy Arceri
72fa447d45 mesa: disable glthread when glNewList() is called
glNewList() swaps dispatch tables, and we don't have anything in
place to handle that in glthread.

Tested-by: Michel Dänzer <michel.daenzer@amd.com>
2017-03-20 10:22:20 +11:00
Dave Airlie
d06e168b87 radv: fix primitive reset index emission
This was meant to be checking the index type to get the correct
index not the last emitted one. This fixes:
dEQP-VK.pipeline.input_assembly.primitive_restart.index_type_uint32.triangle_strip_with_adjacency

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-20 08:47:03 +10:00
Grazvydas Ignotas
274aaa331c util/disk_cache: check rename result
I haven't seen this causing problems in practice, but for correctness
we should also check if rename succeeded to avoid breaking accounting
and leaving a .tmp file behind.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-20 08:24:46 +11:00
Grazvydas Ignotas
67911fa4b8 util/disk_cache: delete .tmp if target exists
At the time of target file check, .tmp file is already created and file
lock is held, so we should remove the .tmp, like in other error paths.

With this, piglit no longer leaves large amount of empty .tmp files
behind, which waste directory entries and may interfere with eviction.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-20 08:24:38 +11:00
Grazvydas Ignotas
bd93cea691 util/disk_cache: fix stored_keys index
It seems there is a bug because:
- 20 bytes are compared, but only 1 byte stored_keys step is used
- entries can overlap each other by 19 bytes
- index_mmap is ~1.3M in size, but only first 64K is used

With this fix for Deus Ex:
- startup time (from launch to Feral logo): ~38s -> ~16s
- disk_cache_has_key() hit rate: ~50% -> ~96%

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-20 08:14:31 +11:00
Ilia Mirkin
663e7c25f5 nv30: create uploader after pipe->screen is set
Fixes crashes after recent upload rework.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-03-19 01:24:06 -04:00
Ilia Mirkin
0e9232dbcc nv50,nvc0: enable TEX_LZ and TXF_LZ
There should be minimal gain, if any, for nvc0, but nv50 may end up
noticing more often that the lod argument is uniform. This, in turn,
will remove the need for some unnecessary transformations, which were
being hit due to the checks being done pre-ssa.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-03-18 20:37:52 -04:00
Ilia Mirkin
dab88e9af7 st/mesa: set result writemask based on ir type
This prevents textureQueryLevels, which maps as LODQ, from ending up
with a xyzw writemask, which is illegal.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100061
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-18 20:16:45 -04:00
Karol Herbst
09f16de7e6 nvc0/ir: treat FMA like MAD for operand propagation
Helps mainly Feral-ported games, due to their use of fma()

shader-db changes:
total instructions in shared programs : 3901147 -> 3842505 (-1.50%)
total gprs used in shared programs    : 471258 -> 467359 (-0.83%)
total local used in shared programs   : 27405 -> 27361 (-0.16%)
total bytes used in shared programs   : 35749888 -> 35214176 (-1.50%)

                local        gpr       inst      bytes
    helped          17        1829        4091        4091
      hurt           4          44           3           3

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2017-03-18 20:15:45 -04:00
Alan Swanson
a7eb7984bf util/disk_cache: pass predicate functions file stats directly (v4)
Since switching to LRU eviction the only user of these predicate
functions now resolves directory entry stats itself so pass them
directly saving calling fstat and strlen twice (and the
expensive strlen is skipped entirely if access time is newer).

v2: Update for empty cache dir detection changes
v3: Fix passing string length to predicate with the +1 for NULL
    termination and also pass sb as pointer
v4: Missed ampersand for passing sb as pointer

Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-18 14:32:57 +11:00
Timothy Arceri
bf8bc6190e glsl: use set for copy propagation kills
Previously each time we saw a variable we just created a duplicate
entry in the list. This is particularly bad for loops were we add
everything twice, and then throw nested loops into the mix and the
list was growing expoentially.

This stops the glsl-vs-unroll-explosion test which has 16 nested
loops from reaching the tests mem usage limit in this pass. The
test now hits the mem limit in opt_copy_propagation_elements()
instead.

I suspect this was also part of the reason this pass can be so
slow with some shaders.

Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2017-03-18 14:21:09 +11:00
Timothy Arceri
9e42b93f33 st/dri: wait for thread to finish before unbinding context
Fixes a bunch of piglit crashes that hit an assert() when trying
to delete the framebuffer. The assert() was triggered because
WinSysDrawBuffer was set to NULL before glDeleteFramebuffers()
was called.

Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-18 14:15:52 +11:00
Timothy Arceri
40bc1afc94 glsl: don't leak memory when trying to count loop iterations
Suggested-by: Damian Dixon <damian.dixon@gmail.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99789
2017-03-18 14:12:40 +11:00
Jason Ekstrand
1d5f4f46da genxml: Make MI_STORE_DATA_IMM have a single 64-bit data field
This is way more convenient than having two separate dword fields.

Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-17 15:31:19 -07:00
Jason Ekstrand
ced61fd53e anv: Turn on inherited queries
It all just works since it's just a hardware register so we might as
well turn it on.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-17 12:12:50 -07:00
Ilia Mirkin
e675f57d4f anv: Implement pipeline statistics queries
In the end, pipeline statistics queries look a lot like occlusion
queries only with between 1 and 11 begin/end pairs being generated
instead of just the one.

Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-17 12:12:50 -07:00
Jason Ekstrand
dda54890f3 anv: Disable VF statistics for blorp and SOL memcpy
In order to get accurate statistics, we need to disable statistics for
blits, clears, and the surface state memcpy at the top of each secondary
command buffer.  There are two possible approaches to this:

 1) Disable before the blit/memcpy and re-enable afterwards

 2) Move emitting 3DSTATE_VF_STATISTICS from initialization and make it
    part of pipeline state and then just disabale statistics before
    blits and memcpy operations.

Emitting 3DSTATE_VF_STATISTICS should be fairly cheap so it doesn't
really matter which path we take.  We choose the second option as it's
more consistent with the way the rest of the statistics are enabled and
disabled.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-17 12:12:50 -07:00
Jason Ekstrand
9576cea519 anv/pipeline: Enable clipper statistics
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-17 12:12:50 -07:00
Jason Ekstrand
2a616242cd genxml: s/Clipper Statistics Enable/Statistics Enable/
It's in 3DSTATE_CLIP, so it doesn't really need the extra detail.  This
matches what we do for VS, FS, etc.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-17 12:12:50 -07:00
Jason Ekstrand
149d10d38a anv/query: Rework store_query_result
The new version is a nice GPU parallel to cpu_write_query_result and it
nicely handles things like dealing with 32 vs. 64-bit offsets in the
destination buffer.

Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-17 12:12:50 -07:00
Jason Ekstrand
c773ae88df anv/query: Break GPU query calculation into a helper
Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-17 12:12:50 -07:00
Jason Ekstrand
7de73f0c94 genxml: Add pipeline statistics registers on gen7+
Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-17 12:12:50 -07:00
Jason Ekstrand
0557dfdb4a anv/query: Add a helper for writing a query pool result
Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-17 12:12:50 -07:00
Jason Ekstrand
bce4a935c6 anv/query: Use a variable-length slot size
Not all queries are the same.  Even the two queries we support today
require a different amount of data per slot.  Once we introduce pipeline
statistics queries, the size will vary wildly.

Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-17 12:12:49 -07:00
Jason Ekstrand
1c797af2c6 anv/query: Move the available bits to the front
We're about to make slots variable-length and always having the
available bits at the front makes certain operations substantially
easier once we do that.

Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-17 12:12:47 -07:00
Jason Ekstrand
9d43afa3dc anv/query: Let 32-bit values wrap
From the Vulkan 1.0.39 Specification:

   "If VK_QUERY_RESULT_64_BIT is not set and the result overflows a
   32-bit value, the value may either wrap or saturate."

So we can either clamp or wrap.  Wrapping is both easier and what the
user gets if they use vkCmdCopyQueryPoolResults and we should be
consistent.  We could make vkCmdCopyQueryPoolResults clamp but it's
annoying and ends up burning extra batch for something the spec clearly
doesn't require.

Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-17 12:11:35 -07:00
Alex Deucher
c2a97fb7ae radeonsi: add new polaris12 pci id
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-03-17 14:13:17 -04:00
Marek Olšák
4b064d16e5 gallium/radeon: formalize that create_batch_query doesn't need pipe_context
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-17 18:30:21 +01:00
Marek Olšák
be6173e7d6 gallium/radeon: formalize that create_query doesn't need pipe_context
for threaded gallium

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-17 18:30:21 +01:00
Marek Olšák
04e6977e5d gallium/radeon: reference pipe_resource in pipe_transfer
for threaded gallium

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-17 18:30:21 +01:00
Marek Olšák
03127bb6d5 radeonsi: compile all TGSI compute shaders asynchronously
required by threaded gallium

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-17 18:30:21 +01:00
Marek Olšák
e9c6953ddb radeonsi: require that compiler threads are enabled
threaded gallium can't use pipe_context's LLVM target machine, because
create_shader_selector can be called from a non-driver thread.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-17 18:30:21 +01:00
Marek Olšák
080f322f06 trace: remove leftover assertions after pipe_resource wrapping removal 2017-03-17 18:30:21 +01:00
Marek Olšák
6c0a28084d gallium/u_upload: make the first persistent mapping unsynchronized
This is simpler for drivers.
2017-03-17 18:30:21 +01:00
Robert Bragg
a27b62e794 anv/device: init timestampPeriod from devinfo
Now that there's a timebase_scale in gen_device_info which is
effectively the 'period' this switches anv_GetPhysicalDeviceProperties
to using this common device info to initialize the timestampPeriod
device limit.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-17 16:10:22 +00:00
Robert Bragg
344d1a4015 i965: Allow a per gen timebase scale factor
Prior to Skylake the Gen HW timestamps were driven by a 12.5MHz clock
with the convenient property of being able to scale by an integer (80)
to nanosecond units.

For Skylake the frequency is 12MHz or a scale factor of 83.333333

This updates gen_device_info to track a floating point timebase_scale
factor and makes corresponding _queryobj.c changes to no longer assume a
scale factor of 80 works across all gens.

Although the gen6_ code could have been been left alone, the changes
keep the code more comparable, and it now shares a few utility functions
for scaling raw timestamps and calculating deltas. The utility for
calculating deltas takes into account 32 or 36bit overflow depending on
the current kernel version.

Note: this leaves the timestamp handling of ARB_query_buffer_object
untouched, which continues to use an incorrect scale of 80 on Skylake
for now. This is more awkward to solve since the scaling is currently
done using a very limited uint64 ALU available to the command parser
that doesn't support multiply or divide where it's already taking a
large number of instructions just to effectively multiple by 80.

This fixes piglit arb_timer_query-timestamp-get on Skylake

v2: (Ken) Update timebase_scale for platforms past Skylake/Broxton too.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-03-17 15:45:19 +00:00
Jason Ekstrand
28b134c75c anv/device: Remove a use of a compound literal
Older versions of GCC don't like compound literals in static const
variable declarations because they don't think it's an actual constant
value.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-17 08:40:30 -07:00
Robert Bragg
76dc49f3fb i965: bounds checks while concatenating sysfs paths
This adds some missing return value checks for all uses of snprintf in
brw_performance_query.c. This also switches a use of strncpy + strncat
for snprintf for consistency and to avoid the chance of the strncpy
leaving an unterminated string in the dest buffer if the src is too
long.

This issue with strncpy was picked up by Coverity.

CID: 1402201
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-17 13:40:29 +00:00
Emil Velikov
f8b1b9404e mesa: automake: add all headers to the tarball.
Fixes: d8d81fbc31 ("mesa: Add infrastructure for a worker thread to process GL commands.")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-17 13:10:09 +00:00
Emil Velikov
d9a41ce8aa mapi: automake: add all python scripts to EXTRA_DIST
Otherwise it'll be missing in the tarball and make distcheck will fail.

Fixes: 05dd4a1104 ("glapi: Generate GL API marshalling code from the XML.")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-17 13:10:09 +00:00
Jonathan Gray
9e8d6ba1d6 glapi: avoid using $< in non-suffix make rules
Using $< in non-suffix make rules is a GNU extension.  Explicitly use
the name of the python script to fix the build on OpenBSD.

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.velikov@collabore.com>
2017-03-17 13:06:26 +00:00
Alex Smith
ce4058dafd radv/ac: Fix shared memory offset calculation
The index passed to get_shared_memory_ptr is an attribute slot index,
i.e. the index of a vec4 within LDS. Therefore this must be scaled by
sizeof(vec4) to give the LDS byte offset.

Fixes: f4e499ec79 ("radv: add initial non-conformant radv vulkan driver")
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
CC: <mesa-stable@lists.freedesktop.org>
2017-03-17 09:35:48 +01:00
James Legg
e88cac1df0 radv: Fix using more than 4 bound descriptor sets
Avoid a buffer overflow in ac_nir_to_llvm.c's create_function when
using more than 4 descriptor sets. radv claims support for 8.

Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-03-17 09:12:43 +01:00
Tapani Pälli
70d25cae8b util/build-id: check dlpi_name before strstr call
According to dl_iterate_phdr man page first object visited is the
main program where dlpi_name is an empty string. This fixes segfault
on Android when using build-id as identifier.

Fixes: d4fa083e11 ("util: Add utility build-id code.")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-03-17 07:34:26 +02:00
Tapani Pälli
4d4558411d android: fix segfault within swap_buffers
Function droid_swap_buffers may get called without dri2_surf->buffer set,
in these cases we don't have a back buffer set either. Patch fixes segfault
seen with 3DMark that uses android.opengl.GLSurfaceView for rendering it's UI.

backtrace:
   #00 pc 00013f88  /system/lib/egl/libGLES_mesa.so (droid_swap_buffers+104)
   #01 pc 000117b2  /system/lib/egl/libGLES_mesa.so (dri2_swap_buffers+50)
   #02 pc 000058b2  /system/lib/egl/libGLES_mesa.so (eglSwapBuffers+386)
   #03 pc 00011329  /system/lib/libEGL.so (eglSwapBuffersWithDamageKHR+553)
   #04 pc 000118e7  /system/lib/libEGL.so (eglSwapBuffers+55)
   #05 pc 000754dc  /system/lib/libandroid_runtime.so

v2: do like other backends, call get_back_bo (Emil Velikov)

Fixes: 2acc69d ("EGL/Android: Add EGL_EXT_buffer_age extension")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-17 07:30:34 +02:00
Timothy Arceri
72ab7bb765 radv: make sure gs copy shader is retrieved from the cache with the variant
Apps can limit the size of the cache via VkAllocationCallbacks so we
can't be sure that both are always in the cache.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-03-17 16:17:10 +11:00
Timothy Arceri
2845a108a9 radv: fallback to an in-memory cache when no pipline cache is provided
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-03-17 16:17:10 +11:00
Timothy Arceri
315e8a9321 radv: always create an fallback pipeline cache
This will be used as an in-memory cache when a pipeline cache is
not provided by the app.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-03-17 16:17:10 +11:00
Timothy Arceri
4ffdab78b9 radv: move cache check inside insert and search functions
This will allow us to use fallback in-memory and on-disk caches
should the app not provide a pipeline cache.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-03-17 16:17:10 +11:00
Timothy Arceri
124ec417f9 st/mesa: call glthread_destroy() before _vbo_DestroyContext()
Otherwise we have a race condition between vbo calls in the
glthread and the _vbo_DestroyContext() call.

This fixes a bunch of piglit crashes.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-17 09:47:02 +11:00
Jason Ekstrand
08df015b9d anv/GetQueryPoolResults: Actually implement the spec
The Vulkan spec is fairly clear about when we should and should not
write query pool results.  We're also supposed to return VK_NOT_READY if
VK_QUERY_RESULT_PARTIAL_BIT is not set and we come across any queries
which are not yet finished.  This fixes rendering corruptions on The
Talos Principle where geometry flickers in and out due to bogus query
results being returned by the driver.  These issues are most noticable
on Sky Lake GT4 2hen running on "ultra" settings.

Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100182
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
2017-03-16 15:08:18 -07:00
Jason Ekstrand
81840130c0 anv/query: Invalidate the correct range
Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
2017-03-16 15:08:17 -07:00
Jason Ekstrand
4bbb4b95b8 anv/query: Fix the location of timestamp availability
Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "17.0 13.0" <mesa-dev@lists.freedesktop.org>
2017-03-16 15:08:17 -07:00
Jason Ekstrand
9e60f59e62 genxml: Add XML version tags
There's not much point to having them or not having them but this
reduces some pointless diff from the version we can auto-generate

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-16 15:08:17 -07:00
Kenneth Graunke
f51a320b12 aubinator: Use fprintf for output.
This will make it easier to choose an output file.  For now, it remains
stdout.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-16 10:48:44 -07:00
Kenneth Graunke
65a9d5eabb aubinator: Reuse decode_structure code for handling commands
The code for decoding structures and commands was almost identical.
The only differences are: we print dword headers for commands, and
we skip the first one (with the command opcode and lengths).

So, generalize decode_structure to add a starting DWord, and a flag
for printing the DWord headers, and reuse it.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-16 10:48:41 -07:00
Kenneth Graunke
f0aa8fd4e4 aubinator: Delete redundant NULL check.
handle_struct_decode() is just a wrapper around decode_structure()
with a NULL check.  But the only caller already does that NULL check.

So, just use decode_structure() directly.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-16 10:48:37 -07:00
Kenneth Graunke
65138ce019 aubinator: Fix indentation.
Three space, not four.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-16 10:48:32 -07:00
Topi Pohjolainen
bd25d9670b i965/gen8+: Do full stall when switching pipeline
just as earlier gens do.

CC: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96743
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-16 18:44:15 +02:00
Jonathan Gray
46707bc27b i965: remove uneeded asm/unistd.h include
Fix the build on OpenBSD by removing an uneeded include for asm/unistd.h.

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-16 13:56:40 +00:00
Emil Velikov
e6bef50f4c i965: automake: remove spurious white space
Unintentionally introduced by yours truly with the i965 compiler move.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-16 13:55:42 +00:00
Jonathan Gray
d2bb0c8590 i965: avoid using a GNU make pattern rule
% pattern rules are a GNU extension.  As there is only one file here
avoid patterns and globbing entirely to fix the build on non-GNU make.

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
v2 [Emil Velikov: brw_oa.py dependency]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-16 13:55:23 +00:00
Emil Velikov
ccb89e72aa docs/releasing: document how to squash/announce queued patches
In the odd case where a patch needs to be fixed, squash the appropriate
fix and document how. Add a note in the pre-release notes, such that
devs can quickly spot it.

v2: Grammar/typo fixes (Eric). Use upstream commit [SHA] as reference.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-16 13:22:40 +00:00
Emil Velikov
0f988add50 docs/releasing: release.sh is located in xorg/util-modular
Correct the silly typo s/macros/modular/ and add a reference to the
repository.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-16 13:18:13 +00:00
Emil Velikov
79562033b5 docs/releasing: remove "git clean" step
release.sh from master, does not require the tree to be clean.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-16 13:18:11 +00:00
Emil Velikov
c81c563fbb mapi: remove Xlib/xcb include in gl_marshal.py
The only use of the header is to provide the _X_INLINE macro. We already
require (and provide where needed) 'inline', plus it's used in the file
already.

So replace the macro and drop the include. This fixes the build on
platforms which lack the header - from X-less Linuxes to Androids.

Fixes: 05dd4a1104 ("glapi: Generate GL API marshalling code from the XML.")
Reported-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100223
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-16 13:12:26 +00:00
Eric Engestrom
8a82f551cd docs/specs: update Khronos registries URLs
The registries were migrated to git and are now hosted on GitHub.
The old svn is now read-only, and will not be updated anymore.

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
2017-03-16 11:50:40 +00:00
Iago Toral Quiroga
ca34a3125f anv: improve error reporting when creating pipelines
Specifically, report 'out of memory' errors that might have happened while
emitting the pipeline's batch.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-16 11:40:05 +01:00
Iago Toral Quiroga
1d7468311d anv: handle errors in emit_binding_table() and emit_samplers()
These can fail to allocate device memory, however, the driver can recover
from this error by allocating a new binding table block and trying again.

v2:
  - Instead of tracking the errors in these functions and making callers
    reset the batch's status before attempting to allocate a new block
    for the binding table, simply make callers responsible for setting
    the error status if they fail to allocate memory during the second
    attempt (Jason).

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-16 11:40:05 +01:00
Iago Toral Quiroga
dd8348c8be anv: handle errors while allocating new binding table blocks
Also, we had a couple of instances in flush_descriptor_sets() were
we were returning a VkResult directly upon error, but the return
value of this function is not a VkResult but a uint32_t dirty mask,
so simply return 0 in these cases which reduces the amount of
work the driver will do after the error has been raised.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-16 11:40:05 +01:00
Iago Toral Quiroga
be52f9693a anv/blorp: make anv_cmd_buffer_alloc_blorp_binding_table() return a VkResult
Instead of asserting inside the function, and then use use that information
to return early from its callers upon failure.

v2:
  - Make sure that clear_color_attachment() and
    clear_depth_stencil_attachment() get the VkResult as well so they
    avoid executing the batch if an error happened. (Topi)

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-16 11:40:05 +01:00
Iago Toral Quiroga
a578b06d7b anv/device: assert that commands submitted to a queue are not bogus
Any errors that may have happened during the command buffer recording are
reported by vkEndCommandBuffer() and it is the application's reponsibility
to not submit broken commands to a queue.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-16 11:40:05 +01:00
Iago Toral Quiroga
a752c4ecda anv/cmd_buffer: skip vkCmdExecuteCommands() on broken command buffers
v2: Assert on secondary commands, applications should've called
    vkEndCommandBuffer() and received an error for them before (Jason)

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-16 11:40:05 +01:00
Iago Toral Quiroga
801493051e anv/cmd_buffer: skip vkCmdDispatch() on broken command buffers
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-16 11:40:05 +01:00
Iago Toral Quiroga
18ec3fa2a9 anv/cmd_buffer: skip vkCmdDraw*() on broken command buffers
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-16 11:40:05 +01:00
Iago Toral Quiroga
fb9d563fb9 anv: handle memory allocation errors during queue submissions
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-16 11:40:05 +01:00
Iago Toral Quiroga
c04dbd6b3e anv/cmd_buffer: handle out of memory during vkCmdPushConstants
Fixes:
dEQP-VK.api.out_of_host_memory.cmd_push_constants

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-16 11:40:05 +01:00
Iago Toral Quiroga
94a4f0c255 anv/cmd_buffer: handle allocation errors during vkCmdBeginRenderPass()
Fixes:
dEQP-VK.api.out_of_host_memory.cmd_begin_render_pass

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-16 11:40:05 +01:00
Iago Toral Quiroga
d823f381a5 anv/cmd_buffer: skip vkCmdEndRenderPass() for broken command buffers
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-16 11:40:05 +01:00
Iago Toral Quiroga
6743456699 anv/cmd_buffer: skip vkCmdNextSubpass() for broken command buffers
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-16 11:40:05 +01:00
Iago Toral Quiroga
8174e63869 anv/cmd_buffer: report tracked errors in vkEndCommandBuffer()
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-16 11:40:05 +01:00
Iago Toral Quiroga
68d88f0237 anv: handle failures when growing reloc lists
Growing the reloc list happens through calling anv_reloc_list_add() or
anv_reloc_list_append(). Make sure that we call these through helpers
that check the result and set the batch error status if needed.

v2:
  - Handling the crashes is not good enough, we need to keep track of
    the error, for that, keep track of the errors in the batch instead (Jason).
  - Make reloc list growth go through helpers so we can have a central
    place where we can do error tracking (Jason).

v3:
  - Callers that need the offset returned by anv_reloc_list_add() can
    compute it themselves since it is extracted from the inputs to the
    function, so change the function to return a VkResult, make
    anv_batch_emit_reloc() also return a VkResult and let their callers
    do the error management (Topi)

v4:
  - Let anv_batch_emit_reloc() return an uint64_t as it originally did,
    there is no real benefit in having it return a VkResult.
  - Do not add an is_aux parameter to add_surface_state_reloc(), instead
    do error checking for aux in add_image_view_relocs() separately.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-16 11:40:05 +01:00
Iago Toral Quiroga
d4bdd871dc anv: avoid crashes when failing to allocate batches
Most of the time we use macros that handle this situation transparently,
but there are some cases were we need to handle this explicitly.

This patch makes sure we don't crash, notice that error handling takes
place in the function that actually failed the allocation,
anv_batch_emit_dwords(), which will set the status field of the batch
so it can be used at a later moment to report the error to the user.

v2:
  - Not crashing is not good enough, we need to keep track of the error
    (Topi, Jason). Iago: now that we track errors in the batch, this
    is being handled.
  - Added guards in a few more places that needed it (Iago)

v3:
  - Check result of anv_batch_emitn() for NULL before calling memset()
    in emit_vertex_input() (Topi)

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-16 11:40:05 +01:00
Iago Toral Quiroga
31f5049ff1 anv: handle allocation failure in anv_batch_emit_dwords()
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-16 11:40:05 +01:00
Iago Toral Quiroga
9e69409fcf anv: handle allocation failure in anv_batch_emit_batch()
v2:
 - Call the error handler (Topi)

Fixes:
dEQP-VK.api.out_of_host_memory.cmd_execute_commands

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-16 11:40:05 +01:00
Iago Toral Quiroga
a8ce8e3542 anv: add anv_batch_set_error() and anv_batch_has_error() helpers
The anv_batch_set_error() helper will track the first error that happened
while recording a command buffer. The helper returns the currently tracked
error to help the job of internal functions that may generate errors that
need to be tracked and return a VkResult to the caller.

We will use the anv_batch_has_error() helper to guard parts of the driver
that are not safe to execute if an error has been generated while recording
a particular command buffer.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-16 11:40:05 +01:00
Iago Toral Quiroga
d0195bd067 anv/cmd_buffer: add a status field to anv_batch
The vkCmd*() functions do not report errors, instead, any errors should be
reported by the time we call vkEndCommandBuffer(). This means that we
need to make the driver robust against incosistent and/or imcomplete
command  buffer states through the command recording process, particularly,
avoid crashes due to access to memory that we failed to allocate previously.

The strategy used to do this is to track the first error ocurred while
recording a command buffer in the batch associated with it. We use the
batch to track this information because the command buffer may not be
visible to all parts of the driver that can produce errors we need to be
aware of (such as allocation failures during batch emissions).

Later patches will use this error information to guard parts of the driver
that may not be safe to execute.

v2: Move the field from the command buffer to the batch so we can track
    errors from batch emissions (Jason)

v3: Registering errors in the command buffer's batch during
    anv_create_cmd_buffer() is unnecessary, since the command buffer
    is freed at the end of the function in that case (Topi)

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-16 11:40:05 +01:00
Iago Toral Quiroga
6dd06f54eb anv/cmd_buffer: report errors in vkBeginCommandBuffer()
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-16 11:40:05 +01:00
Iago Toral Quiroga
88b539c4a0 anv: do not try to ref/unref NULL shaders
This situation can happen if we failed to allocate memory for the shader.

v2:
 - We shouldn't see NULL shaders in anv_shader_bin_ref so we should not check
   for that (Jason). Make sure that callers don't attempt to call this
   function with a NULL shader and assert that this never happens (Iago).

v3:
 - All callers to anv_shader_bin_unref seem to check for NULL before calling,
   so just assert that it is not NULL (Topi)

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-16 11:40:05 +01:00
Iago Toral Quiroga
bad3a2e911 anv/blorp: return early if we failed to create the shader binary
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-16 11:40:05 +01:00
Iago Toral Quiroga
e2f707ce5b intel/blorp: make upload_shader() return a bool indicating success or failure
For now we always return true, follow-up patches will handle fail scenarios.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-16 11:40:05 +01:00
Iago Toral Quiroga
808503b8f8 anv: remove unnecessary function prototype.
The function is defined right after the prototype declaration. Also, the
protoype for it is included in anv_genX.h which is included via anv_private.h.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-16 11:40:05 +01:00
Timothy Arceri
04a9ca2700 mapi: don't include X11/Xlib-xcb.h on non PTHREAD platforms
Should fix the last of the glthread build issues on windows.
2017-03-16 15:45:40 +11:00
Timothy Arceri
4a32d473fd mesa: fix glthread marshal build issues on platforms without PTHREAD 2017-03-16 15:33:08 +11:00
Timothy Arceri
643b0fd7e9 mesa: fix glthread build issues on platforms without PTHREAD 2017-03-16 14:48:09 +11:00
Marek Olšák
c83562ccaa gallium: implement the backend of threaded GL dispatch
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
2017-03-16 14:14:19 +11:00
Gregory Hainaut
93bdad3253 mesa/glthread: restore the dispatch table when incompatible gl calls are detected
While a context only has a single glthread, the context itself can be
attached to several threads. Therefore the dispatch table must be
updated in all threads before the destruction of glthread. In others
words, glthread can only be destroyed safely when the context is deleted.

Fixes remaining crashes in the glx-multithread-makecurrent* tests.

V2: (Timothy Arceri) updated gl_API.dtd marshal_fail description.

Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
2017-03-16 14:14:19 +11:00
Gregory Hainaut
70e715eea6 mesa/glthread: don't set a dispatch table if we aren't the owner
Fix crashes when glxMakeCurrent is called.

Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
2017-03-16 14:14:19 +11:00
Eric Anholt
012bfebc07 mesa: Track the current vertex/element array buffers for glthread.
We want to support glthread on GLES contexts with reasonable apps, and on
desktop for apps that use VBOs but haven't completely moved to core GL.
To do so, we have to deal with the "the user may or may not pass user
pointers to draw calls" problem.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
2017-03-16 14:14:19 +11:00
Eric Anholt
238d027ed6 mesa: Disable glthread when glBegin() is called.
glBegin() swaps dispatch tables, and we don't have any code in place for
handling that in glthread (which also messes with dispatch tables), and I
don't particularly care to at this point.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
2017-03-16 14:14:19 +11:00
Eric Anholt
cd1c003b18 mesa: Add an attribute for conditions to turn off threading.
The threading for GL core is in place, but there are so few applications
actually using a core GL context that it would be nice to extend support
back.  However, some of the features of compat GL (particularly user
vertex arrays) would be so expensive to track state for that we want to be
able to disable threading when we discover that the app is using them.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
2017-03-16 14:14:19 +11:00
Eric Anholt
43d4f7a227 mesa: Add support for asynchronous glDraw* on GL core.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
2017-03-16 14:14:19 +11:00
Eric Anholt
b18755a457 mesa: Add support for NULL arguments like in glBufferData() in marshalling.
This will let us support things like glBufferData() that should be
asynchronous.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
2017-03-16 14:14:19 +11:00
Eric Anholt
47f819d3cb mesa: Statically allocate glthread command buffer in the batch struct.
This avoids an extra pointer dereference in the marshalling functions,
which, with the instruction count doing in the low 30s, could actually
matter for main-thread performance.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
2017-03-16 14:14:19 +11:00
Eric Anholt
1d6b71c5c6 glapi: Mark vertex attrib pointer functions as async.
These don't actually read data out of the pointers, they set the
pointers (or offsets in a VBO) to be used in a later draw call.

v2: Don't forget glVertexAttribIPointer, and don't bother with annotations
    on aliases.
v3: Mark CompressedTexSubImage1D as sync also.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
2017-03-16 14:14:19 +11:00
Paul Berry
a4a5de6f18 mesa: Custom thread marshalling for Flush.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
2017-03-16 14:14:18 +11:00
Paul Berry
154a4f2679 mesa: Custom thread marshalling for ShaderSource.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
2017-03-16 14:14:18 +11:00
Eric Anholt
efd63e234a mesa: Connect the generated GL command marshalling code to the build.
v2: Rebase on the Begin/End changes, and just disable this feature on
    non-GL-core.
v3: (Timothy Arceri) enable for non-GL-core contexts. Remove
    unrelated safe_mul() hunk. while loop style fix.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
2017-03-16 14:14:18 +11:00
Marek Olšák
db06e91de2 Revert "mesa: make _mesa_alloc_dispatch_table() static"
This reverts commit 4009d22b61.

glthread needs it.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
2017-03-16 14:14:18 +11:00
Paul Berry
ef30ce97a6 mesa: Create pointers for multithread marshalling dispatch table.
This patch splits the context's CurrentDispatch pointer into two
pointers, CurrentClientDispatch, and CurrentServerDispatch, so that
when doing multithread marshalling, we can distinguish between the
dispatch table that's being used by the client (to serialize GL calls
into the marshal buffer) and the dispatch table that's being used by
the server (to execute the GL calls).

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
2017-03-16 14:14:18 +11:00
Eric Anholt
d8d81fbc31 mesa: Add infrastructure for a worker thread to process GL commands.
v2: Keep an allocated buffer around instead of checking for one at the
    start of every GL command.  Inline the now-small space allocation
    function.
v3: Remove duplicate !glthread->shutdown check, process remaining work
    before shutdown.
v4: Fix leaks on destroy.
V5: (Timothy Arceri) fix order of source files in makefile

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
2017-03-16 14:14:18 +11:00
Eric Anholt
a76a3cf664 mesa: Validate count parameters when marshalling.
Otherwise, for example, glDeleteBuffers(-1, &bo) gets you a segfault
instead of GL_INVALID_VALUE.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
2017-03-16 14:14:18 +11:00
Paul Berry
05dd4a1104 glapi: Generate GL API marshalling code from the XML.
This is not yet used in the build, just generated.

v2: Add missing build dependencies.
v3: Avoid mixing declarations and code, remove logic for avoiding emitting
    code that the compiler's optimizer can deal with anyway.
v4: (Timothy Arceri) move safe_mul() genereation here from a later patch.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
2017-03-16 14:14:18 +11:00
Eric Anholt
f05524ffaa glapi: Mark compressed teximage functions as sync.
Without doing some additional tracking, we won't know whether the data
will be immediate user data, or will be loaded from a PBO.  The normal
teximage functions will be sync by default because they don't know up
front what the size of their image data is.  But for compressed teximage,
we have the count information, so they would end up async by default.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
2017-03-16 14:14:18 +11:00
Paul Berry
f5052f45a2 glapi: Annotate functions with "marshal" attribute.
Several API functions require special treatment in order to be marshalled
to a background thread.  Others can't be safely executed in a background
thread and need to be executed synchronously (e.g. since they return data
through a pointer argument).

This annotation will be used when code generating thread marshalling code,
to ensure that each function is marshalled in the correct way.

Note that PixelMap functions are marked as synchronous for now since
their pointer may be relative to buffer on the GPU, so we'll need
special logic to marshal them properly.

v2: Move description of attribute types to a comment in the dtd file.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
2017-03-16 14:14:18 +11:00
Eric Anholt
3b7b6adf3a egl: Implement __DRI_BACKGROUND_CALLABLE
v2: (Timothy Arceri) use C99 initializers.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
2017-03-16 14:14:18 +11:00
Paul Berry
6b70d9fce3 glx: Implement __DRI_BACKGROUND_CALLABLE
v2: Marek: Add DRI3 support.

v3: (Timothy Arceri) use C99 initializers.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
2017-03-16 14:14:18 +11:00
Paul Berry
77630841da mesa: Add SetBackgroundContext to dd_function_table
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
2017-03-16 14:14:18 +11:00
Paul Berry
5bc527d39d dri: Update dri_util to keep track of __DRI_BACKGROUND_CALLABLE
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
2017-03-16 14:14:18 +11:00
Paul Berry
e043b2a1a0 dri_interface: Add new marshalling interfaces to dri_interface.h
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
2017-03-16 14:14:18 +11:00
Roland Scheidegger
e1f9e9bafd gallivm: (trivial) remove duplicated line
pointed out by clang (stored value never read)
2017-03-16 04:03:29 +01:00
Roland Scheidegger
9d104dfd55 draw: (trivial) remove a unnecessary lp_build_alloca()
pointed out by clang (stored value never read)
2017-03-16 04:03:29 +01:00
Ilia Mirkin
e893b3a367 swr: support layer output in geometry shaders
This makes bin/gl-3.2-layered-rendering-gl-layer-render fail only with
2DMS_ARRAY, which is expected given the lackluster MSAA support. However
all the regular types pass.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-15 21:03:11 -04:00
Bas Nieuwenhuizen
ad4dee521d Revert "radv: Emit cache flushes before CP DMA."
This reverts commit cce43f6d8c.

Redundant, as the flush already happens at si_cp_dma_prepare.

Acked-by: Dave Airlie <airlied@redhat.com>
2017-03-16 00:55:03 +01:00
Francisco Jerez
e6469ec43b gallium/tgsi: Treat UCMP sources as floats to match the GLSL-to-TGSI pass expectations.
Currently the GLSL-to-TGSI translation pass assumes it can use
floating point source modifiers on the UCMP instruction.  See the bug
report linked below for an example where an unrelated change in the
GLSL built-in lowering code for atan2 (e9ffd12827)
caused the generation of floating-point ir_unop_neg instructions
followed by ir_triop_csel, which is translated into UCMP with a negate
modifier on back-ends with native integer support.

Allowing floating-point source modifiers on an integer instruction
seems like rather dubious design for a transport IR, since the same
semantics could be represented as a sequence of MOV+UCMP instructions
instead, but supposedly this matches the expectations of TGSI
back-ends other than tgsi_exec, and the expectations of the DX10 API.
I take no responsibility for future headaches caused by this
inconsistency.

Fixes a regression of piglit glsl-fs-tan-1 on softpipe introduced by
the above-mentioned glsl front-end commit.  Even though the commit
that triggered the regression doesn't seem to have made it to any
stable branches yet, this might be worth back-porting since I don't
see any reason why the bug couldn't have been reproduced before that
point.

Suggested-by: Roland Scheidegger <sroland@vmware.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99817
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2017-03-15 15:47:14 -07:00
Grazvydas Ignotas
eb5a61f77a util/disk_cache: do eviction before creating .tmp
cache_put() first creates a .tmp file and then tries to do eviction.
The recently added LRU eviction code selects non-empty directory with
the oldest access time, but that may easily be the one with just the
new .tmp file, especially on Linux where atime is updated lazily
(with "relatime" mount option, which is the default). So when cache is
small, if random doesn't hit another dir LRU keeps selecting the same
dir with just the .tmp and not deleting anything. To fix this (and the
tests), do eviction earlier.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-16 09:36:18 +11:00
Tim Rowley
a7ce0490e4 swr: validate backend state numAttributes
General protection and prevents us from smashing the stack
on the first clear state validation (a7b8d50bcb).  Fixes crash
using icc.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-15 15:08:59 -05:00
Ben Widawsky
8378c576ab gbm: Export a get modifiers
This patch originally had i965 specific code and was named:
commit 61cd3c52b868cf8cb90b06e53a382a921eb42754
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Thu Oct 20 18:21:24 2016 -0700

    gbm: Get modifiers from DRI

To accomplish this, two new query tokens are added to the extension:
__DRI_IMAGE_ATTRIB_MODIFIER_UPPER
__DRI_IMAGE_ATTRIB_MODIFIER_LOWER

The query extension only supported 32b queries, and modifiers are 64b,
so we needed two of them.

NOTE: The extension version is still set to 13, so none of this will
actually be called.

v2: Error handling of queryImage (Emil)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-15 10:36:05 -07:00
Ben Widawsky
5c6e0d1c7d i965: introduce modifier selection.
Nothing special here other than a brief introduction to modifier
selection. Originally this was part of another patch but was split out
from
gbm: Introduce modifiers into surface/bo creation by request of Emil.

Requested-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-15 10:36:05 -07:00
Ben Widawsky
191ff914a2 egl/drm: Use modifiers for backbuffer creation
Split into a separate patch from the previous patch as requested by
Emil.

Requested-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-15 10:36:05 -07:00
Ben Widawsky
63bd2ae745 gbm: Introduce modifiers into surface/bo creation
The idea behind modifiers like this is that the user of GBM will have
some mechanism to query what properties the hardware supports for its BO
or surface. This information is directly passed in (and stored) so that
the DRI implementation can create an image with the appropriate
attributes.

A getter() will be added later so that the user GBM will be able to
query what modifier should be used.

Only in surface creation, the modifiers are stored until the BO is
actually allocated. In regular buffer allocation, the correct modifier
can (will be, in future patches be chosen at creation time.

v2: Make sure to check if count is non-zero in addition to testing if
calloc fails. (Daniel)

v3: Remove "usage" and "flags" from modifier creation. Requested by
Kristian.

v4: Take advantage of the "INVALID" modifier added by the GET_PLANE2
series.

v5: Don't bother with storing modifiers for gbm_bo_create because that's
a synchronous operation and we can actually select the correct modifier
at create time (done in a later patch) (Jason)

v6: Make modifier condition outside the check so that dri_use will work
properly (Jason)

Cc: Kristian Høgsberg <krh@bitplanet.net>
References (v4): https://lists.freedesktop.org/archives/intel-gfx/2017-January/116636.html
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Daniel Stone <daniels@collabora.com>
2017-03-15 10:36:05 -07:00
Ben Widawsky
5e7d8d3961 i965: Implement basic modifier image creation
This is just a stub for now and will be filled in later.

This was split out of an earlier patch

Requested-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-15 10:36:05 -07:00
Ben Widawsky
d075cce258 dri: Add an image creation with modifiers
Modifiers will be obtained or guessed by the client and passed in during
image creation/import. In guessing, a client might decide to simply pass
along all known modifiers

This requires bumping the DRIimage version.

As of this patch, the modifiers aren't plumbed all the way down, this
patch simply makes sure the interface level stuff is correct.

v2: Don't allow usage + modifiers

v3: Make NAND actually NAND. Bug introduced in v2. (Jason)

v4:
- s/obtains/obtained (Jason)
- Pull out i965 imlemnentation into a later patch (Emil)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Daniel Stone <daniels@collabora.com>
2017-03-15 10:36:04 -07:00
Marek Olšák
0550f3d631 radeonsi: implement TGSI opcodes TEX_LZ and TXF_LZ
This massively decreases VGPR spilling for DiRT Showdown, because we
no longer have to use v4i32 for 2D fetches when level == 0.
We now use v2i32 for those cases.

DiRT Showdown - Spilled VGPRs: -26 (-81%)

This surprisingly doesn't have any useful effect on performance (+ 0.05%).
2017-03-15 18:17:41 +01:00
Marek Olšák
a7cc9b0fcf glsl_to_tgsi: use TEX_LZ and TXF_LZ when available 2017-03-15 18:17:41 +01:00
Marek Olšák
46cbb00f53 glsl_to_tgsi: remove a redundant statement
it's the same as the last "else".
2017-03-15 18:17:41 +01:00
Marek Olšák
cca0389c72 gallium: add TGSI opcodes TEX_LZ and TXF_LZ
for better code generation in radeonsi
2017-03-15 18:17:41 +01:00
Marek Olšák
bf3cdf0fd3 gallium: add PIPE_CAP_TGSI_TEX_TXF_LZ 2017-03-15 18:17:41 +01:00
Samuel Pitoiset
7751ed39e4 radeonsi: disable sinking common instructions down to the end block
Initially this was a workaround for a bug introduced in LLVM 4.0
in the SimplifyCFG pass that caused image instrinsics to disappear
(because they were badly sunk). Finally, this is a win because it
decreases SGPR spilling and increases the number of waves a bit.

Although, shader-db results are good I think we might want to
remove it in the future once the issue is fixed. For now, enable
it for LLVM >= 4.0.

This also fixes a rendering issue with the speedometer in Dirt Rally.

More information can be found here https://reviews.llvm.org/D26348.

Thanks to Dave Airlie for the patch.

v2: - add a FIXME comment
    - use if (HAVE_LLVM >= 0x0400) instead

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99484
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97988
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-15 14:24:40 +01:00
Samuel Pitoiset
74265fd03c tgsi: add missing compute shader entry in tgsi_get_processor_name()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-15 14:16:29 +01:00
Samuel Pitoiset
38ee3246d2 radeonsi: clean up tex_fetch_ptrs()
Will also help when the src sampler register will be
TGSI_FILE_CONSTANT for bindless.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-15 14:16:26 +01:00
Emil Velikov
8a5680f248 configure.ac: bump pthread-stubs requirement
On platforms that require it, we bump the requirement to 0.4 or later.
Due to an issue with the project [design] any version earlier than it,
is bound to cause issues. For the specifics see the pthread-stubs README

Cc: Uli Schlachter <psychon@znc.in>
Cc: Jonathan Gray <jsg@jsg.id.au>
Cc: Jean-Sébastien Pédron <dumbbell@FreeBSD.org>
Cc: François Tigeot <ftigeot@wolfpond.org>
Cc: Tobias Nygren <tnn@NetBSD.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2017-03-15 11:49:27 +00:00
Emil Velikov
eec0cd71cd glx: don't expose systemTimeExtension for DRI2/DRI3/DRISW
Used/applicable to only dri1 drivers.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2017-03-15 11:48:50 +00:00
Emil Velikov
b1fb6e8d8c anv: do not open random render node(s)
drmGetDevices2() provides us with enough flexibility to build heuristics
upon. Opening a random node on the other hand will wake up the device,
regardless if it's the one we're interested or not.

v2: Rebase, explicitly require/check for libdrm
v3: Return VK_ERROR_INCOMPATIBLE_DRIVER for no devices (Ilia)
v4: Rebase

Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Tested-by: Mike Lothian <mike@fireburn.co.uk>
2017-03-15 11:38:05 +00:00
Emil Velikov
743315f269 radv: do not open random render node(s)
drmGetDevices2() provides us with enough flexibility to build heuristics
upon. Opening a random node on the other hand will wake up the device,
regardless if it's the one we're interested or not.

v2: Rebase.
v3: Return VK_ERROR_INCOMPATIBLE_DRIVER for no devices (Ilia)

Cc: Michel Dänzer <michel.daenzer@amd.com>
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Tested-by: Mike Lothian <mike@fireburn.co.uk>
2017-03-15 11:38:02 +00:00
Emil Velikov
8ff2937dfa radv/winsys: use drmGetDevice2 API
Analogous to previous commit

v2: Add explicit require_libdrm check.

Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> (v1)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Tested-by: Mike Lothian <mike@fireburn.co.uk>
2017-03-15 11:38:00 +00:00
Emil Velikov
858170e8a4 winsys/amdgpu: use drmGetDevice2 API
Analogous to previous commit

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98502
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
2017-03-15 11:37:58 +00:00
Emil Velikov
a50c4eb2a0 loader: use drmGetDevice[s]2 API
By this allows us to fetch the device list/info w/o the revision field.
At the moment retrieving the latter wakes up the device.

Note: kernel patch to resolve that should be in 4.10.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
2017-03-15 11:37:55 +00:00
Emil Velikov
2c72e78ff5 autoconf/scons: bump libdrm to 2.4.75
We'll be using the drmGetDevice[s]2 API in src/loader with next patch.

v2: Rebase.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Tested-by: Mike Lothian <mike@fireburn.co.uk>
2017-03-15 11:37:39 +00:00
Emil Velikov
0fd61fb639 util/sha1: drop _mesa_sha1_{update, format} return type
Unused/unchecked by any of the callers.

v2: Fix the glsl cases that have crept in since v1

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
2017-03-15 11:18:45 +00:00
Emil Velikov
a9a4028fd7 util/sha1: rework _mesa_sha1_{init,final}
Rather than having an extra memory allocation [that we currently do not
and act accordingly] just make the API take an pointer to a stack
allocated instance.

This and follow-up steps will effectively make the _mesa_sha1_foo simple
define/inlines around their SHA1 counterparts.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
2017-03-15 11:18:43 +00:00
Emil Velikov
c96127e873 util/sha1: add non-typedef name for the SHA1_CTX struct
Using typedef(s) is not always the answer and makes it harder for people
to do clever (or one might call nasty) things with the code.

Add a struct name which we will use with follow-up commit.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
2017-03-15 11:15:53 +00:00
Bas Nieuwenhuizen
ef43eeb09f radv: Remove unused descriptor set field.
Trivial.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
2017-03-15 09:06:52 +01:00
Dave Airlie
686d060458 r600: refactor binding code for attach buffer to CB.
This refactors out the code and fixes it up to be used
for images later. It uses the code in the current RAT binding
for compute.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-15 14:33:26 +10:00
Dave Airlie
222e42e45f r600: refactor out CB setup.
This moves the code to create CB info out into
a separate function so it can be reused in images
code to create RATs.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-15 14:33:23 +10:00
Dave Airlie
0cf717821e r600: refactor texture resource words setup code.
This refactors out the code to setup a texture resource
so we can reuse it later from the images code.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-15 14:33:06 +10:00
Dave Airlie
95a976b651 r600: factor out the code to initialise a buffer resource.
This takes the code required to initialise a buffer resource
out of the texture buffer code, into it's own function.

This is going to be used for the image support later.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-15 14:32:48 +10:00
Dave Airlie
cf2af021b9 r600g: make framebuffer atom rely on dual src blend state.
In order to make ARB_shader_image_load_store, we have to share
the CB space with RATs, so we should only steal the dual src
space if we have dual src enabled.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-15 14:32:44 +10:00
Jason Ekstrand
d142c7436c intel/debug: Add a common INTEL_DEBUG=nohiz option
The GL driver had a driconf option (which doesn't make much sense) and
the Vulkan driver had a hand-rolled environment variable.  Instead,
let's tie both into the INTEL_DEBUG mechanism and unify things.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-14 21:00:09 -07:00
Jason Ekstrand
c09bb956ca anv/image: Move handling of INTEL_VK_HIZ
This makes it so that you don't get an "Implement gen7 HiZ" perf warning
when you manually disable HiZ on gen8.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-14 21:00:09 -07:00
Timothy Arceri
304b35b0e9 radv: trivial tidy ups
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-15 11:45:04 +11:00
Alan Swanson
b7e03d87e4 util/disk_cache: scale cache according to filesystem size
Select higher of current 1G default or 10% of filesystem where
cache is located.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
2017-03-15 11:15:11 +11:00
Alan Swanson
f1e9671442 util/disk_cache: actually enforce cache size
Currently only a one in one out eviction so if at max_size and
cache files were to constantly increase in size then so would the
cache. Restrict to limit of 8 evictions per new cache entry.

V2: (Timothy Arceri) fix make check tests

Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
2017-03-15 11:15:11 +11:00
Alan Swanson
af09b86732 util/disk_cache: use LRU eviction rather than random eviction
Still using fast random selection of two-character subdirectory in
which to check cache files rather than scanning entire cache.

v2: Factor out double strlen call
v3: C99 declaration of variables where used

Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-15 11:15:11 +11:00
Timothy Arceri
c2793e2c89 util/disk_cache: don't fallback to an empty cache dir on evict
If we fail to randomly select a two letter cache dir, don't select
an empty dir on fallback.

In real world use we should never hit the fallback path but it can
be hit by tests when the cache is set to a very small max value.

Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
2017-03-15 11:15:11 +11:00
Timothy Arceri
50989f87e6 util/disk_cache: use a thread queue to write to shader cache
This should help reduce any overhead added by the shader cache
when programs are not found in the cache.

To avoid creating any special function just for the sake of the
tests we add a one second delay whenever we call dick_cache_put()
to give it time to finish.

V2: poll for file when waiting for thread in test
V3: fix poll delay to really be 100ms, and simplify the wait function

Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
2017-03-15 11:15:11 +11:00
Timothy Arceri
fc5ec64ba3 util/disk_cache: add helpers for creating/destroying disk cache put jobs
V2: Make a copy of the data so we don't have to worry about it being
freed before we are done compressing/writing.

Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
2017-03-15 11:15:11 +11:00
Timothy Arceri
e2c4435b07 util/disk_cache: add thread queue to disk cache
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
2017-03-15 11:15:10 +11:00
Dave Airlie
7372e3cf5f radv/ac: workaround regression in llvm 4.0 release
LLVM 4.0 released with a pretty messy regression, that hopefully
get fixed in the future.

This work around was proposed by Tom, and it fixes the CTS regressions
here at least, I'm not sure if this will cause any major side effects,
but correctness over speed and all that.

radeonsi should possibly consider the same workaround until an llvm
fix can be found.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-15 09:51:53 +10:00
Dave Airlie
3ece76f03d radv/ac: gather4 cube workaround integer
This fix is extracted from amdgpu-pro shader traces.

It appears the gather4 workaround for integer types doesn't
work for cubes, so instead if forces a float scaled sample,
then converts to integer.

It modifies the descriptor before calling the gather.

This also produces some ugly asm code for reasons specified
in the patch, llvm could probably do better than dumping
sgprs to vgprs.

This fixes:
dEQP-VK.glsl.texture_gather.basic.cube.rgba8*

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-15 09:51:53 +10:00
Bas Nieuwenhuizen
407fa77669 radv: Set driver version to mesa version;
I couldn't really find an encoding in the spec. I'm not sure it
prescribes VK_MAKE_VERSION format, but vulkan.gpuinfo.org interprets
it that way by default. vulkaninfo gives the raw number, so we could
alternatively do something like 17001000, but that doesn't show
up right on vulkan.gpuinfo.org again. Looking at that site, the -pro
driver also uses VK_MAKE_VERSION, so keeping consistency is probably
best.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2017-03-15 00:37:56 +01:00
Bas Nieuwenhuizen
ed28ae71f5 radv: Increase api version to 1.0.42.
I've skimmed to changes from 1.0.5 to 1.0.42 and I think we have all
changes. We're still not conformant ofcourse, but this should not
regress stuff,

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2017-03-15 00:37:56 +01:00
Jason Ekstrand
2e98db68e4 util/vk: Add helpers for finding an extension struct
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-15 08:22:02 +10:00
Alex Smith
e0cc32b85b radv: Flush before copying with PKT3_WRITE_DATA in CmdUpdateBuffer
Need to flush before updating the buffer to ensure that the copy is
ordered after previous accesses (assuming the app has performed the
appropriate barriers).

This fixes potential issues due to draws prior to an update reading
the new buffer content, despite having the necessary barriers between
them.

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-14 22:17:03 +01:00
Bas Nieuwenhuizen
cce43f6d8c radv: Emit cache flushes before CP DMA.
The flushes could be due to TRANSFER barriers.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-14 22:16:34 +01:00
Jan Beich
fe56c745b8 Convert sed(1) syntax to be compatible with FreeBSD and OpenBSD
BSD regex library doesn't support extended RE escapes (e.g. \+) and
shorthand character classes (e.g. \s, \S) and SVR4-style word
delimiters[1] (on DragonFly and NetBSD). Both GNU and BSD sed support
-E and -r to enable extended RE but OS X still lacks -r.

[1] https://www.illumos.org/issues/516

Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Eric Engestrom <eric.engestrom@imgtec.com> (GNU sed)
2017-03-14 17:07:04 +00:00
Jason Ekstrand
aed2714145 anv: Properly enumerate physical devices when none are present 2017-03-14 09:08:07 -07:00
Jason Ekstrand
9d559ba39d nir/constant_expressions: Refactor helper functions
Apart from avoiding some unneeded size cases, this shouldn't have any
actual functional impact.

Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-14 07:36:40 -07:00
Jason Ekstrand
762a6333f2 nir: Rework conversion opcodes
The NIR story on conversion opcodes is a mess.  We've had way too many
of them, naming is inconsistent, and which ones have explicit sizes was
sort-of random.  This commit re-organizes things and makes them all
consistent:

 - All non-bool conversion opcodes now have the explicit size in the
   destination and are named <src_type>2<dst_type><size>.

 - Integer <-> integer conversion opcodes now only come in i2i and u2u
   forms (i2u and u2i have been removed) since the only difference
   between the different integer conversions is whether or not they
   sign-extend when up-converting.

 - Boolean conversion opcodes all have the explicit size on the bool and
   are named <src_type>2<dst_type>.

Making things consistent also allows nir_type_conversion_op to be moved
to nir_opcodes.c and auto-generated using mako.  This will make adding
int8, int16, and float16 versions much easier when the time comes.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-03-14 07:36:40 -07:00
Jason Ekstrand
7107b32155 i965/fs: Re-arrange conversion operations
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-14 07:36:40 -07:00
Jason Ekstrand
bab4610e9c i965/vec4: Get rid of the type parameter from to/from_double
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-14 07:36:40 -07:00
Jason Ekstrand
702d1af8ba glsl/nir: Use nir_type_conversion_op
Using the helper is way better than hand-coding the universe.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-03-14 07:36:40 -07:00
Jason Ekstrand
6eb051e36f nir: Rewrite nir_type_conversion_op
The original version was very convoluted and tried way too hard to not
just have the nested switch statement that it needs.  Let's just write
the obvious code and then we know it's correct.  This fixes a bunch of
missing cases particularly with int64.

Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
2017-03-14 07:36:40 -07:00
Jason Ekstrand
9084b1db30 nir: Add a get_nir_type_for_glsl_base_type helper
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-03-14 07:36:40 -07:00
Jason Ekstrand
a136884139 nir/validate: Rework ALU bit-size rule validation
The original bit-size validation wasn't capable of properly dealing with
instructions with variable bit sizes.  An attempt was made to handle it
by looking at source and destinations but, because the validation was
done in validate_alu_(src|dest), it didn't really have the needed
information.  The new validation code is much more straightforward and
should be more correct.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-03-14 07:36:40 -07:00
Jason Ekstrand
370d68babc nir/validate: Validate that bit sizes and components always match
We've always required bit sizes to match but the rules for number of
components have been a bit loose.  You've never been allowed to source
from something with less components than you consume, but more has
always been fine.  This changes the validator to require that they match
exactly.  The fact that they don't always match has been a source of
confusion in NIR for quite some time and it's time we got rid of it.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2017-03-14 07:36:40 -07:00
Jason Ekstrand
e9a45a3d5d nir: Make image_size a variable-width intrinsic
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2017-03-14 07:36:40 -07:00
Jason Ekstrand
b377be9213 i965/fs: Use num_components from the SSA def in image intrinsics
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2017-03-14 07:36:40 -07:00
Jason Ekstrand
0bf0365393 nir/lower_tex: Use tex_instr_dest_size for txs destinations
Using coord_components of the source texture is correct for everything
except cube maps where it's off by one.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2017-03-14 07:36:20 -07:00
Jason Ekstrand
fffa4111df nir/spirv: Restrict the number of channels in texture coordinates
Some SPIR-V texturing instructions pack more than the texture coordinate
into the coordinate source.  We need to mask off the unused channels.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2017-03-14 07:36:20 -07:00
Jason Ekstrand
3c312be7b3 nir/copy_prop: Respect the source's number of components
In the near future we are going to require that the num_components in a
src dereference match the num_components of the SSA value being
dereferenced.  To do that, we need copy_prop to not remove our MOVs from
a larger SSA value into an instruction that uses fewer channels.

Because we suddenly have to know how many components each source has,
this makes the pass a bit more complicated.  Fortunately, copy
propagation is the only pass that cares about the number of components
are read by any given source so it's fairly contained.

Shader-db results on Sky Lake:

   total instructions in shared programs: 13318947 -> 13320265 (0.01%)
   instructions in affected programs: 260633 -> 261951 (0.51%)
   helped: 324
   HURT: 1027

Looking through the hurt programs, about a dozen are hurt by 3
instructions and the rest are all hurt by 2 instructions.  From a
spot-check of the shaders, the story is always the same:  They get a
vec4 from somewhere (frequently an input) and use the first two or three
components as a texture coordinate.  Because of the vector component
mismatch, we have a mov or, more likely, a vecN sitting between the
texture instruction and the input.  This means that the back-end inserts
a bunch of MOVs and split_virtual_grfs() goes to town.  Because the
texture coordinate is also used by some other calculation, register
coalesce can't combine them back together and we end up with an extra 2
MOV instructions in our shader.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2017-03-14 07:36:20 -07:00
Jason Ekstrand
60d1aac28a nir/intrinsics: Make load_barycentric_input take a 2-component coor
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
2017-03-14 07:36:20 -07:00
Jason Ekstrand
678fd00f2f anv/blorp: Only set a clear color for resolves if fast-cleared
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
2017-03-14 07:36:20 -07:00
Jason Ekstrand
273b720310 anv/blorp: Turn off AUX after doing a CCS_D resolve
For render passes with multiple subpasses on gen7, we only fast-clear at
the top but an input attachment use can cause us to do a resolve in the
middle of the render pass.  Once we've done so, we are no longer have a
fast-cleared surface so we can just set aux_usage to NONE.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
2017-03-14 07:36:20 -07:00
Tapani Pälli
773d510c66 android: add '/vulkan' to libmesa_anv_entrypoints path
otherwise generated entrypoint headers are not found during build

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-14 07:48:30 +02:00
Tapani Pälli
4734322574 android: add src/intel/compiler to libmesa_intel_compiler includes
fixes build error when brw_nir.h not found in the generated file
brw_nir_trig_workarounds.c.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-14 07:48:22 +02:00
Gwan-gyeong Mun
8f22552a4f anv: Add missing error-checking to anv_CreateDevice (v3)
This patch adds missing error-checking and fixes resource leak in
allocation failure path on anv_CreateDevice()

v2: Fixes from Jason Ekstrand's review
  a) Add missing destructors for all of the state pools on allocation
     failure path
  b) Add missing destructor for batch bo pools on allocation failure path

v3: Fixes from Emil Velikov's review
  Add missing destructor for queue and scratch_pool on allocation failure
  path

Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 21:29:43 -07:00
Dave Airlie
b8ee70384a radv: setup llvm target data layout
Ported from radeonsi, pointed out by Tom.

"This prevents LLVM from using sext instructions for local memory
offsets and allows the backend to fold immediate offsets into the
instruction. This also prevents some incorrect code generation for
ptrtoint and inttoptr instructions."

Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tom Stellard <tstellar@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-14 10:33:59 +10:00
Alex Smith
c19607d59d radv: Reinitialise loaderMagic when allocating a cached command buffer
This must be set to ICD_LOADER_MAGIC by vkAllocateCommandBuffers, which
was being done when allocating a new buffer but not when reusing an
existing one in the cache. This would hit an assertion and crash in
debug builds of the Vulkan loader.

Fixes: 682248db45 ("radv: Cache command buffers in command pool.")
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-03-13 23:42:36 +01:00
Marek Olšák
cdbe4990cd gallium/radeon: disable the shader cache if dumping shaders
otherwise, cached shaders aren't dumped.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-13 23:34:52 +01:00
Marek Olšák
71a2e4e945 radeonsi: mark all bound shader buffer ranges as initialized
This should prevent cases when a buffer was incorrectly mapped without
synchronization just because this wasn't done.

Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-03-13 23:34:52 +01:00
Marek Olšák
686cd76a4c st/mesa: disable the shader cache if dumping shaders
otherwise, cached shaders aren't dumped.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-13 23:34:52 +01:00
Chad Versace
c5a0829e1f anv: Use vk_outarray in vkGetPhysicalDeviceQueueFamilyProperties
No intended change in behavior. Just a refactor.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 15:08:15 -07:00
Chad Versace
876f0ecd2f anv: Use vk_outarray in vkEnumeratePhysicalDevices (v2)
No intended change in behavior. Just a refactor.

v2: Replace vk_outarray_is_incomplete() with vk_outarray_status(). For
    Jason.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 15:08:15 -07:00
Chad Versace
62160536a0 util/vulkan: Add vk_outarray (v2)
This is a wrapper for a Vulkan output array. A Vulkan output array is
one that follows the convention of the parameters to
vkGetPhysicalDeviceQueueFamilyProperties().

v2: Replace vk_outarray_is_incomplete() with vk_outarray_status(). For
    Jason.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 15:08:11 -07:00
Lionel Landwerlin
bf47e5ba53 intel: genxml: prevent missing ; with address fields dwords
Before this change, the generator could print this kind of things :

   const uint32_t v0 =
      __gen_uint(values->ValidBit, 0, 0) |
      __gen_uint(values->FaultType, 1, 2) |
      __gen_uint(values->SRCIDofFault, 3, 10) |
      __gen_uint(values->GTTSEL, 11, 1) |
   dw[0] = __gen_combine_address(data, &dw[0], values->VirtualAddressofFault, v0);

This change fix the trailing '|'.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 17:23:49 +00:00
Julien Isorce
9df3f28a8b gallium/hud: check NULL return from u_upload_alloc
Fixes the following segmentation fault:

signal SIGSEGV: invalid address (fault address: 0x0)
 frame #0: 0x00007fffe718e117 radeonsi_dri.so hud_draw_background_quad hud_context.c:170
   167
   168 	   assert(hud->bg.num_vertices + 4 <= hud->bg.max_num_vertices);
   169
-> 170 	   vertices[num++] = (float) x1;
   171 	   vertices[num++] = (float) y1;
   172
   173 	   vertices[num++] = (float) x1;
(lldb) bt
  * frame #0: 0x00007fffe718e117 radeonsi_dri.so`hud_draw_background_quad
    frame #1: 0x00007fffe718f458 radeonsi_dri.so`hud_draw
    frame #2: 0x00007fffe712967f radeonsi_dri.so`dri_flush

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-03-13 17:20:21 +01:00
Julien Isorce
d08c0930af winsys/radeon: check null return from radeon_cs_create_fence in cs_flush
Follow-up of patch:
"radeon_cs_create_fence: check null return from radeon_winsys_bo_create"

radeon_drm_cs_flush
  radeon_cs_create_fence
    radeon_winsys_bo_create

Signed-off-by: Julien Isorce <jisorce@oblong.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-03-13 17:19:29 +01:00
Julien Isorce
d09edb0146 winsys/radeon: check null in radeon_cs_create_fence
Fixes the following segmentation fault:

radeon_drm_cs_add_buffer (bo=0x0) at radeon_drm_cs.c
  -> if (!bo->handle)
(gdb) bt
0  radeon_drm_cs_add_buffer (bo=0x0) at radeon_drm_cs.c
1  0x00007fffe73575de in radeon_cs_create_fence radeon_drm_cs.c
2  0x00007fffe7358c48 in radeon_drm_cs_flush radeon_drm_cs.c

Signed-off-by: Julien Isorce <jisorce@oblong.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-03-13 17:17:30 +01:00
Juan A. Suarez Romero
192de3f051 vulkan/wsi: include builddir for generated headers
wayland-drm-client-protocol.h is generated in builddir, so when
builddir != srcdir the header is not found, and compilation of
wsi_common_wayland.c will fail.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-13 16:04:20 +01:00
Jason Ekstrand
dd4db84640 anv: Use on-the-fly surface states for dynamic buffer descriptors
We have a performance problem with dynamic buffer descriptors.  Because
we are currently implementing them by pushing an offset into the shader
and adding that offset onto the already existing offset for the UBO/SSBO
operation, all UBO/SSBO operations on dynamic descriptors are indirect.
The back-end compiler implements indirect pull constant loads using what
basically amounts to a texelFetch instruction.  For pull constant loads
with constant offsets, however, we use an oword block read message which
goes through the constant cache and reads a whole cache line at a time.
Because of these two things, direct pull constant loads are much faster
than indirect pull constant loads.  Because all loads from dynamically
bound buffers are indirect, the user takes a substantial performance
penalty when using this "performance" feature.

There are two potential solutions I have seen for this problem.  The
alternate solution is to continue pushing offsets into the shader but
wire things up in the back-end compiler so that we use the oword block
read messages anyway.  The only reason we can do this because we know a
priori that the dynamic offsets are uniform and 16-byte aligned.
Unfortunately, thanks to the 16-byte alignment requirement of the oword
messages, we can't do some general "if the indirect offset is uniform,
use an oword message" sort of thing.

This solution, however, is recommended for a few of reasons:

 1. Surface states are relatively cheap.  We've been using on-the-fly
    surface state setup for some time in GL and it works well.  Also,
    dynamic offsets with on-the-fly surface state should still be
    cheaper than allocating new descriptor sets every time you want to
    change a buffer offset which is really the only requirement of the
    dynamic offsets feature.

 2. This requires substantially less compiler plumbing.  Not only can we
    delete the entire apply_dynamic_offsets pass but we can also avoid
    having to add architecture for passing dynamic offsets to the back-
    end compiler in such a way that it can continue using oword messages.

 3. We get robust buffer access range-checking for free.  Because the
    offset and range are baked into the surface state, we no longer need
    to pass ranges around and do bounds-checking in the shader.

 4. Once we finally get UBO pushing implemented, it will be much easier
    to handle pushing chunks of dynamic descriptors if the compiler
    remains blissfully unaware of dynamic descriptors.

This commit improves performance of The Talos Principle on ULTRA
settings by around 50% and brings it nicely into line with OpenGL
performance.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-13 07:58:00 -07:00
Jason Ekstrand
6b644e571e anv: Stall before fast-clear operations
During initial CCS bring-up, I discovered that you have to do a full CS
stall prior to doing a CCS resolve as well as afterwards.  It appears
that the same is needed for fast-clears as well.  This fixes rendering
corruptions on The Talos Principle on Sky Lake GT4.  The issue hasn't
been demonstrated on any other hardware however, given that this appears
to be a "too many things in the pipe" problem, having it be easier to
reproduce on a system with more EUs makes sense.  The issues with
resolves is demonstrable on a GT3 or GT2 so this is probably also a
problem on all GTs.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-03-13 07:57:03 -07:00
Jason Ekstrand
5e44ef4a76 anv: Accurately advertise dynamic descriptor limits
The number of dynamic descriptors is limited by both the number of
descriptors and the total number of dynamic things.  Because there isn't
a single "maximum dynamic things" limit, we need to divide by two so
that they can create the maximum of both UBOs and SSBOs.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
2017-03-13 07:57:03 -07:00
Jason Ekstrand
d36b463817 anv: Add a helper for working with VK_WHOLE_SIZE for buffers
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
2017-03-13 07:57:03 -07:00
Rob Clark
f805593b12 freedreno/ir3: fragz cannot be half precision
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-03-13 10:33:07 -04:00
Rob Clark
b1df639db6 freedreno/ir3: optimize less in glsl
Rely on nir for optimization, to reduce compile times.  Very minimal impact
on shader-db:

  total instructions in shared programs:          104170 -> 104199 (0.03%)
  total dwords in shared programs:                209664 -> 209728 (0.03%)
  total full registers used in shared programs:   7156 -> 7161 (0.07%)
  total half registers used in shader programs:   109 -> 109 (0.00%)
  total const registers used in shared programs:  24222 -> 24224 (0.01%)

                   half       full      const      instr     dwords
      helped          12         107         103         112          98
        hurt          11         104         105         115         102

But shader db runtime dropped from ~29.3s user to ~20.4s user.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-03-13 10:33:07 -04:00
Lionel Landwerlin
3278cd7610 aubinator/genxml: use gzipped files to store embedded genxml
This reduces the size of the aubinator binary from ~1.4Mb to ~700Kb.
With can now drop the checks on xxd in configure.

v2: Fix incorrect makefile dependency (Lionel)

v3: use $(PYTHON2) (Emil)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-13 13:36:31 +00:00
Lionel Landwerlin
351c951e09 intel: genxml: add script to generate gzipped genxml
v2 (from Dylan):
   Add main function
   Add missing Copyright
   Use print_function

v3: Add actually license (Dylan)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-13 13:36:27 +00:00
Jose Fonseca
b822d9c2b7 util/u_thread.h: Include stdint.h for int64_t definition.
Fixes MinGW build.  Trivial.
2017-03-13 12:23:11 +00:00
Iago Toral Quiroga
e8eeb759b7 intel: fix compiler build
compiler/brw_vec4_gs_visitor.cpp:744:39: error:
‘GEN7_MAX_GS_OUTPUT_VERTEX_SIZE_BYTES’ was not declared in this scope
           output_vertex_size_bytes <= GEN7_MAX_GS_OUTPUT_VERTEX_SIZE_BYTES);

Fixes: d0d4a5f43b ("i965: split EU defines to brw_eu_defines.h")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-13 13:09:24 +01:00
Christian König
8dee325752 svga: handle P016 format as well
Fixes: 62cff79378 ("gallium: add P016 format")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100180
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-13 12:49:41 +01:00
Emil Velikov
b82bd31c54 configure.ac: require pthread-stubs only where available
The project is a thing only for BSD platforms. Or in other words - for
any other platforms building/installing pthread-stubs results only in a
pthread-stub.pc file.

And even where it provides a DSO, there's a fundamental design issue
with it - see the pthread-stubs mailing list for the specifics.

v2: Update comment above the switch statement (Jon Turney).

Reviewed-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Acked-by: Gary Wong <gtw@gnu.org>
Tested-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Randy Fishel <randy.fishel@oracle.com>
Cc: Niveditha Rau <niveditha.rau@oracle.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-13 11:30:07 +00:00
Emil Velikov
9aebdb5d08 configure.ac: do not require the i965 driver for ANV
As of last few commits we have the two split, thus we no longer require
the i965 in order to have the ANV driver.

Even though ANV does not link against libdrm nor libdrm_intel, we still
require those as dependencies due to the headers they provide.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:35 +00:00
Jason Ekstrand
ee8044fd33 intel/vulkan: Get rid of recursive make
v2 [Emil Velikov]
 - Various fixes and initial stab at the Android build.
 - Keep the generation rules/EXTRA_DIST outside the conditional

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:35 +00:00
Jason Ekstrand
7f9bbcfb7b intel/tools: Use a makefile included from intel/Makefile.am
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:35 +00:00
Emil Velikov
aa09c9552c intel/compiler: whitespace cleanups
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:35 +00:00
Emil Velikov
bdc5036464 intel/compiler: link all tests again gtest, even test_eu_compact"
At the moment all the tests but test_eu_compact are actual C++ gtests.
To simplify things, we can move the gtest.la to the common TEST_LIBS.
As we're here, we can rename change the test extension [to .cpp] to
avoid using the confusing dummy.cpp.

Add a nice comment in the makefile for posterity.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:35 +00:00
Emil Velikov
f282ace678 i965: remove i965_symbols_test reference from .gitignore
The test/binary was removed back in 2012. With that one gone, we can
drop the .gitignore file all together.

Cc: Eric Anholt <eric@anholt.net>
Fixes: c885039442 ("i965: Drop the missing symbols link test.")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:35 +00:00
Jason Ekstrand
700bebb958 i965: Move the back-end compiler to src/intel/compiler
Mostly a dummy git mv with a couple of noticable parts:
 - With the earlier header cleanups, nothing in src/intel depends
files from src/mesa/drivers/dri/i965/
 - Both Autoconf and Android builds are addressed. Thanks to Mauro and
Tapani for the fixups in the latter
 - brw_util.[ch] is not really compiler specific, so it's moved to i965.

v2:
 - move brw_eu_defines.h instead of brw_defines.h
 - remove no-longer applicable includes
 - add missing vulkan/ prefix in the Android build (thanks Tapani)

v3:
 - don't list brw_defines.h in src/intel/Makefile.sources (Jason)
 - rebase on top of the oa patches

[Emil Velikov: commit message, various small fixes througout]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:34 +00:00
Emil Velikov
d0d4a5f43b i965: split EU defines to brw_eu_defines.h
Split out the EU defines from the 'generic' ones, as the former are more
compiler oriented.

With a later commit we'll move brw_eu_defines.h alongside the compiler
infra to src/intel/. Pulling all the defines in there seems overzealous.

Some defines are used by both i965 and the i965 compiler. Those are
moved to brw_eu_defines.h, and annotated accordingly. The i965 users
were updated to have the extre include to indicate that.

With future work we might provide a better, split but for now this seems
reasonable.

Cc: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:34 +00:00
Emil Velikov
a72ac98160 util/bitscan: use correct signature for ffs/ffsll
Otherwise we'll get errors such as

error: conflicting types for ‘ffs’
error: conflicting types for ‘ffsll’

We might want to improve the heuristics and provide a definition only
when a native one is missing. We can address that at a later stage.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:34 +00:00
Emil Velikov
fb0832b86d i965: add missing brw_defines.h include in brw_program.c
File is using MI_LOAD_REGISTER_IMM, GEN7_CACHE_MODE_1 and others as
defined in the header.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:34 +00:00
Emil Velikov
2eefb903d5 i965: add missing brw_defines.h include in brw_program.c
File is using the PIPE_CONTROL_* macros as defined in the header.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:34 +00:00
Emil Velikov
1d80407a6a i965: add missing #include <assert.h> in brw_inst.h
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:34 +00:00
Emil Velikov
077078ce77 i965: move brw_define.h ifndef guard to the top
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:34 +00:00
Emil Velikov
8c432645bb i965: remove unused macros from brw_defines.h
The follow three groups are not used by neither the DRI module nor the
compiler.
 BRW_POLYGON_*_FACING
 BRW_POLYGON_FACING_*
 BRW_STATELESS_BUFFER_*

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:34 +00:00
Emil Velikov
7784b3c846 i965: remove unused brw_program.h include
Neither of the changed files requires the brw_program.h include. Since
we're about to move them [to src/intel/compiler] with the next commit
there's no point in having the include.

Let alone the very confusing compiler include directive
[-I${top_srcdir}/src/mesa/drivers/dri/i965/] that one would have to use.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:34 +00:00
Emil Velikov
c54c379b96 i965: remove duplicate declaration of brw_mark_surface_used
Function was made static and moved to another header with earlier
commit.

Fixes: 760c8a1d95 ("i965: Make mark_surface_used a static inline in brw_compiler.h")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:33 +00:00
Emil Velikov
b69a03e12a i965: remove dead brw_new_shader() declaration
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Fixes: 194537ebe4 ("mesa/glsl/i965: remove Driver.NewShader()")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:33 +00:00
Emil Velikov
a032002dc9 i965: remove unused brw_cs.h include
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:33 +00:00
Jason Ekstrand
e042f5fcbc anv: Stop including brw_context.h
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:33 +00:00
Jason Ekstrand
4ec5922afa intel/isl: Stop linking libi965_compiler.la into tests
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:33 +00:00
Jason Ekstrand
12f348bc98 vulkan/wsi: Generate wayland protocol headers separately from EGL
Previously, we were depending on EGL for generating the headers and
providing the protocol symbols. However, since neither Vulkan driver
actually wants to link against EGL, this is kind of pointless. It also
creates a weird build dependency.

v2 [Jason]
 - Add missing wsi/ prefix, MKDIR_GEN

v3 [Emil Velikov]
 - include BUILT_SOURCES/generation rules outside of conditional

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:33 +00:00
Emil Velikov
1d135e2561 radv/wsi: Don't include wayland headers
Unused and we'll rework the way wayland-drm-client-protocol.h is
generated with later commit.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Dave Airlie <airlied@redhat.com>
2017-03-13 11:16:30 +00:00
Jason Ekstrand
4ea9bbe1f6 anv/wsi: Don't include wayland headers
Unused and we'll rework the way wayland-drm-client-protocol.h is
generated with later commit.

v2 [Emil]
 - Also remove wayland-client.h

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:30 +00:00
Emil Velikov
d1042ef1dc configure.ac: provide a fall-back define for WAYLAND_SCANNER
In some cases, we can end up calling WAYLAND_SCANNER even when
there's no binary. Do follow the other's approach set by
AX_PROG_FLEX/BISON and set the variable to :

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:30 +00:00
Emil Velikov
c1b5ed853f wayland: move .gitignore where applicable
Strictly speaking things work as-is, but let's move the file alongside
the artefacts it references. Analogous to all other places in mesa.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:30 +00:00
Christian König
5369b5a91d st/va: add config support for 10bit decoding v2
Advertise 10bpp support if the driver supports decoding to a P016 surface.

v2: Advertise 10bpp for the decoder as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Mark Thompson <sw@jkqxz.net>
2017-03-13 08:51:44 +01:00
Christian König
e9d3e29bb3 st/va: add support for allocating 10bpp surfaces
We support P010 and P016 as targets for 10bpp video decoding.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
2017-03-13 08:51:41 +01:00
Christian König
e58a1e8f68 st/va: add support for P010 and P016 formats v3
No hardware I know off can actually support P010 natively. But we can easily
support P016 and as long as nobody decodes anything into the lower 6bits it
doesn't make any difference to P010.

v2: allow P0160 for post processing as well
v3: fix post processing once more

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
2017-03-13 08:51:38 +01:00
Christian König
f1d1deb015 st/va: clear the video surface on allocation
This makes debugging of decoding problems quite a bit easier.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
2017-03-13 08:51:35 +01:00
Christian König
1ce68af07b st/va: cleanup error handling in vlVaCreateSurfaces2
No need to have that twice.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
2017-03-13 08:51:32 +01:00
Christian König
88f3451083 radeon/uvd: enable 10bit HEVC decode v2
Just use whatever the state tracker allocated.

v2: fix msb mode

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
2017-03-13 08:51:29 +01:00
Christian König
3e1e441aa0 radeon/UVD: fix the decoding target pitch calculation
The firmware expects the value in pixel not bytes. Didn't made a difference
so far because we only used 8bpp surfaces.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
2017-03-13 08:51:25 +01:00
Christian König
cee591a224 vl/video_buffer: add support for P016
Just simply the description of the planes.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
2017-03-13 08:51:22 +01:00
Christian König
62cff79378 gallium: add P016 format
Same layout as NV12, but 16bit per channel instead of 8.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
2017-03-13 08:51:07 +01:00
Kenneth Graunke
920ab07566 i965: Delete unused last_ring local.
Dead since 071d80bde2, and causing
warnings.
2017-03-12 22:57:46 -07:00
Bas Nieuwenhuizen
7c282b3ca1 radv: Store shaders in VRAM.
Less IFETCH latency on misses. Shader code is write once read many,
so GTT doesn't make much sense anyway.

If it turns out to fragment the CPU visible VRAM too much, we can upload with SDMA.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-13 02:14:29 +01:00
Dave Airlie
e27fdbcb4c radv/ac: move to new image intrinsics.
This hooks up radv to the new image intrinsic builders.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-13 09:44:53 +10:00
Dave Airlie
3b49cee8fa radv: disabled scaled formats for transfers.
These really are only supported for vertex buffers.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-13 09:36:49 +10:00
Timothy Arceri
13d69a8519 util/u_queue: make u_queue accessible to cpp
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-13 09:50:26 +11:00
Timothy Arceri
df1d5fc442 glsl: don't use ralloc for blob creation
There is no need to use ralloc here.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-13 09:50:19 +11:00
Timothy Arceri
ca76a2ba1b gallium/util: replace pipe_thread_setname() with u_thread_setname()
They do the same thing we just moved the function to be
accessible to all of Mesa.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-12 17:49:04 +11:00
Timothy Arceri
14e6b86952 gallium/util: replace pipe_thread_get_time_nano() with u_thread_get_time_nano()
They do the same thing we just moved the function to be
accessible to all of Mesa.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-12 17:49:04 +11:00
Timothy Arceri
f8cc4c25b8 gallium/util: replace pipe_thread_create() with u_thread_create()
They do the same thing we just moved the function to be
accessible to all of Mesa.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-12 17:49:04 +11:00
Timothy Arceri
b822d9dd67 gallium/util: move u_queue.{c,h} to src/util
This will allow us to use it outside of gallium for things like
compressing shader cache entries.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-12 17:49:03 +11:00
Timothy Arceri
04ec4db8b5 gallium/util: make use of new u_thread.h in u_queue.{c,h}
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-12 17:49:03 +11:00
Timothy Arceri
fbfe887253 util: add u_thread.h
This is a minimal copy of os_thread.h from gallium in order to move
u_queue.{c,h} to this directory.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-12 17:49:03 +11:00
Timothy Arceri
a3b820308b gallium/util: use standard malloc/calloc/free in u_queue.c
This will help us moving the file to the shared src/util dir.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-12 17:49:03 +11:00
Timothy Arceri
94a6457724 gallium/util: move u_string.h to src/util/u_string.h
This will help us move u_queue.c here eventually and also provide
string function wrappers for anyone wishing to port disk_cache.c
to windows.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-12 17:43:06 +11:00
Timothy Arceri
d55d1e9805 gallium/util: remove unused header from u_string.h
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-12 17:43:06 +11:00
Timothy Arceri
ff8aad66bd gallium/util: remove unused util_strbuf*
Looks like they have been unused since 2008 b8a7eef242.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-12 17:43:06 +11:00
Timothy Arceri
b4b1dcb2c1 gallium/util: remove unused util_memmove()
This is not used anywhere and Visual Studio looks to have
supported memmove() for a long time if not always.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-12 17:43:06 +11:00
Timothy Arceri
b607aad8e1 glsl: don't recompile a shader on fallback unless needed
Because we optimistically skip compiling shaders if we have seen them
before we may need to compile them later at link time if they haven't
yet been use in a specific combination to create a program.

Rather than always recompiling we take advantage of the
gl_compile_status enum introduced in the previous patch to only
compile when we have previously skipped compilation.

This helps with regressions in app start-up times on cold cache
runs, compared with no cache.

Deus Ex: Mankind Divided start-up times:

cache disabled:               ~3m15s
cold cache master:            ~4m23s
cold cache with this patch:   ~3m33s

Acked-by: Marek Olšák <marek.olsak@amd.com>
2017-03-12 17:26:08 +11:00
Timothy Arceri
bfa95997c4 mesa/glsl: introduce new gl_compile_status enum
This will allow us to tell if a shader really has been compiled or
if the shader cache has just seen it before.

Acked-by: Marek Olšák <marek.olsak@amd.com>
2017-03-12 17:24:40 +11:00
Matt Turner
3d253d330a i965: Initialize compaction tables in unit test.
Fixes: fa4b792e83 "i965: Move brw_init_compaction_tables() to brw_create_compiler()."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100154
2017-03-10 23:16:39 -08:00
Matt Turner
fa4b792e83 i965: Move brw_init_compaction_tables() to brw_create_compiler().
... so that we can avoid threading complications or unnecessary
compaction table initializations (which just consists of setting some
pointers based on devinfo->gen).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-10 17:58:11 -08:00
Emil Velikov
32be87852b bin/get-fixes-pick-list.sh: do not mandate bash
Silly thinko on my end, as I was writing the script. There is nothing
bash specific in there.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:49 +00:00
Emil Velikov
0e94217999 bin/shortlog_mesa.sh: remove the final bashism
Remove the typeset built-in and toggle to /bin/sh

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:48 +00:00
Emil Velikov
3aa5f51c27 bin/bugzilla_mesa.sh: rework the looping method
We don't use DRYRUN (and no others scripts have one) so just drop it.

This allows us to rework the loop to the more commonly used "git .... |
while read foo; do ... done"

That in itself gets rid of the only remaining bashism and we can toggle
the shebang to /bin/sh.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:48 +00:00
Emil Velikov
1c3a1d74ec wayland-egl/wayland-egl-symbols-check: do not mandate bash
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:48 +00:00
Emil Velikov
f7e7708d75 gbm/gbm-symbols-check: do not mandate bash
Analogous to previous commit.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:48 +00:00
Emil Velikov
5a0e4f4837 egl/egl-symbols-check: do not mandate bash
There's nothing bash specific in the script.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:48 +00:00
Emil Velikov
a3782f2b7a glsl/tests: remove any bashisms
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:48 +00:00
Emil Velikov
05c1d6d564 dri: use correct shebang for gen-symbol-redefs.py
This is a python2 script and the generic "python" may point to python3.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:48 +00:00
Emil Velikov
fb187d2232 util: remove shebang from format_srgb.py
Analogous to earlier commit(s).

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:48 +00:00
Emil Velikov
2e8c683f5e xmlpool: remove shebang from gen_xmlpool.py
Analogous to earlier commit(s).

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:48 +00:00
Emil Velikov
6d9ad29451 genxml: remove shebang from gen_pack_header.py
Analogous to earlier commit(s).

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:48 +00:00
Emil Velikov
e4c7911150 nir: remove shebang from python scripts
Analogous to earlier commit(s).

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:47 +00:00
Emil Velikov
a497f44645 st/xa: suffix xa-indent{,.sh} and add a shebang line
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:47 +00:00
Emil Velikov
c79c54ae34 gallium/tools: use correct shebang for python scripts
These are python2 scripts and the generic "python" may point to
python3.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:47 +00:00
Emil Velikov
e7b01d9fc8 gallium/tools: do not hardcode bash location
It is not guaranteed to be in /bin

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:47 +00:00
Emil Velikov
6f341b9dfd gallium/tests: remove execute bit from TGSI shader - vert-uadd.sh
Just like the the dozens of other shaders, the file is parsed by
separate tool and not executed.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:47 +00:00
Emil Velikov
68c38b2431 mapi/gen: remove shebang from python scripts
All of those should be executed $PYTHON2/python2 [or equivalent] hence
why they are missing the execute bit.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:47 +00:00
Emil Velikov
9a502f5c47 mapi: do not mandate bash for es*api/ABI-check
Seemingly there is nothing bash specific in these. The Debian
checkbashisms does not spot neither run in zsh.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:47 +00:00
Emil Velikov
d73603fcdd bin/perf-annotate-jit: add .py suffix
To provide direct feedback about the file in question.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:47 +00:00
Emil Velikov
f03e7af7b9 i965: remove shebang from brw_nir_trig_workarounds.py
Analogous to earlier commit(s).

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:47 +00:00
Emil Velikov
1a39f3187c i965: remove execute bit from brw_nir_trig_workarounds.py
Analogous to earlier commit(s).

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:47 +00:00
Emil Velikov
be4ce4937e mesa: remove shebang from python scripts
Analogous to earlier commit(s).

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:46 +00:00
Emil Velikov
d2af6f6ee0 mesa: remove execute bit from main/format_parser.py
Analogous to earlier commit(s).

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:46 +00:00
Emil Velikov
a1d186cb70 amd: remove shebang from python scripts
Analogous to earlier commit(s).

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:46 +00:00
Emil Velikov
f6180a5ab7 amd: remove execute bit from python scripts
Analogous to earlier commit(s).

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:46 +00:00
Emil Velikov
168d801149 gallium: remove shebang from python scripts
Analogous to earlier commit(s).

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:46 +00:00
Emil Velikov
2ea1ce2701 gallium: remove execute bit from the python script(s)
Analogous to earlier commit(s).

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:46 +00:00
Emil Velikov
5d15fe446d svga: remove shebang from svgadump/svga_dump.py
Analogous to earlier commit(s).

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:46 +00:00
Emil Velikov
bd9bb86bc3 svga: remove execute bit from svga_dump.py
The file is used to generate svgadump/svga_dump.c... in theory at least.
Atm. the file is checked in-tree but that is about to change later
commits.

As we get to that we'll use $PYTHON2 or equivalent as used throughout
the tree.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:46 +00:00
Emil Velikov
cc9533c53f freedreno: remove shebang from ir3_nir_trig.py
Analogous to earlier commit(s).

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:46 +00:00
Emil Velikov
55ffbbf571 freedreno: remove execute bit from ir3_nir_trig.py
The file is meant to be called with $(PYTHON2) and not executed
directly.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:46 +00:00
Emil Velikov
56e58e01e4 glsl: remove shebang from python scripts
All of the scripts are [must be] executed via $PYTHON2 [or equivalent]
hence why they are missing the execute bit.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:46 +00:00
Emil Velikov
eca18d440d glsl/tests: remove execute bit from compare_ir python script
Nearly all the python scripts used in-tree are invoked via $PYTHON2 or
equivalent. As such having the execute bit not needed and generally
ill-advised.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:45 +00:00
Emil Velikov
7473fcd40b glsl/tests: suffix .sh/.py files as applicable
This makes it easier/clearer as to:
 - if the file should have the execute bit set (.py should not)
 - do we need the shebang in the first place and if so what it should be

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:45 +00:00
Emil Velikov
32d153c428 mesa: drop the execute bit from gl.xml
This is a spec file which is parsed by scripts.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:45 +00:00
Emil Velikov
45a37c98e7 mapi/glapi: remove unused next_available_offset.sh
Afaict there was no [documented] users since it was introduced.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:45 +00:00
Ben Widawsky
2ee34bd5dc gbm: Export a per plane getter for offset
Unlike stride, there was no previous offset getter, so it can be right
on the first try.

v2: Return EINVAL when plane is greater than total planes to make it
match the similar APIs.
Avoid leak after fromPlanar (Daniel)
Make sure when getting offsets we consider dumb images (Daniel)

v3: Use Jason's recommendation for handling the non-planar case.

v4: Return int64_t so we can get real errors

v5: Add an assertion for dumb BOs (Jason)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Daniel Stone <daniels@collabora.com>
2017-03-09 15:35:44 -08:00
Ben Widawsky
7f6209e46f gbm: Export a per plane getter for stride
v2: Preserve legacy behavior when plane is 0 (Jason Ekstrand)
EINVAL when input plane is greater than total planes (Jason Ekstrand)
Don't leak the image after fromPlanar (Daniel)
Move bo->image check below plane count preventing bad index succeeding (Daniel)

v3: Fix DRIimage leak (using Jason's recommended change)
Make plane 0 return planar stride. This might break legacy behavior (Jason)

v4: Move bogus hunk for get_handle_for_plane to the right patch (Jason)
Fix error handling path to be cleaner (Jason)

v5: Add assert for dumb BOs to make sure plane == 0 (Jason)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Daniel Stone <daniels@collabora.com>
2017-03-09 15:35:44 -08:00
Ben Widawsky
ed4cf2440d gbm: Create a gbm_device getter for stride
This will be used so we can query information per plane.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Daniel Stone <daniels@collabora.com>
2017-03-09 15:35:44 -08:00
Ben Widawsky
f9567ab435 gbm: Export a getter for per plane handles
v2: Make the error return be -1 instead of 0 because I think 0 is
actually valid.

v3: Set errno to EINVAL when the specified plane is above the total
planes. (Jason Ekstrand)
Return the bo's handle if there is no image ie. for dumb images like cursor (Daniel)

v4:
- Add assertions about plane == 0 (Jason)
- Add a comment about new restriction on planar dumb bo which is not an
earlier patch in the series.
- Correctly refactor from v2 in this patch; it ended up rebased into the
wrong patch.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Daniel Stone <daniels@collabora.com>
2017-03-09 15:35:44 -08:00
Ben Widawsky
42eacddfc0 gbm: Export a plane getter function
This will be used by clients that need to know the number of planes
allocated for them on behalf of the GL or other API. The best current
example of this is when an extra "plane" is allocated to store
compression data for the primary plane.

v2: Return 1 for cases where there is no image, ie. dumb bo (Daniel)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Daniel Stone <daniels@collabora.com>
2017-03-09 15:35:44 -08:00
Ben Widawsky
770b06588f gbm: Explicitly disallow a planar dumb BO
As more GBM functionality support planes is being evaluated, it becomes
clear that a dumb bo can never actually be planar. It's questionable
whether it was ever feasible to do this, and later functionality will
implicitly assume a dumb BO is non-planar.

v2: Include stdbool.h

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Daniel Stone <daniels@collabora.com>
2017-03-09 15:35:44 -08:00
Anuj Phogat
29e2ba0756 i965: Rename brw_format_for_mesa_format() to brw_isl_format_for_mesa_format()
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-09 09:47:30 -08:00
Robert Bragg
a678b79ef4 i965: Add more Haswell OA metrics sets
This extends the brw_oa_hsw.xml to expose these additional queries:

- Compute Metrics Basic Gen7.5
- Compute Metrics Extended Gen7.5
- Memory Reads Distribution Gen7.5
- Memory Writes Distribution Gen7.5
- Metric set Sampler Balance

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2017-03-09 13:45:51 +00:00
Robert Bragg
458468c136 i965: Expose OA counters via INTEL_performance_query
This adds support for exposing basic Observation Architecture
performance counters on Haswell.

This support is based on the i915 perf kernel interface which is used
to configure the OA unit, allowing Mesa to emit MI_REPORT_PERF_COUNT
commands around queries to collect counter snapshots.

To take into account the small chance that some of the 32bit counters
could wrap around for long queries (~50 milliseconds for a GT3 Haswell @
1.1GHz) the implementation also collects periodic metrics.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2017-03-09 13:45:50 +00:00
Robert Bragg
a98ffe2477 exec_list: Add a foreach_list_typed_from macro
This allows iterating list nodes from a given start point instead of
necessarily the list head.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2017-03-09 13:45:50 +00:00
Robert Bragg
e56550565e i965: Add script to gen code for OA counter queries
Avoiding lots of error prone boilerplate and easing our ability to add +
maintain support for multiple OA performance counter queries for each
generation:

This adds a python script to generate code for building up
performance_queries from the metric sets and counters described in
brw_oa_hsw.xml as well as functions to normalize each counter based on
the RPN expressions given.

Although the XML file currently only includes a single metric set, the
code generated assumes there could be many sets.

The metrics as described in XML get translated into C structures
which are registered in a brw->perfquery.oa_metrics_table hash table
keyed by the GUID of the metric set in XML.

v2: numerous python style improvements (Dylan)
v3: Makefile.am fixups (Emil)
v4: Pattern rule for codegen + orthogonal .c and .h rules (Robert)

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2017-03-09 13:45:44 +00:00
Robert Bragg
f46e58e018 i965: extend query/counter structs for OA queries
In preparation for generating code from brw_oa_hsw.xml for describing OA
performance counter queries this adds some OA specific members to
brw_perf_query that our generated code will initialize:

- The oa_metric_set_id is the ID we will pass to
  DRM_IOCTL_I915_PERF_OPEN, and is an ID got via sysfs under:
  /sys/class/drm/<card>/metrics/<guid/id

- The oa_format is the OA report layout we will request from the kernel

- The accumulator offsets determine where the different groups of A, B
  and C counters are located within an intermediate 64bit 'accumulator'
  buffer.

Additionally brw_perf_query_counter now has 64bit or float _read()
callback members for OA counters.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2017-03-09 12:53:07 +00:00
Robert Bragg
eaab41c9db i965: brw_context.h additions for OA unit query codegen
In preparation for generating code from the XML performance counter meta
data, this makes some additions to brw_context.h for this code to be
able to reference.

It adds a brw->perfquery.oa_metrics_table hash table for indexing built
up query descriptions by the GUID that is expected to be advertised by
the kernel (via sysfs) to be able to use that query.

It adds an 'OA_COUNTERS' brw_query_kind to be assigned to queries built
up by generated code.

It adds a brw->perfquery.sys_vars structure to have a consistent place
to represent the different system variables like $EuCoresTotalCount and
$EuSlicesTotalCount that are referenced by OA counter normalization
equations.

  Although extending + referencing gen_device_info for these variables
  was considered, these are some of the (mostly minor) reasons for
  going with a dedicated structure:

  - Currently we only need this info for the performance_query backend
    and it might be a bit tedious to go back and initialize the state
    for pre-Haswell devinfo structures.
  - Considering the $SubsliceMask then the requirement for how multiple
    per-slice masks are packed only comes from how the variables are
    references by availability tests in XML, and might not be a good
    general representation for tracking subslice masks if another use
    case arises.
  - If we used gen_device_info then we'd likely want to avoid making
    assumptions about the C types during codegen and adding explicit
    casts, while that's not necessary with a dedicated struct with all
    members being uint64_t.
  - This structure and the code for initializing it is currently shared
    (just through copy & paste) with a few other projects dealing with
    OA counters, and that's been convenient so far.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2017-03-09 12:53:07 +00:00
Robert Bragg
b79268174b i965: XML description of Haswell OA metric set
In preparation for exposing Gen Observation Architecture performance
counters via INTEL_performance_query this adds an XML description for an
initial 'Render Metrics Basic Gen7.5' query and corresponding counters.

The intention is to auto generate code for building a query from these
counters as well as the code for normalizing the individual counters.

Note that the upstream for this XML data is currently GPU Top:

  https://github.com/rib/gputop

The files are maintained under gputop-data/ and they are themselves
derived from files in an internal 'MDAPI XML' schema. There are scripts
under gputop-scripts/ and make rules in gputop-data/Makefile.xml for
maintaining these files.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2017-03-09 12:53:07 +00:00
Pierre Moreau
655c395f65 nv50/ir: check for origin insn in findOriginForTestWithZero
Function arguments do not have an "origin" instruction, causing a
NULL-pointer dereference without this check.

Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-03-09 12:42:46 +01:00
Samuel Pitoiset
d54b498694 mesa/main: make use of lookup_samplerobj_locked()
There is no need to check sampler == 0 twice. This removes now
unused _mesa_lookup_samplerobj_locked().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-03-09 11:01:37 +01:00
Samuel Pitoiset
58b4ae0411 mesa/main: inline {begin,end}_samplerobj_lookups()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-03-09 11:01:31 +01:00
Grazvydas Ignotas
8cd83a6c81 glsl/blob: clear padding bytes
Since blob is intended for serializing data, it's not a good idea to
leave padding holes with uninitialized data, which may leak heap
contents and hurt compression if the blob is later compressed, like
done by shader cache. Clear it.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-09 20:41:02 +11:00
Grazvydas Ignotas
61bbb25a08 util/disk_cache: fix size subtraction on 32bit
Negating size_t on 32bit produces a 32bit result. This was effectively
adding values close to UINT_MAX to the cache size (the files are usually
small) instead of intended subtraction.
Fixes 'make check' disk_cache failures on 32bit.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-09 20:26:30 +11:00
Grazvydas Ignotas
926bcacfd3 util/disk_cache: fix compressed size calculation
It incorrectly doubles the size on each iteration.

Fixes: 85a9b1b5 "util/disk_cache: compress individual cache entries"

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-09 20:26:23 +11:00
Lionel Landwerlin
f81ede4699 glsl: builtin: always return clones of the builtins
Builtins are created once and allocated using their own private ralloc
context. When reparenting IR that includes builtins, we might be steal
bits of builtins. This is problematic because these builtins might now
be freed when the shader that includes then last is disposed. This
might also lead to inconsistent ralloc trees/lists if shaders are
created on multiple threads.

Rather than including builtins directly into a shader's IR, we should
include clones of them in the ralloc context of the shader that
requires them. This fixes double free issues we've been seeing when
running shader-db on a big multicore (72 threads) server.

v2: Also rename _mesa_glsl_find_builtin_function_by_name() to better
    reflect how this function is used. (Ken)

v3: Rename ctx to mem_ctx (Ken)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-03-09 08:30:36 +00:00
Kenneth Graunke
071d80bde2 i965: Delete render ring prelude.
This was a hook I came up when trying to do the initial performance
counter work years ago.  Nothing's used it for a long time, and the
upcoming performance counter support doesn't want it either.

So, goodbye render ring prelude.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-03-08 23:01:21 -08:00
Vinson Lee
d64ded7b50 swr: s/uint/enum pipe_render_cond_flag/
Fix build error.

swr_context.cpp: In function ‘void swr_blit(pipe_context*, const pipe_blit_info*)’:
swr_context.cpp:336:44: error: invalid conversion from ‘uint {aka unsigned int}’ to ‘pipe_render_cond_flag’ [-fpermissive]
                                       ctx->render_cond_mode);
                                       ~~~~~^~~~~~~~~~~~~~~~

Fixes: b0d3938430 ("gallium: s/uint/enum pipe_render_cond_flag/ for set_render_condition()")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100133
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2017-03-08 21:43:07 -08:00
Bas Nieuwenhuizen
7d6e1a341a radv: Don't flush the CB before doing a fast clear eliminate.
The only way we write CMASK/DCC compressed textures through shaders
is fast clears and CMASK/DCC inits, which have their own flushes.
Hence the CB cache is always up to date.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-09 02:35:28 +01:00
Bas Nieuwenhuizen
8700329785 radv: Don't emit cache flushes on subpass switch.
I think we should only flush right before an action (draw/dispatch etc.),
as otherwise it is too easy to issue redundant flushes.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-09 02:35:23 +01:00
Bas Nieuwenhuizen
9251f8b35e radv: Only flush for the needed stages, and before the flushes.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-09 02:35:19 +01:00
Bas Nieuwenhuizen
f92a118434 radv: Don't invalidate CB/DB for images that aren't modified outside CB/DB.
Without stores, the only writes are fast clears, transfers and metadata
initialization, each of which have the appropiate invalidations already.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-09 02:35:14 +01:00
Bas Nieuwenhuizen
0567ab0407 radv: Flush more caches after writes.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-09 02:35:10 +01:00
Bas Nieuwenhuizen
7a600bbc81 radv: Don't flush for fixed-function reading.
The data should always be in memory after a src flush.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-09 02:35:05 +01:00
Bas Nieuwenhuizen
dd094e4ff9 radv: Invalidate the correct caches for CB/DB dst barriers.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-09 02:35:01 +01:00
Bas Nieuwenhuizen
b075eb7d47 radv: Determine cache flushes per object.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-09 02:34:42 +01:00
Samuel Pitoiset
2568d9d0cd mesa/main: remove unused _mesa_new_texture_image()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2017-03-09 01:57:20 +01:00
Dave Airlie
e6902be900 radv/ac: fixup texture coord to have right number of channels.
Jason has patches to add validation to this area, this should fix
radv shaders.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-09 09:17:11 +10:00
Timothy Arceri
0e34966340 st/nine: pass NULL to ureg_get_tokens()
The number of tokens in never used and the pointer is NULL checked
so just pass NULL.

Reviewed-by: Axel Davy <axel.davy@ens.fr>
2017-03-09 09:29:07 +11:00
Matt Turner
a45cd8107d docs: ARB_shader_atomic_counter_ops is enabled on i965/gen7+.
This extension was enabled in commit 40dd45d0c6 ("i965: Enable
ARB_shader_atomic_counter_ops") but the commit failed to update the
release notes or features.txt. The release notes ship has sailed, since
the commit was in 13.0.
2017-03-08 13:58:52 -08:00
Eric Anholt
19f571ba6d vc4: Fix math with a condition flag set.
Math results land in r4, regardless of the condition.  To implement them,
we just need to ensure that the results are moved out of r4 (as often
happens anyway, the values is live across another math instruction), so
that we can attach the condition to the MOV.

Fixes dEQP-GLES2.functional.shaders.random.all_features.fragment.93 and a
couple others, that were assertion failing that their conditions hadn't
been handled during the QIR->QPU stage.
2017-03-08 13:44:17 -08:00
Eric Anholt
615f6653b0 vc4: Fix register pressure cost estimates when a src appears twice.
This ended up confusing the scheduler for things like fabs (implemented as
fmaxabs x, x) or squaring a number, and it would try to avoid scheduling
them because it appeared more expensive than other instructions.

Fixes failure to register allocate in
dEQP-GLES2.functional.uniform_api.random.3 with almost no shader-db
effects (+.35% max temps)
2017-03-08 13:44:17 -08:00
Eric Anholt
0fca01d027 vc4: Report to shader-db how many threads a fragment shader has.
Doing instruction count analysis when we emit the thread switches that
will save us from tons of stalls is kind of missing the point.
2017-03-08 13:44:17 -08:00
Eric Anholt
61359324c1 Revert "vc4: Lazily emit our FS/VS input loads."
This reverts commit 292c24ddac.  It broke a
lot of GLES2 deqp, and I see at least one problem that will require some
serious rework to fix.
2017-03-08 13:44:17 -08:00
Marek Olšák
ab12a126fd radeonsi: fix elimination of literal VS outputs
broken when switched to the new intrinsics.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-03-08 19:56:36 +01:00
Fabio Estevam
78c5772633 loader: Move non-error message to debug level
Currently when running mesa on imx6 the following loader warnings
are seen:

# kmscube -D /dev/dri/card1
MESA-LOADER: device is not located on the PCI bus
MESA-LOADER: device is not located on the PCI bus
MESA-LOADER: device is not located on the PCI bus
Using display 0x1920948 with EGL version 1.4

As this is not an error message, change it to debug level in
order to have a cleaner log output.

Signed-off-by: Fabio Estevam <festevam@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-08 16:35:00 +00:00
Mauro Rossi
61c38d14b7 android: r600: fix libmesa_amd_common dependency
Adding libmesa_amd_common dependency and exporting its headers,
avoids the following building error:

external/mesa/src/gallium/drivers/r600/evergreen_compute.c:29:10: fatal error: 'ac_binary.h' file not found
         ^
1 error generated.

Fixes: 3bbbb63 "automake: r600: radeonsi: correctly manage libamd_common.la linking"
Fixes: 503fb13 "radeon/ac: switch to ac_shader_binary_config_start()"

v2 [Emil Velikov: drop unneeded LOCAL_EXPORT_C_INCLUDE_DIRS]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-08 16:27:23 +00:00
Emil Velikov
1fe4d638a1 gallium/targets: rework the empty targets removal
Earlier commit added extra tracking and we've attempted to remove the
vdpau/other folder if empty. V2 of said commit dropped the pipe
to /dev/null and the explicit "true" override.

Sadly both of those are needed since there's no guarantee that the
folder will be empty before we [mesa] make install.

Since we're bringing those two back, there's no need to track if we've
installed anything, and simply do "rm -d foo/ &>/dev/null || true"

Tested-by: Andy Furniss <adf.lists@gmail.com>
Reported-by: Andy Furniss <adf.lists@gmail.com>
Fixes: 1cd4fde053 ("gallium/targets: don't leave an empty target directory(ies)")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-08 16:23:07 +00:00
Brian Paul
2f3f5728f7 util/indices: minor clean-ups
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-08 08:50:21 -07:00
Brian Paul
a0927da006 radeonsi: s/uint/enum pipe_shader_type/
This can probably be done in more places in the driver.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-08 08:50:20 -07:00
Brian Paul
b0d3938430 gallium: s/uint/enum pipe_render_cond_flag/ for set_render_condition()
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-08 08:50:20 -07:00
Brian Paul
2b9ab605aa gallium: s/uint/enum pipe_shader_type/ for set_constant_buffer()
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-08 08:50:20 -07:00
Brian Paul
73bafb5ee3 gallium: s/unsigned/enum pipe_shader_type/ for get_compiler_options()
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-08 08:50:20 -07:00
Brian Paul
1564a768ae virgl: s/unsigned/enum pipe_shader_type/
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-08 08:50:20 -07:00
Brian Paul
6614b060fb swr: s/unsigned/enum pipe_shader_type/
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-08 08:50:20 -07:00
Brian Paul
f676c700cc softpipe: s/unsigned/enum pipe_shader_type/
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-08 08:50:20 -07:00
Brian Paul
0fc5110a6e llvmpipe: s/unsigned/enum pipe_shader_type/
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-08 08:50:20 -07:00
Brian Paul
4aec68176d freedreno: s/unsigned/enum pipe_shader_type/
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-08 08:50:20 -07:00
Brian Paul
7532ed106f etnaviv: s/unsigned/enum pipe_shader_type/
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-08 08:50:20 -07:00
Brian Paul
b4191b712b draw: s/unsigned/enum pipe_shader_type/
and some s/uint/enum pipe_shader_type/

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-08 08:50:20 -07:00
Brian Paul
ed66c9d7b8 cso: s/unsigned/enum pipe_shader_type/
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-08 08:50:20 -07:00
Brian Paul
637e5719b5 gallium: s/unsigned/enum pipe_shader_type/ for pipe_screen::get_shader_param()
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-08 08:50:20 -07:00
Tapani Pälli
db5f9c3177 anv: change BLOCK_POOL_MEMFD_SIZE to exactly 2GB
This is what comment above definition says and change fixes issue with
32bit build where BLOCK_POOL_MEMFD_SIZE is used as ftruncate parameter
and constant currently gets converted from 4294967296 to 0.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-08 07:57:55 +02:00
Matt Turner
58b69eedd3 Revert "configure.ac: Use PKG_CHECK_VAR for wayland-scanner."
This reverts commit 8a26e94439.
2017-03-07 21:24:05 -08:00
Matt Turner
0b361f9d35 Revert "configure.ac: Use PKG_CHECK_VAR for libclc."
This reverts commit 706074cc96.
2017-03-07 21:24:05 -08:00
Chris Wilson
05520ba490 i965: Remove use of deprecated drm_intel_aub routines
With mesa/drm commit cd2f91e18db087edf93fed828e568ee53b887860
Author: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
Date:   Fri Jul 31 10:47:50 2015 -0700

    intel: Drop aub dumping functionality

the drm_intel_aub routines are mere stubs and do nothing. Likewise
remove our invocations.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-03-07 16:40:03 -08:00
Jason Ekstrand
4483c5d57c spirv: Silence unused variable warnings in release mode
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-03-07 15:22:16 -08:00
Jason Ekstrand
0421813588 anv: Make the framebuffer-renderpass format assert non-fatal
This should let Dota 2 run on debug builds though it will spew errors
like mad.  Hopefully, Valve will get this fixed sooner rather than
later.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-07 15:22:16 -08:00
Jason Ekstrand
33301d949f anv: Drop the anv_validate block helper
Over the course of driver development, we've come up with a number of
different schemes for adding giant blocks of asserts inside the driver.
This one is only being used once in anv_pipeline.c and the way it's
being used actually generates compiler warnings in release builds.  This
commit drops the anv_validate macro and just puts the contents of the
one validation function in side of a "#ifdef DEBUG" guard.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-07 15:22:16 -08:00
Jason Ekstrand
a316d8f406 anv: Get rid of the stub() macros
Except for a few unimplemented things on gen7, we don't really have
stubs anymore so we should drop this.  This commit replaces the few gen7
stub() calls with explicitly labeled finishme's and makes the sparse
binding stuff silently no-op or return a FEATURE_NOT_PRESENT error.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-07 15:22:16 -08:00
Jason Ekstrand
1488d079cb anv: Remove a pointless finishme
We've been supporting multiple shaders per module for some time now.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-07 15:22:16 -08:00
Jason Ekstrand
1a43792783 anv: Convert the HiZ finishme's to perf_warn
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-07 15:22:16 -08:00
Jason Ekstrand
201fc83df7 anv: Add a performance warning helper
This acts identically to anv_finishme except that it only dumps out
these nice log messages if you run with INTEL_DEBUG=perf.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-07 15:22:16 -08:00
Timothy Arceri
20234cfe3a st/mesa: don't propagate uniforms when restoring from cache
We will have already loaded the uniforms when the parameter list
was restored from cache.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-08 09:45:48 +11:00
Damien Grassart
e25c92a72d radv: remove duplicate initialization of alphaToOne feature
Fixes a GCC warning when compiling with -Wextra:
radv_device.c:463:47: warning: initialized field overwritten [-Woverride-init]

Signed-off-by: Damien Grassart <damien@grassart.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-08 06:00:34 +10:00
Dave Airlie
d81bd2f754 radv: disable mip point pre clamping.
No idea what this does, but disabling it fixes a bunch
of failing CTS tests in the lod area, so let's go with that.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-08 05:50:46 +10:00
Fredrik Höglund
162beb2abb radv/ac: fix multiple descriptor sets with dynamic buffers
The dynamic_offset_offset in the descriptor set binding layout is
relative to the dynamic_offset_start for the set in the pipeline
layout.

Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-03-07 20:23:32 +01:00
Fredrik Höglund
71bb1a9c3c radv: fix the size of the dynamic_buffers array
A buffer descriptor is 16 bytes, not 16 dwords.

Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-03-07 20:23:26 +01:00
Fredrik Höglund
0941d1a574 radv: fix the dynamic buffer index in vkCmdBindDescriptorSets
This fixes the wrong dynamic buffer descriptors being updated when
firstSet > 0.

Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-03-07 20:23:04 +01:00
Matt Turner
69063d0561 configure.ac: Ensure libomxil-bellagio exists before invoking pkg-config.
I was already tired of seeing the message

    Package libomxil-bellagio was not found in the pkg-config search path.
    Perhaps you should add the directory containing `libomxil-bellagio.pc'
    to the PKG_CONFIG_PATH environment variable
    No package 'libomxil-bellagio' found

on every configure, but I just got a distro bug reported where the user
was confused by this message and thought it indicated a bug.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-07 07:27:45 -08:00
Matt Turner
86c023f973 configure.ac: Ensure libva is enabled before invoking pkg-config.
PKG_CHECK_VAR can only check --variable=$NAME, so it cannot handle
modversion.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-07 07:27:45 -08:00
Matt Turner
706074cc96 configure.ac: Use PKG_CHECK_VAR for libclc.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-07 07:27:45 -08:00
Matt Turner
8a26e94439 configure.ac: Use PKG_CHECK_VAR for wayland-scanner.
Available since pkg-config-0.28 and pkgconf-0.8.10.

The removal of the AC_PATH_PROG is intentional. Use pkg-config.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-07 07:27:45 -08:00
Matt Turner
f73903f09b configure.ac: Fix error message in radeon_llvm_check().
It printed the version of LLVM ($1):

   configure: error: 3.6.0 requires libelf when using llvm

instead of the driver name ($2):

   configure: error: r600 requires libelf when using llvm

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
2017-03-07 07:27:45 -08:00
Matt Turner
e457e6abec build: Replace NEED_RADEON_LLVM with HAVE_GALLIUM_LLVM.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-07 07:27:45 -08:00
Bas Nieuwenhuizen
6424795f52 radv: Use the subresource range in HTILE initialization.
v2: fix levelCount assert.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-07 09:58:33 +01:00
Bas Nieuwenhuizen
3b455c1cb7 radv: Use winsys HTILE info.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-07 09:58:27 +01:00
Bas Nieuwenhuizen
dbecbab5aa radv/amdgpu: Let addrlib calculate the HTILE parameters.
Still not sure we can support miptrees when sampling from
HTILE enabled textures.

Added the tcCompatible winsys stuff while I'm at it.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-07 09:58:21 +01:00
Dave Airlie
03f5405fc2 amd/common: document PREDICATION OP 3 as 64-bit bool.
This just documents some info for possible future use.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-07 15:20:01 +10:00
Dave Airlie
b26249781e radv: handle z offset for 3d image <-> buffer copies.
This fixes:
dEQP-VK.pipeline.render_to_image.3d.huge.depth.r8g8b8a8_unorm

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-07 04:02:00 +00:00
Dave Airlie
c5947e9787 radv: move fast clear before resolve into own loop.
Don't fast clear inside the meta loop as things get
confused, fixes a crash in:
dEQP-VK.api.copy_and_blit.resolve_image.whole_array_image.2_bit

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-07 04:01:53 +00:00
Bas Nieuwenhuizen
0ab2dd361f radv: Disable HTILE for textures with multiple layers/levels.
It has issues and the fix I'm working on is too complicated for stable,
so disable for now.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
CC: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
2017-03-06 23:58:57 +01:00
Dave Airlie
6bae1e44a9 radv: Properly handle destroying NULL devices and instances
Ported from anv:
3d33a23e anv: Properly handle destroying NULL devices and instances

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-07 08:17:03 +10:00
Dave Airlie
5c45d2051a radv/ac: introduce i1true/i1false to context.
This uses these in a few places, and fixes one or two
cases which were using da as 32-bit instead of bool.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-07 08:17:03 +10:00
Dave Airlie
ca884aef86 radv/ac: handle Z export using new builder.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-07 08:17:03 +10:00
Dave Airlie
bf2be50774 radv/ac: move to using common ac_get_image_intr_name.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-07 08:17:03 +10:00
Dave Airlie
10ae83a9c2 radeonsi/ac: move get_image_intr_name to common
This code is used in radv, so move to common build code.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-07 08:17:03 +10:00
Timothy Arceri
7eb85b8204 gallium/util: remove unused header from u_queue.c
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-07 09:12:16 +11:00
Timothy Arceri
60a2c2507d gallium/util: remove unused pipe_thread_destroy()
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-07 09:12:16 +11:00
Timothy Arceri
d82d8be614 gallium/util: replace pipe_thread_wait() with thrd_join()
Replace done using:
find ./src -type f -exec sed -i -- \
's:pipe_thread_wait(\([^)]*\)):thrd_join(\1, NULL):g' {} \;

Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-07 09:12:16 +11:00
Timothy Arceri
da40ac65c7 gallium/util: remove PIPE_THREAD_ROUTINE()
This was made unnecessary with fd33a6bcd7.

This was mostly done with:
find ./src -type f -exec sed -i -- \
's:PIPE_THREAD_ROUTINE(\([^,]*\), \([^)]*\)):int\n\1(void \*\2):g' {} \;

With some small manual tidy ups.

Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-07 09:12:16 +11:00
Timothy Arceri
e92293a601 gallium/util: replace pipe_condvar with cnd_t
pipe_condvar was made unnecessary with fd33a6bcd7.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-07 09:07:33 +11:00
Timothy Arceri
e5375ba028 gallium/util: replace pipe_thread with thrd_t
pipe_thread was made unnecessary with fd33a6bcd7.

V2: fix compile error in u_queue.c

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-07 08:53:27 +11:00
Timothy Arceri
628e84a58f gallium/util: replace pipe_mutex_unlock() with mtx_unlock()
pipe_mutex_unlock() was made unnecessary with fd33a6bcd7.

Replaced using:
find ./src -type f -exec sed -i -- \
's:pipe_mutex_unlock(\([^)]*\)):mtx_unlock(\&\1):g' {} \;

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-07 08:53:05 +11:00
Timothy Arceri
ba72554f3e gallium/util: replace pipe_mutex_lock() with mtx_lock()
replace pipe_mutex_lock() was made unnecessary with fd33a6bcd7.

Replaced using:
find ./src -type f -exec sed -i -- \
's:pipe_mutex_lock(\([^)]*\)):mtx_lock(\&\1):g' {} \;

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-07 08:52:38 +11:00
Timothy Arceri
be188289e1 gallium/util: replace pipe_mutex_destroy() with mtx_destroy()
pipe_mutex_destroy() was made unnecessary with fd33a6bcd7.

Replace was done with:
find ./src -type f -exec sed -i -- \
's:pipe_mutex_destroy(\([^)]*\)):mtx_destroy(\&\1):g' {} \;

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-07 08:52:16 +11:00
Timothy Arceri
75b47dda0c gallium/util: replace pipe_mutex_init() with mtx_init()
pipe_mutex_init() was made unnecessary with fd33a6bcd7.

Replace was done using:
find ./src -type f -exec sed -i -- \
's:pipe_mutex_init(\([^)]*\)):(void) mtx_init(\&\1, mtx_plain):g' {} \;

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-07 08:52:07 +11:00
Timothy Arceri
acdcaf9be4 gallium/util: remove pipe_static_mutex()
This was made unnecessary with fd33a6bcd7.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-07 08:48:16 +11:00
Timothy Arceri
2efddc63ee gallium/util: replace pipe_mutex with mtx_t
pipe_mutex was made unnecessary with fd33a6bcd7.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-07 08:48:11 +11:00
Timothy Arceri
464d4806c1 gallium/util: replace pipe_condvar_broadcast() with cnd_broadcast()
pipe_condvar_broadcast() was made unnecessary with fd33a6bcd7.

Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-07 08:23:26 +11:00
Timothy Arceri
5e56c2c79d gallium/util: replace pipe_condvar_signal() with cnd_signal()
pipe_condvar_signal() was made unnecessary with fd33a6bcd7.

Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-07 08:23:26 +11:00
Timothy Arceri
74c879ac75 gallium/util: replace pipe_condvar_wait() with cnd_wait()
pipe_condvar_wait() was made unnecessary with fd33a6bcd7.

Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-07 08:23:26 +11:00
Timothy Arceri
1e0314281a gallium/util: replace pipe_condvar_destroy() with cnd_destroy()
pipe_condvar_destroy() was made unnecessary with fd33a6bcd7.

Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-07 08:23:26 +11:00
Timothy Arceri
3f58242863 gallium/util: replace pipe_condvar_init() with cnd_init()
pipe_condvar_init() was made unnecessary with fd33a6bcd7.

Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-07 08:23:26 +11:00
Marek Olšák
63d7a12fad st/dri: reduce dri_fill_st_options() params
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2017-03-07 08:16:46 +11:00
Marek Olšák
696c5115b9 st/dri: use local pointer to st_context_iface
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2017-03-07 08:16:39 +11:00
Gregory Hainaut
2ab5eccf5d glapi: fix typo in count_scale
2*4=8

Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-03-07 08:11:40 +11:00
Kenneth Graunke
7782936cbc i965: Return NULL from initScreen2, not false.
This returns a pointer, not a boolean.  No actual effect, but cleaner.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-06 12:38:15 -08:00
Kenneth Graunke
b5b123ac8f i965: Make a devinfo local variable.
screen->devinfo.gen is annoying to type and linewrap.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-06 12:38:15 -08:00
Kenneth Graunke
951f56cd43 i965: Delete vestiges of resource streamer code.
We never actually used the resource streamer in any shipping build
of Mesa.  We have no plans to do so in the future.  We looked into
using it in Vulkan, and concluded that it was unusable.  We're not
the only ones to arrive at the conclusion that it's not worth using.

So, drop the last vestiges of resource streamer support and move on.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-06 12:38:15 -08:00
Kenneth Graunke
4dc785728a i965: Drop duplicate #defines now that we've bumped libdrm requirements.
We've updated our libdrm requirement, and it will already provide these.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-06 12:38:15 -08:00
Samuel Pitoiset
4317cd96d3 getteximage: fix _mesa_GetTextureSubImage()
Oops.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100088
Fixes: 5ae54c0cf7 ("getteximage: avoid to lookup textures with id 0")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-06 21:36:56 +01:00
Grazvydas Ignotas
ff494fe999 ralloc: don't leave out the alignment factor
Experimentation shows that without alignment factor gcc and clang choose
a factor of 16 even on IA-32, which doesn't match what malloc() uses (8).
The problem is it makes gcc assume the pointer is 16 byte aligned, so
with -O3 it starts using aligned SSE instructions that later fault,
so always specify a suitable alignment factor.

Cc: Jonas Pfeil <pfeiljonas@gmx.de>
Fixes: cd2b55e5 "ralloc: Make sure ralloc() allocations match malloc()'s alignment."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100049
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Tested by: Mike Lothian <mike@fireburn.co.uk>
Tested by: Jonas Pfeil <pfeiljonas@gmx.de>
2017-03-06 11:28:48 -08:00
Grazvydas Ignotas
b384c23b9e i965: don't require 64bit cmpxchg
There are still some distributions trying to support unfortunate people
with old or exotic CPUs that don't have 64bit atomic operations. The
only thing preventing compile of the Intel driver for them seems to be
initialization of a debug variable.

v2: use call_once() instead of unsafe code, as suggested by Matt Turner

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93089
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
2017-03-06 11:07:20 -08:00
Alex Smith
290d7e892d radv: Emit pending flushes before executing a secondary command buffer
If we have any pending flushes on the primary command buffer, these
must be performed before executing the secondary buffer.

This fixes potential corruption when the contents of a subpass which
clears any of its render targets are given in a secondary buffer: the
flushes after a fast clear would not have been performed until the
vkCmdEndRenderPass call.

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
2017-03-06 19:46:14 +01:00
Samuel Pitoiset
052c81faa1 mesa/main: remove useless check in _mesa_IsSampler()
_mesa_lookup_samplerobj() returns NULL if sampler is 0.

v2: use _mesa_lookup...(...) != NULL

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-06 18:01:38 +01:00
Samuel Pitoiset
5ae54c0cf7 getteximage: avoid to lookup textures with id 0
This fixes the following assertion when the key is 0.

main/hash.c:181: _mesa_HashLookup_unlocked: Assertion `key' failed.

Fixes: 633c959fae ("getteximage: Return correct error value when texure object is not found")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-06 18:01:38 +01:00
Marek Olšák
5ac6ab701f docs/relnotes/17.1.0: document the new LLVM requirement 2017-03-06 17:35:36 +01:00
Marek Olšák
c416d8a3bc gallium/radeon: don't monitor SDMA busyness on EG/Cayman/SI
It's always busy.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99955

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-06 14:13:04 +01:00
Marek Olšák
7e1faa79d3 radeonsi: drop support for LLVM 3.6 & 3.7
They are too old.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-06 14:13:04 +01:00
Marek Olšák
d5d74fe2b5 radeonsi: set the convergent attribute where needed
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-06 14:13:04 +01:00
Marek Olšák
ef883fc554 gallivm,ac: add LP_FUNC_ATTR_CONVERGENT
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-06 14:13:04 +01:00
Marek Olšák
9b08f044be radeonsi: fix LLVM 3.9 - don't use non-matching attributes on declarations
Call site attributes are used since LLVM 4.0.

This also reverts commit b19caecbd6
"radeon/ac: fix intrinsic version check", because this is the correct fix.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-06 14:13:04 +01:00
Mark Thompson
6398a09213 st/omx: Set end-of-frame flag on bitstream output buffers
Since all output buffers are whole frames, this should always be set.

Technically, setting this flag is is optional (see OpenMAX IL section
3.1.2.7.1), but some clients assume that it will be used and
therefore buffer indefinitely thinking that all output buffers are
fragments of the first frame when it is not set.

Signed-off-by: Mark Thompson <sw@jkqxz.net>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-03-06 14:05:43 +01:00
Mark Thompson
6d95358aac st/omx: Fix port format enumeration
From OpenMAX IL section 4.3.5:
"The value of nIndex is the range 0 to N-1, where N is the number of
formats supported by the port.  There is no need for the port to
report N, as the caller can determine N by enumerating all the
formats supported by the port.  Each port shall support at least one
format.  If there are no more formats, OMX_GetParameter returns
OMX_ErrorNoMore (i.e., nIndex is supplied where the value is N or
greater)."

Only one format is supported, so N = 1 and OMX_ErrorNoMore should be
returned if nIndex >= 1.  The previous code here would return the
same format for all values of nIndex, resulting in an infinite loop
when a client attempts to enumerate all formats.

Signed-off-by: Mark Thompson <sw@jkqxz.net>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-03-06 14:05:17 +01:00
Mark Thompson
0798fddb50 st/va: Fix forward/backward referencing for deinterlacing
The VAAPI documentation is not very clear here, but the intent
appears to be that a forward reference is forward from a frame in the
past, not forward to a frame in the future (that is, forward as in
forward prediction, not as in a forward reference in source code).
This interpretation is derived from other implementations, in
particular the i965 driver and the gstreamer client.

In order to match those other implementations, this patch swaps the
meaning of forward and backward references as they currently appear
for motion-adaptive deinterlacing.

Signed-off-by: Mark Thompson <sw@jkqxz.net>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-03-06 14:05:05 +01:00
Mark Thompson
c93a157078 st/va: Support fractional framerate in misc parameter
Signed-off-by: Mark Thompson <sw@jkqxz.net>
Acked-by: Christian König <christian.koenig@amd.com>
2017-03-06 14:04:59 +01:00
Andy Furniss
012b6d3fe7 st/va encode handle ntsc framerate rate control
Tested with ffmpeg and gst-vaapi. Without this bits per
frame is set way too low for fractional framerates.

v2: Mark Thompson: simplify calculation.
    Use float.

Signed-off-by: Andy Furniss <adf.lists@gmail.com>
Acked-by: Christian König <christian.koenig@amd.com>
2017-03-06 14:04:24 +01:00
Bas Nieuwenhuizen
f3dc318464 radv: Use the new L2 writeback flag.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-06 09:16:05 +01:00
Bas Nieuwenhuizen
66e12d4073 radv: Add L2 writeback.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-06 09:15:51 +01:00
Timothy Arceri
6b657cecd5 util/disk_cache: fix make check
Fixes make check after 11f0efec2e which caused disk cache
to create an additional directory.
2017-03-06 16:39:55 +11:00
Dave Airlie
2e73ccb485 radv/ac: use bitfield extract new intrinsics.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-06 15:27:33 +10:00
Dave Airlie
9c7309b09b radv/ac: move to new kill build.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-06 15:27:33 +10:00
Dave Airlie
a2652719f3 radv/ac: move to using new export intrinsics.
This uses the new code in build to do exports.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-06 15:27:33 +10:00
Dave Airlie
2830ece0fc radv/ac: switch to new intrinsics for pkrtz and clamp.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-06 15:27:32 +10:00
Dave Airlie
cc59e24a6b radv: drop Z24 support.
This isn't exposed in -pro, the hw docs say it is deprecated,
so let's not bother with it.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-05 23:32:36 +00:00
Grazvydas Ignotas
6aaadd8728 radv: use VK_NULL_HANDLE for handles
Avoids warnings on 32bit.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-03-06 00:10:42 +01:00
Grazvydas Ignotas
a5446e3187 radv: check for upload alloc failure
Mainly to avoid gcc's complains about uninitialized ptr and offset use
later in that code.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-03-06 00:10:42 +01:00
Grazvydas Ignotas
666fe622e1 radv: don't use uninitialized value on failure
Mainly to avoid a warning.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-03-06 00:10:42 +01:00
Grazvydas Ignotas
5458b02305 radv: avoid casting warnings on 32bit
Use the same helpers as for other handle<->pointer conversions.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-03-06 00:10:42 +01:00
Bas Nieuwenhuizen
fb7e4e16e7 radv/amdgpu: Add some debug flags.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-06 00:10:23 +01:00
Bas Nieuwenhuizen
682248db45 radv: Cache command buffers in command pool.
So that we don't keep allocating BOs for the IBs and upload buffers.

We run some risk of memory increase with e.g. a bimodal size
distribution of command buffers, but I haven't noticed a significant
increase with dota2 and talos.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-06 00:07:51 +01:00
Timothy Arceri
e3a01a5d1b Revert "glsl: Switch to disable-by-default for the GLSL shader cache"
This reverts commit 0f60c6616e.

Piglit and all games tested so far seem to be working without
issue. This change will allow wide user testing and we can decided
before the next release if we need to turn it off again.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-06 09:38:07 +11:00
Timothy Arceri
ee8d2e2804 docs: update envvars.html to reflect having a cache per arch 2017-03-06 09:33:20 +11:00
Timothy Arceri
11f0efec2e util/disk_cache: support caches for multiple architectures
Previously we were deleting the entire cache if a user switched
between 32 and 64 bit applications.

V2: make the check more generic, it should now work with any
platform we are likely to support.

V3: Use suggestion from Emil to make even more generic/fix issue
with __ILP32__ not being declared on gcc for regular 32-bit builds.

Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-03-06 09:27:01 +11:00
Grazvydas Ignotas
175d4aa8f5 util/disk_cache: mark read-only arguments const
No functional changes.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-06 09:23:17 +11:00
Dave Airlie
b19caecbd6 radeon/ac: fix intrinsic version check
Reported-by: 375gnu@gmail.com
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100068

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-06 06:05:58 +10:00
Bas Nieuwenhuizen
a247215469 radv: Merge fast clear flushes.
Don't flush multiple times if we clear multiple attachments. Also allows
doing the depth clear in parallel with the fast color clears.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-05 20:40:31 +01:00
Tim Rowley
a01a104216 relnotes: [swr] note addition of gs, increased llvm requirement
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-05 07:33:49 -06:00
Tim Rowley
bb8a4242ff docs: update features.txt for swr geometry shaders
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-05 07:33:49 -06:00
Tim Rowley
c307092557 swr: [rasterizer core] fix primID provoking vertex for GS
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-05 07:33:49 -06:00
Tim Rowley
f1d7284117 swr: implement geometry shaders
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-05 07:33:49 -06:00
Tim Rowley
08a82363ba configure.ac: increase required swr llvm to 3.9.0
GS implementation uses the masked.{gather,store} intrinsics,
introduced in llvm-3.9.0.  swr llvm version requirement in
automake and scons now match (scons already needed >= 3.9).

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-05 07:33:49 -06:00
Kenneth Graunke
6f71d9adc1 i965: Clamp texture buffer size to GL_MAX_TEXTURE_BUFFER_SIZE.
The OpenGL 4.5 specification's description of TexBuffer says:

"The number of texels in the texture image is then clamped to an
 implementation-dependent limit, the value of MAX_TEXTURE_BUFFER_SIZE."

We set GL_MAX_TEXTURE_BUFFER_SIZE to 2^27.  For buffers with a byte
element size, this is the maximum possible size we can encode in
SURFACE_STATE.  If you bind a buffer object larger than this as a
texture buffer object, we'll exceed that limit and hit an isl assert:

   assert(num_elements <= (1ull << 27));

To fix this, clamp the size in bytes to MaxTextureSize / texel_size.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-03-04 22:46:50 -08:00
Emil Velikov
eaf4a106bd automake: move wayland-drm prior to Vulkan
Earlier commit was picked from a larger series, but did not consider
that it removed the vulkan <> wayland-drm interdependency.

Rather than reverting everything, temporarily move wayland-drm further
up to resolve the issue. Since it [wayland-drm] does not have any
in-mesa dependencies that's perfectly safe.

Cc: Vedran Miletić <vedran@miletic.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100060
Fixes: e135ce6f08 ("vulkan: Build common Vulkan code earlier")
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Javier Jardón <jjardon@gnome.org>
2017-03-04 23:44:14 +00:00
Mauro Rossi
6facb0c08f android: fix libz dynamic library dependencies
Fixes a series of libz related building errors:

target SharedLib: gallium_dri_32
(out/target/prod...SHARED_LIBRARIES/gallium_dri_intermediates/LINKED/gallium_dri.so)
external/elfutils/libelf/elf_compress.c:117: error: undefined reference to 'deflateInit_'
...
external/elfutils/libelf/elf_compress.c:244: error: undefined reference to 'inflateEnd'
clang++: error: linker command failed with exit code 1 (use -v to see
invocation)

Fixes: 85a9b1b "util/disk_cache: compress individual cache entries"
2017-03-04 21:47:26 +00:00
Timothy Arceri
28fd6556c3 svga: pass NULL to ureg_get_tokens()
The number of tokens in never used and the pointer is NULL checked
so just pass NULL.

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-03-05 08:15:51 +11:00
Ilia Mirkin
8e6d67685e nvc0: take extra pushbuf space into account for pushbuf_space calls
See detailed explanation of why this is needed in commit eb60a89bc3.
This spot was missed/overlooked. Basically as a result of the fact
that BEGIN_* ends up calling PUSH_SPACE, which in turn adds an extra 8
to the requested amount, we have to be mindful of that when doing bare
nouveau_pushbuf_space calls.

Reportedly this fixes some crashes when replaying a hitman trace taken
on radeonsi.

Fixes: eb60a89bc3 ("nouveau: take extra push space into account for pushbuf_space calls")
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Reported-by: Karol Herbst <nouveau@karolherbst.de>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-03-04 17:48:27 +01:00
Ilia Mirkin
32dd8d59b6 nvc0: increase alignment to 256 for texture buffers on fermi
When binding as textures, the alignment can be 16. However when binding
as an image, the address has to be aligned to 256. (Also when binding as
an RT, but that can't happen with GL or current gallium APIs.)

Reported-by: Roy Spliet <nouveau@spliet.org>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-03-04 17:48:27 +01:00
Tapani Pälli
66b62be4bb android: fix outdir for gen_enum_to_str files
when files are being generated the value of $intermediates var content can be
completely random, this makes sure that outdir is the wanted one.

Fixes: 3f2cb699 ("android: vulkan: add support for libmesa_vulkan_util")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-04 16:38:33 +00:00
Xiaosong Wei
2acc69da8c EGL/Android: Add EGL_EXT_buffer_age extension
This patch implements the EGL_EXT_buffer_age extension for Android.
https://www.khronos.org/registry/EGL/extensions/EXT/EGL_EXT_buffer_age.txt

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-04 16:37:12 +00:00
Emil Velikov
2b1e22f9d8 docs: add news item and link release notes for 17.0.1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-04 15:56:58 +00:00
Emil Velikov
1b19304f3f docs: add sha256 checksums for 17.0.1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 5c9273152c)
2017-03-04 15:55:10 +00:00
Emil Velikov
6a4f6a49d4 docs: add release notes for 17.0.1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 8fee1d348c)
2017-03-04 15:55:09 +00:00
Emil Velikov
1cd4fde053 gallium/targets: don't leave an empty target directory(ies)
Some drivers do not support certain targets - for example nouveau
doesn't do VAAPI, while freedreno doesn't do of the video backends.

As such if we enter vdpau when building freedreno/ilo/etc, a vdpau/
folder will be created, empty library will be build and almost
immediately removed. Thus keeping an empty vdpau/ folder around.

There are two ways to fix this.

 * add substantial tracking in configure/makefiles so that we never end
up in targets/vdpau
 Downsides:
Error prone, as the configure checks and the 'include
gallium/drivers/foo/Automake.inc' can easily get out of sync.

 * remove the folder, if empty, alongside the empty library.
 Downsides:
In the latter case vdpau/ might be empty before the mesa build has
started, yet we'll remove it either way.

This patch implements the latter option, as the downside isn't that
significant, plus the patch is way shorter ;-)

v2: use has_drivers to track since TARGET_DRIVERS can contain space,
hence neither string comparison nor -n/-z works correctly.

Gentoo Bugzilla: https://bugs.gentoo.org/545230
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-03-04 15:26:43 +00:00
Emil Velikov
342e5fdb64 radv: use enum_to_str util functions.
Port of e9dcb17962
vulkan/util: Add generator for enum_to_str functions

Cc: Bas Nieuwenhuizen <basni@google.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-04 15:05:14 +00:00
Jason Ekstrand
e135ce6f08 vulkan: Build common Vulkan code earlier
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-04 14:46:53 +00:00
Jason Ekstrand
b3135c3cf3 anv: Advertise shaderInt64 on Broadwell and above
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-03-03 13:59:29 -08:00
Jason Ekstrand
bc456749bd nir/int64: Properly handle imod/irem
The previous implementation was fine for GLSL which doesn't really have
a signed modulus/remainder.  They just leave the behavior undefined
whenever either source is negative.  However, in SPIR-V, there is a
defined behavior for negative arguments.  This commit beefs up the pass
so that it handles both correctly.  Tested using a hacked up version of
the Vulkan CTS test to get 64-bit support.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-03-03 13:59:27 -08:00
Jason Ekstrand
9745bef308 nir/builder: Add an int64 immediate helper
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-03-03 13:59:24 -08:00
Kenneth Graunke
46cd549c2b genxml: Fill out Gen4 and G45 XML.
This is a work in progress - some things may still need fixing.
But it should be in pretty decent shape.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-03 10:23:17 -08:00
Marek Olšák
7f1446a8a1 ac: normalize build helper names
s/emit/build/

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-03 17:30:07 +01:00
Marek Olšák
8bde7fb3fc ac: replace SI.vs.load.input with amdgcn.buffer.load.format
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-03 17:30:07 +01:00
Marek Olšák
94811dc66c radeonsi: move SI.vs.load.input building into amd/common
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-03 17:30:07 +01:00
Marek Olšák
52660484c1 radeonsi: detect and mark loads/stores from read-only/write-only memory 2017-03-03 17:29:56 +01:00
Marek Olšák
97e21cfa25 ac: replace llvm.SI.tbuffer.store with llvm.amdgcn.buffer.store if ADD_TID=0
ADD_TID doesn't work. Needs more investigation.

v2: remove leftover dead code

Reviewed-by: Dave Airlie <airlied@redhat.com> (v1)
2017-03-03 15:29:30 +01:00
Marek Olšák
684339827c radeonsi: use the writeonly LLVM attribute 2017-03-03 15:29:30 +01:00
Marek Olšák
8cfdbba6c7 ac: remove offen parameter from ac_build_buffer_store_dword
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-03 15:29:30 +01:00
Marek Olšák
1bc88c02c0 radeonsi: enable TC L2 for tessellation offchip stores
Vulkan does the same thing.
2017-03-03 15:29:30 +01:00
Marek Olšák
27439dfdae radeonsi: merge and simplify tbuffer_store functions
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-03 15:29:30 +01:00
Marek Olšák
b46e412c2e radeonsi: set noalias on input shader pointers 2017-03-03 15:29:30 +01:00
Marek Olšák
d4324ddb89 radeonsi: replace AMDGPU.bfe.* with amdgcn.*bfe
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-03 15:29:30 +01:00
Marek Olšák
9c09592086 radeonsi: move kill intrinsic building into amd/common
just a cleanup

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-03 15:29:30 +01:00
Marek Olšák
e729dc7c46 radeonsi: set readnone on reads from read-only memory 2017-03-03 15:29:30 +01:00
Marek Olšák
25c7969a5a radeonsi: replace SI.buffer.load.dword with amdgcn.buffer.load 2017-03-03 15:29:30 +01:00
Marek Olšák
653ac0b389 radeonsi: replace SI.packf16 with amdgcn.cvt.pkrtz 2017-03-03 15:29:30 +01:00
Marek Olšák
4b2e5b9389 ac: replace old image intrinsics with new ones
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-03 15:29:30 +01:00
Marek Olšák
c6a3911e5d radeonsi: remove last use of llvm.SI.resinfo
and move one function up to reuse the code.
2017-03-03 15:29:30 +01:00
Marek Olšák
ad18d7f040 radeonsi: move image intrinsic building to amd/common
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-03 15:29:30 +01:00
Marek Olšák
2b3ebe307c ac: replace SI.export with amdgcn.exp.*
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-03 15:29:30 +01:00
Marek Olšák
369f4a8726 radeonsi: move llvm.SI.export building to amd/common
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-03 15:29:30 +01:00
Marek Olšák
9af03318aa ac: unify build_type_name_for_intr functions
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-03 15:29:30 +01:00
Marek Olšák
f8c823b103 radeonsi: set unorm=1 for TGSI_TEXTURE_SHADOWRECT as well
It was harmless, because we also set unorm in the sampler state.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-03 15:29:30 +01:00
Marek Olšák
b5744310d4 gallivm, ac: add writeonly and inaccessiblememonly attributes
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-03 15:29:30 +01:00
Marek Olšák
455c79b24f tgsi/scan: record load/store/atomic image usage
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-03 15:29:30 +01:00
Eric Anholt
3958c01762 glapi: Fix a comment typo
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-03 20:29:12 +11:00
Alejandro Piñeiro
a54f0ad6d3 mesa/main: *TextureSubImage* generates INVALID_OPERATION on wrong target
Equivalent *TexSubImage* methods generates INVALID_ENUM.

From OpenGL 4.5 spec, section 8.6 Alternate Texture Image
Specification Commands:

   "An INVALID_ENUM error is generated by *TexSubImage* if target does
    not match the command, as shown in table 8.15."

And:

   "An INVALID_OPERATION error is generated by *TextureSubImage* if
    the effective target of texture does not match the command, as
    shown in table 8.15."

Fixes:
GL45-CTS.direct_state_access.textures_copy_errors

v2: slightly change commit summary (Samuel)

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-03-03 08:14:53 +01:00
Ben Widawsky
d844d8e4d5 i965: Add Kaby Lake brandstrings
While here, use the spacing defined in Ark.
https://ark.intel.com/products/codename/82879/Kaby-Lake

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2017-03-02 21:00:02 -08:00
Grazvydas Ignotas
4dc42ae792 tgsi/ureg: return correct token count in ureg_get_tokens
Valgrind reports that the shader cache writes uninitialized data to disk.
Turns out ureg_get_tokens() is returning the count of allocated tokens
instead of how many are actually used, so the cache writes out unused
space at the end. Use the real count instead.

This change should not cause regressions elsewhere because the only
ureg_get_tokens() user that cares about token count is the shader cache.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-03 12:11:55 +11:00
Timothy Arceri
6084855528 radeonsi: add support for an on-disk shader cache
V2:
- when loading from disk cache also binary insert into memory cache.
- check that the binary loaded from disk is the correct size. If not
  delete the cache item and skip loading from cache.

V3:
- remove unrequired variable

Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-03 12:09:08 +11:00
Timothy Arceri
85a9b1b562 util/disk_cache: compress individual cache entries
This reduces the cache size for Deus Ex from ~160M to ~30M for
radeonsi (these numbers differ from Grigori's results below
probably due to different graphics quality settings).

I'm also seeing the following improvements in minimum fps in the
Shadow of Mordor benchmark on an i5-6400 CPU@2.70GHz, with a HDD:

no-cache:                    ~10fps
with-cache-no-compression:   ~15fps
with-cache-and-compression:  ~20fps

Note: The with cache results are from the second run after closing
and opening the game to avoid the in-memory cache.

Since we mainly care about decompression I went with
Z_BEST_COMPRESSION as suggested on irc by Steinar H. Gunderson
who has benchmarked decompression speeds.

Grigori Goronzy provided the following stats for Deus Ex: Mankind
Divided start-up times on a Athlon X4 860k with a SSD:

No Cache                                 215 sec

Cold Cache zlib BEST_COMPRESSION         285 sec
Warm Cache zlib BEST_COMPRESSION         33 sec

Cold Cache zlib BEST_SPEED               264 sec
Warm Cache zlib BEST_SPEED               33 sec

Cold Cache no compression                266 sec
Warm Cache no compression                34 sec

The total cache size for that game is 48 MiB with BEST_COMPRESSION,
56 MiB with BEST_SPEED and 170 MiB with no compression.

These numbers suggest that it may be ok to go with Z_BEST_SPEED
but we should gather some actual decompression times before doing
so. Other options might be to do the compression in a separate
thread, this might allow us to use a higher compression algorithim
such as LZMA.

Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Acked-by: Marek Olšák <marek.olsak@amd.com>
2017-03-03 12:09:08 +11:00
Timothy Arceri
5afde61752 util/disk_cache: add support for detecting corrupt cache entries
V2: fix pointer increments for writing/reading crc

Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
2017-03-03 12:09:08 +11:00
Samuel Pitoiset
9fc86d4f53 glsl: fix subroutine mismatch between declarations/definitions
Previously, when q.subroutine was set to 1, a new subroutine
declaration was added to the AST, while 0 meant a subroutine
definition has been detected by the parser.

Thus, setting the q.subroutine flag in both situations is
obviously wrong because a new type identifier is added instead
of trying to match the declaration. To fix it up, introduce
ast_type_qualifier::is_subroutine_decl() to differentiate
declarations and definitions easily.

This fixes a regression with:
arb_shader_subroutine/compiler/direct-call.vert

Cc: Mark Janes <mark.a.janes@intel.com>
Fixes: be8aa76afd ("glsl: remove unecessary flags.q.subroutine_def")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100026
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-03 00:57:57 +01:00
Matt Turner
10f2c86aa3 genxml: Depend on Makefile.am for generated sources.
Depending on the generated Makefile means that all generated sources are
recreated after ./configure.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-02 15:49:00 -08:00
Matt Turner
7d1195c1e4 clover: Work around build failure with AltiVec.
Bugzilla: https://bugs.gentoo.org/show_bug.cgi?id=587210
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68504
Acked-by: Francisco Jerez <currojerez@riseup.net>
2017-03-02 15:49:00 -08:00
Nanley Chery
d7d64f1091 anv/image: Allow HiZ on input attachment-capable depth/stencil images
While an input attachment may only take on one of those two layouts,
other depth/stencil attachments that use the same image may have
HiZ-enabled layouts. Improves the average frame rate on a release
candidate of a proprietary Vulkan benchmark by 9.94% over 3 runs on my
SKL GT4.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:55 -08:00
Nanley Chery
76b8cc2a1c anv/cmd_buffer: Centralize automatic layout transitions
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:55 -08:00
Nanley Chery
0a72b5f3cb anv/cmd_buffer: Add attachment transitioning functions
This is needed to transition input attachments.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:55 -08:00
Nanley Chery
9950774f8b anv/blorp: Encapsulate subpass id querying
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:55 -08:00
Nanley Chery
c78a959bcf anv/cmd_buffer: Enable render pass awareness
v2: Update cmd_state_reset (Jason Ekstrand)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:55 -08:00
Nanley Chery
c0223d052b anv/pass: Store subpass attachment reference list
We'll loop through this array when performing automatic layout
transitions.

v2: Adjust formatting of an assignment (Jason Ekstrand)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:55 -08:00
Nanley Chery
8f6a17c8e7 anv/pass: Fix size of anv_render_pass:subpass_attachments
Don't allocate space for resolve attachments if the subpass has none.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:55 -08:00
Nanley Chery
608d17b80e anv: Store the user's VkAttachmentReference
We will be using the image layout. Store the full struct directly from
the user.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:55 -08:00
Nanley Chery
6326f0f4be anv/cmd_buffer: Remove extra resolve for certain depth buffers
Due to recent commits, the sampler now bypasses the auxiliary HiZ buffer
when reading from a depth image subresource that is in the general
layout. Remove this unneeded resolve.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:55 -08:00
Nanley Chery
ea744912b3 anv/cmd_buffer: Conditionally choose the sampled image surface state
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:55 -08:00
Nanley Chery
5408d3fd05 anv/descriptor_set: Store aux usage of sampled image descriptors
v2: Rebase onto latest changes
v3: Account for NULL image_view in aux_usage assignment

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:55 -08:00
Nanley Chery
efc2222323 anv/image: Create an additional surface state for sampling
This will be used to sample a depth input attachment without having to
pass through the HiZ buffer.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:54 -08:00
Nanley Chery
f3621f4e71 anv/image: Simplify setup of HiZ sampler surface state
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:54 -08:00
Nanley Chery
258af3a856 anv/image: Remove extra dependency on HiZ-specific variable
surf_usage is only useful to image views that may use HiZ buffers.
Storage image views don't use HiZ buffers.

v2: Update commit message and add an assertion.

Fixes: 055ff2ec52 ("anv: Replace anv_image_has_hiz() with ISL_AUX_USAGE_HIZ")
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:54 -08:00
Nanley Chery
54d29ee65f anv: Update the HiZ sampling helper
Validate the inputs, verify that this image has a depth
buffer, use gen_device_info instead of

v2:
- Add parenthesis (Jason Ekstrand)
- Make parameters const
- Use gen_device_info instead of gen
- Pass aspect to missed function in transition_depth_buffer

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:54 -08:00
Nanley Chery
172747a963 anv/cmd_buffer: Replace layout_to_hiz_usage()
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:54 -08:00
Nanley Chery
425e33bcdb anv/image: Add anv_layout_to_aux_usage()
This function supersedes layout_to_hiz_usage().

v2:
- Don't find the optimal buffer for layout transitions (Jason Ekstrand).
- Pass the devinfo instead of the gen (Jason Ekstrand)
- Update the function documentation.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:54 -08:00
Nanley Chery
178f9e5f29 anv/pass: Avoid accessing attachment array out of bounds
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 13:17:54 -08:00
Jonas Pfeil
cd2b55e536 ralloc: Make sure ralloc() allocations match malloc()'s alignment.
The header of ralloc needs to be aligned, because the compiler assumes
that malloc returns will be aligned to 8/16 bytes depending on the
platform, leading to degraded performance or alignment faults with ralloc.

Fixes SIGBUS on Raspberry Pi at high optimization levels.

This patch is not perfect for MSVC, as maybe in the future the alignment
for the most demanding data type might change to more than 8.

v2: Commit message reword/typo fix, and add a bigger explanation in the
    code (by anholt)

Signed-off-by: Jonas Pfeil <pfeiljonas@gmx.de>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: mesa-stable@lists.freedesktop.org
2017-03-02 13:01:45 -08:00
Bruce Cherniak
a7b8d50bcb swr: fix crash in swr_update_derived following st/mesa state changes
Recent change to st/mesa state update logic caused major regressions to
swr validation code.

swr uses the same validation logic (swr_update_derived) for both draw
and Clear calls.  New st/mesa state update logic results in certain state
objects not being set/bound during Clear.  This was causing null ptr
exceptions.  Creation of static dummy state objects allows setting these
pointers during Clear validation, without interfering with relevant state
validation.

Once fixed, new logic also highlighted an error in dirty bit checking for
fragment shader and clip validation.

(The alternative is to have a simplified validation routine for Clear.
Which may do that at some point.)

Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2017-03-02 13:39:56 -06:00
Bruce Cherniak
74aa6fd9a0 docs: update features.txt for GL_ARB_clear_texture with swr
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2017-03-02 13:39:56 -06:00
Bruce Cherniak
dd649a541d swr: enable clear_texture with util_clear_texture
Passes corresponding piglit tests.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-02 13:39:52 -06:00
Gregory Hainaut
b36050143f doc: GL_ARB_buffer_storage is supported on llvmpipe/swr
At least, the extension is exported (gallium capability
PIPE_CAP_BUFFER_MAP_PERSISTENT_COHERENT is 1)

Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-02 17:31:04 +00:00
Emil Velikov
b23db2b840 automake: i965: list correct header in Makefile.source
Fixes: 7ac47b1af7 ("i965: Add a header for brw_vec4_vs_visitor")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-02 17:30:33 +00:00
Brian Paul
b95ead850b svga: fix crash regression since e027935a79
During the first update of the hw_clear_state atoms, we may not yet
have a current rasterizer state object.  So, svga->curr.rast may be
NULL and we crash.

Add a few null pointer checks to work around this.  Note that these
are only needed in the state update functions which are called for
'clear' validation.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-03-02 10:11:19 -07:00
Brian Paul
69fb8f3cae svga: s/unsigned/pipe_prim_type/
And add some default switch cases to silence compiler warnings.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-03-02 10:11:19 -07:00
Brian Paul
a9ff377d40 svga: whitespace fixes in svga_context.h
Trivial.
2017-03-02 10:11:13 -07:00
Brian Paul
49134c0549 svga: whitespace and formatting fixes in svga_stage.c
Trivial.
2017-03-02 10:11:04 -07:00
Robert Foss
88becf7302 mesa: Avoid read of uninitialized variable
The is_color_attachement variable is later read when handling two
separate error cases, where only one of the cases results in the
variable being initialized.

This can be avoided by giving the variable a safe default value.

Coverity-Id: 1398631
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-02 15:45:19 +00:00
Lionel Landwerlin
af5f13e58c anv: add VK_KHR_descriptor_update_template support
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 10:34:06 +00:00
Lionel Landwerlin
9f60ed98e5 anv: add VK_KHR_push_descriptor support
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 10:34:06 +00:00
Lionel Landwerlin
12dee851a3 anv: descriptor: make descriptor writing take a stream allocator
This allows us to allocate surface states from the command buffer when
pushing descriptor sets rather than allocating them through a
descriptor set pool.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 10:34:06 +00:00
Lionel Landwerlin
194fa58285 anv: descriptors: extract writing of descriptors elements
This will be reused later on.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 10:34:06 +00:00
Lionel Landwerlin
c2d199adec anv: make layout size computation helper available across compilation units
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 10:34:06 +00:00
Lionel Landwerlin
c83e33e6ee anv: move buffer_view declaration
We will need this declaration closer for readability later.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 10:34:06 +00:00
Tomasz Figa
06758c1e8a mesa: Use _mesa_has_OES_geometry_shader() when validating draws
In validate_DrawElements_common() we need to check for OES_geometry_shader
extension to determine if we should fail if transform feedback is
unpaused. However current code reads ctx->Extensions.OES_geometry_shader
directly, which does not take context version into account. This means
that if the context is GLES 3.0, which makes the OES_geometry_shader
inapplicable, we would not validate the draw properly. To fix it, let's
replace the check with a call to _mesa_has_OES_geometry_shader().

Fixes following dEQP tests on i965 with a GLES 3.0 context:

dEQP-GLES3.functional.negative_api.vertex_array#draw_elements
dEQP-GLES3.functional.negative_api.vertex_array#draw_elements_incomplete_primitive
dEQP-GLES3.functional.negative_api.vertex_array#draw_elements_instanced
dEQP-GLES3.functional.negative_api.vertex_array#draw_elements_instanced_incomplete_primitive
dEQP-GLES3.functional.negative_api.vertex_array#draw_range_elements
dEQP-GLES3.functional.negative_api.vertex_array#draw_range_elements_incomplete_primitive

Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-03-02 00:37:17 -08:00
Kenneth Graunke
58793e514b i965: Replace BRW_SURFACEFORMAT_* with ISL_FORMAT_*.
One less set of enums.  Dropped the #defines from brw_defines.h and ran:

$ for file in *.cpp *.c *.h; do sed -i \
      -e 's/BRW_SURFACEFORMAT_/ISL_FORMAT_/g' \
      -e 's/ISL_FORMAT_ASTC_[A-Zxs0-9_]*/\U&/g' $file; \
  done

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 00:30:45 -08:00
Chris Wilson
92281b2c7f i965: Only flush the batchbuffer if we need to zero the SO offsets
If we don't have pipelined register access (e.g. Haswell before kernel
v4.2), then we can only implement EXT_transform_feedback by reseting the
SO offsets *between* batches. However, if we do have pipelined access to
the SO registers on gen7, we can simply emit an inline reset of the SO
registers without a full batch flush.

v2 [by Ken]: Simplify after recent kernel feature detection changes.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2017-03-02 00:30:41 -08:00
Iago Toral Quiroga
7ad692d8e2 anv: do not subtract the base layer to compute depth in 3DSTATE_DEPTH_BUFFER
According to the PRM description of the Depth field:

  "This field specifies the total number of levels for a volume texture
   or the number of array elements allowed to be accessed starting at the
   Minimum Array Element for arrayed surfaces"

However, ISL defines array_len as the length of the range
[base_array_layer, base_array_layer + array_len], so it already represents
a value relative to the base array layer like the hardware expects.

v2: Depth is defined as a U11-1 field, so subtract 1 from
    the actual value (Jason)

This fixes a number of new CTS tests that would crash otherwise:
dEQP-VK.pipeline.render_to_image.*

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 09:04:03 +01:00
Iago Toral Quiroga
64bf78270d isl: document the meaning of the array_len field in isl_view
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-02 09:03:42 +01:00
Jacob Lifshay
3d8feb38e8 vulkan/wsi: Improve the DRI3 error message
This commit improves the message by telling them that they could probably
enable DRI3.  More importantly, it includes a little heuristic to check
to see if we're running on AMD or NVIDIA's proprietary X11 drivers and,
if we are, doesn't emit the warning.  This way, users with both a discrete
card and Intel graphics don't get the warning when they're just running
on the discrete card.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99715
Co-authored-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Rene Lindsay <rjklindsay@hotmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Cc: "17.0" <mesa-dev@lists.freedesktop.org>
2017-03-01 19:11:47 -08:00
Jason Ekstrand
424ac809bf i965: Do int64 lowering in NIR
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-03-01 17:00:20 -08:00
Jason Ekstrand
074f5ba0b5 nir: Add a simple int64 lowering pass
The algorithms used by this pass, especially for division, are heavily
based on the work Ian Romanick did for the similar int64 lowering pass
in the GLSL compiler.

v2: Properly handle vectors

v3: Get rid of log2_denom stuff.  Since we're using bcsel, we do all the
    calculations anyway and this is just extra instructions.

v4:
 - Add back in the log2_denom stuff since it's needed for ensuring that
   the shifts don't overflow.
 - Rework the looping part of the pass to be easier to expand.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-03-01 17:00:20 -08:00
Jason Ekstrand
86e749b1ad spirv: Use nir_builder for control flow
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-03-01 17:00:20 -08:00
Jason Ekstrand
95972cd4fd nir/lower_indirect: Use nir_builder control-flow helpers
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-03-01 17:00:20 -08:00
Jason Ekstrand
3ce8eeb5a1 nir/lower_gs_intrinsics: Use nir_builder control-flow helpers
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-03-01 17:00:20 -08:00
Jason Ekstrand
c75f965ab7 glsl/nir: Use nir_builder's new control-flow helpers
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-03-01 17:00:20 -08:00
Jason Ekstrand
e27c716ad7 nir/builder: Add support for easily building control-flow
Each of the pop functions (and push_else) take a control flow parameter as
their second argument.  If NULL, it assumes that the builder is in a block
that's a direct child of the control-flow node you want to pop off the
virtual stack.  This is what 90% of consumers will want.  The SPIR-V pass,
however, is a bit more "creative" about how it walks the CFG and it needs
to be able to pop multiple levels at a time, hence the argument.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-03-01 17:00:20 -08:00
Jason Ekstrand
d5b355ce5f i965: Move intel_debug.h to intel/common/gen_debug.h
This is shared between the Vulkan and GL drivers as it's a requirement
of the back-end compiler.  However, it doesn't really belong in the
compiler.  We rename the file to match the prefix of the other stuff in
common and because libdrm defines an intel_debug.h and this avoids a
pile of possible name conflicts.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-03-01 16:14:03 -08:00
Jason Ekstrand
8048c1953c i965: Reduce cross-pollination between the DRI driver and compiler
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-03-01 16:14:03 -08:00
Jason Ekstrand
a2195e561a i965: Move select_clip_planes to brw_vs.c
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-03-01 16:14:03 -08:00
Jason Ekstrand
818bfdfa15 i965: Delete brw_do_cubemap_normalize
This hasn't been used for quite some time now but we never bothered to
get rid of it when we dropped GLSL IR support for vec4.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-03-01 16:14:03 -08:00
Jason Ekstrand
7ac47b1af7 i965: Add a header for brw_vec4_vs_visitor
brw_vs.h is not a compiler file but brw_vec4_visitor is definitely a
compiler thing.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-03-01 16:14:02 -08:00
Jason Ekstrand
1c318af743 i965: Move a bunch of pre-compile and link stuff to brw_program.h
It's all GL-specific and brw_program.h is not part of i965_compiler.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-03-01 16:14:02 -08:00
Jason Ekstrand
fbb9171968 i965: Move image uniform setup to brw_nir_uniforms.cpp
It's the only thing that's using it.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-03-01 16:14:02 -08:00
Jason Ekstrand
820ae39725 i965: Move channel_expressions and vector_splitting to brw_program.h
They're GL-specific.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-03-01 16:14:02 -08:00
Jason Ekstrand
760c8a1d95 i965: Make mark_surface_used a static inline in brw_compiler.h
One of these days, I'd like to see this function go away all together
but for now, let's at least put it near the struct it updates.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-03-01 16:14:02 -08:00
Jason Ekstrand
f33d2b5d05 i965: Move BRW_ATTRIB_WA_* defines to brw_compiler.h
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-03-01 16:14:02 -08:00
Jason Ekstrand
4e274bcf66 i965: Move BRW_MAX_DRAW_BUFFERS to brw_compiler.h
It does sort-of go with MAX_UBO and friends but MAX_DRAW_BUFFERS is an
actual hardware constant based on the number of things we can blend
rather than an arbitrary "number of things allowed in GL" like some of
the other maximums are.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-03-01 16:14:02 -08:00
Jason Ekstrand
2523241660 i965/inst: Stop using fi_type
It's a mesa define that's trivial to inline.  This removes a dependence
on main/imports.h.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-03-01 16:14:02 -08:00
Jason Ekstrand
ffeb738112 i965: Move brw_register_blocks to brw_fs.cpp
Its one and only caller is brw_compile_fs which lives there.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-03-01 16:14:02 -08:00
Jason Ekstrand
5b87c7e0e3 i965: Move SHADER_TIME_STRIDE to brw_compiler.h
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-03-01 16:14:02 -08:00
Jason Ekstrand
f85ef11501 i965: Move SOL binding #defines to brw_compiler.h
While we're at it, we also change the GEN6 binding macro to be a start
index that gets added to the binding.  This makes things a bit more
explicit.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-03-01 16:14:02 -08:00
Jason Ekstrand
81e5bdf072 i964/gs: Move MAX_GS_INPUT_VERTICES to brw_vec4_gs_visitor.h
It's only users are in brw_vec4_gs_visitor and gen6_vec4_gs_visitor.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-03-01 16:14:01 -08:00
Jason Ekstrand
c6a719b64f i965/gs: Add the gl_prim_to_hw_prim table to vec4_gs_visitor.cpp
It's currently in brw_util.c but that's the only bit of brw_util.c
that's shared between the compiler and the rest of the GL driver.
It's just a fairly obvious table so the duplication isn't bad.  It's
certainly less pain than trying to figure out how to share the code.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-03-01 16:14:01 -08:00
Jason Ekstrand
035616cb8e i965: Don't use MAX_SURFACES in mark_surface_used
Vulkan doesn't respect MAX_SURFACES so this assert isn't valid in that
case.  It should, however, assert that it isn't insanely large.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-03-01 16:14:01 -08:00
Jason Ekstrand
0d2c9ce1ce i965: Get rid of BRW_PRIM_OFFSET
This is a relic of when we wired up meta to be able to use RECTLIST
primitives.  It's no longer needed.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-03-01 16:14:01 -08:00
Jason Ekstrand
406321caeb i965/vue_map: Stop using GLbitfield types
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-03-01 16:13:58 -08:00
Jason Ekstrand
45d3dbebb2 i965: Move assign_common_binding_table_offsets to brw_program
This isn't used by Vulkan and is specific to the way the GL driver
works.  There's no reason to have it in common compiler code.  Also, it
relies on BRW_MAX_* defines which are defined in brw_context.h

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-03-01 16:13:55 -08:00
Jason Ekstrand
8123402fd1 i965: Move some gen4 WM defines to brw_compiler.h
These go in wm_prog_key so they're part of the compiler interface.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-03-01 16:13:27 -08:00
Jason Ekstrand
34ede38194 i965: Move brw_disassemble_inst to brw_eu.h
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-03-01 16:13:26 -08:00
Jason Ekstrand
f9c9d551ea i965: Move some helpers from brw_context.h to brw_shader.h
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-03-01 16:13:24 -08:00
Jason Ekstrand
b97782c364 i965: Move a couple of #defines from brw_context to brw_compiler
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-03-01 16:13:09 -08:00
Jason Ekstrand
2c58709023 glsl/int64: Fix a typo in imod64
The zy swizzle gives us one component of quotient and one component of
remainder.  What we wanted was zw for the remainder.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-03-01 15:31:44 -08:00
Jason Ekstrand
e647c4fbd9 util/build-id: Return a pointer rather than copying the data
We're about to use the build-id as the starting point for another SHA1
hash in the Intel Vulkan driver, and returning a pointer is far more
convenient.

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-03-01 15:31:44 -08:00
Jason Ekstrand
e3d33a23e6 anv: Properly handle destroying NULL devices and instances
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "17.0 13.0" <mesa-dev@lists.freedesktop.org>
2017-03-01 15:31:44 -08:00
Robert Bragg
f3ec9d33c6 mesa: Fix performance query id check
The queryid_valid() function asserts that an ID given by an application
isn't zero since the spec explicitly reserves an ID of zero as invalid.

The implementation was written as if the ID was a signed integer and
based on the assumption that queryid_to_index() is simply subtracting
one from the ID. It was broken because in fact the ID was stored in an
unsigned int and testing for an index >= 0 would always succeed.

This adds a spec quote to clarify why zero is considered invalid and
checks for zero before even passing the ID to queryid_to_index() for
then checking the upper bound.

This is a v2 of a patch originally posted by Juha-Pekka (thanks)

Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
2017-03-01 23:01:48 +00:00
Tobias Klausmann
6d600cf632 amd/common: Fix build with new ac_add_function_attr()
Fix usage of ac_add_function_attr() and make it known!

common/ac_nir_to_llvm.c: In function 'create_llvm_function':
common/ac_nir_to_llvm.c:265:4: error: implicit declaration of function
'ac_add_function_attr' [-Werror=implicit-function-declaration]
    ac_add_function_attr(main_function, i + 1, AC_FUNC_ATTR_BYVAL);
    ^~~~~~~~~~~~~~~~~~~~

Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-03-01 23:53:38 +01:00
Daniel Stone
a1727aa75e egl/wayland: Don't use DRM format codes for SHM
The wl_drm interface (akin to X11's DRI2) uses the standard set of DRM
FourCC format codes. wl_shm copies this, except for ARGB8888/XRGB8888,
which use their own definitions.

Make sure we only use wl_shm format codes when we're working with
wl_shm. Otherwise, using swrast with 32bpp formats would fail with an
error.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Daniel Stone <daniels@collabora.com> (v1)
Fixes: cb5e799448 ("egl/wayland: unify dri2_wl_create_surface implementations")

v2: [Emil Velikov: move to dri2_wl_create_window_surface]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com> (IRC)
2017-03-01 18:36:55 +00:00
Kenneth Graunke
c0e9e61c9a mesa: Drop unused STATE_TEXRECT_SCALE program statevars.
The last user is now gone.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
2017-03-01 10:27:38 -08:00
Kenneth Graunke
f356d05393 i965: Drop unused STATE_TEXRECT_SCALE code.
In the past, we used this on Gen4-5 to transform non-normalized texture
coordinates (for sampler2DRect) to normalized ones.  We also used it on
Gen6-7.5 for sampler2DRect with GL_CLAMP.

Jason dropped this code in 6c8ba59cff
in favor of using nir_lower_tex(), which just does a textureSize()
call.  But we were still setting up these state references for
useless uniform data.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
2017-03-01 10:27:36 -08:00
Kenneth Graunke
4061bbccf2 egl: Ensure ResetNotificationStrategy matches for shared contexts.
Fixes:
dEQP-EGL.functional.robustness.negative_context.invalid_robust_shared_context_creation

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
2017-03-01 10:26:42 -08:00
Marek Olšák
940da36a65 gallivm,ac: add function attributes at call sites instead of declarations
They can vary at call sites if the intrinsic is NOT a legacy SI intrinsic.
We need this to force readnone or inaccessiblememonly on some amdgcn
intrinsics.

This is only used with LLVM 4.0 and later. Intrinsics only used with
LLVM <= 3.9 don't need the LEGACY flag.

gallivm and ac code is in the same patch, because splitting would be
more complicated with all the LEGACY uses all over the place.

v2: don't change the prototype of lp_add_function_attr.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com> (v1)
2017-03-01 18:59:36 +01:00
Marek Olšák
408f370710 gallivm,ac: remove unused FUNC_ATTR_LAST enums
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-03-01 18:59:36 +01:00
Nicolai Hähnle
40c77bbf83 st/mesa: inform the driver of framebuffer changes before compute dispatches
Even though compute shaders cannot access the framebuffer, there is a
synchronization issue when a compute dispatch accesses a texture that
was previously bound and drawn to as a framebuffer.

Section 9.3 (Feedback Loops Between Textures and the Framebuffer) of
the OpenGL 4.5 spec rather implicitly clarifies that undefined behavior
results if the texture is still attached to the currently bound
framebuffer. However, the feedback loop is broken when the application
changes the framebuffer binding before a compute dispatch, and the
state tracker needs to let the driver known about this.

Fixes GL45-CTS.compute_shader.pipeline-post-fs on SI family Radeons.

Cc: mesa-stable@lists.freedesktop.org

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-03-01 18:59:36 +01:00
Nicolai Hähnle
911391bd70 st/glsl_to_tgsi: avoid iterating past the head of the instruction list
exec_node::get_prev() does not guard against going past the beginning
of the list, so we need to add explicit checks here.

Found by ASAN in piglit arb_shader_storage_buffer_object-rendering.

Cc: mesa-stable@lists.freedesktop.org

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-03-01 18:59:36 +01:00
Marc Dietrich
64b215223f r600g: fix build without opencl and static llvm libs
radeon_llvm_check and friends were never called in the no-opencl case,
which ended up with an empty llvm module list. As --enable-opencl always
requires --enable-llvm, we can use the latter as the guard.

Signed-off-by: Marc Dietrich <marvin24@gmx.de>
[Emil Velikov: commit message polish]
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-01 13:22:48 +00:00
Samuel Pitoiset
be8aa76afd glsl: remove unecessary flags.q.subroutine_def
This bit is definitely not necessary because subroutine_list
can be used instead. This frees one more bit in the flags.q
struct which is nice because arb_bindless_texture will need
4 bits for the new layout qualifiers.

No piglit regressions found (including compiler tests) with
"-t subroutine".

v2: set the subroutine flag for validating illegal flags

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-01 14:15:31 +01:00
Emil Velikov
ca7d2025a7 vulkan: provide vk.xml as argument to the python generator
Do not hardcode the file in the python script, but pass it via the build
system(s). The latter is the only one that should know about the file
location/tree structure.

Cc: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-02-28 18:53:04 +00:00
Emil Velikov
14281c9035 automake: vulkan: rename/reuse VULKAN_UTIL_{GENERATED_,}FILES list
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-28 14:13:09 +00:00
Mauro Rossi
3f2cb699cf android: vulkan: add support for libmesa_vulkan_util
The following changes are implemented:

Add src/vulkan/Android.mk to build libmesa_vulkan_util
Android.mk: add src/vulkan to SUBDIR to build new module
intel/vulkan: fix libmesa_vulkan_util,vk_enum_to_str.h dependencies
Add -o OUTPUT_PATH option in src/vulkan/util/gen_enum_to_str.py script
Use -o OUTPUT_PATH option in automake generation rules for vk_enum_to_str.{c,h}

Fixes: e9dcb17 "vulkan/util: Add generator for enum_to_str functions"
Fixes: 8e03250 "vulkan: Combine wsi and util makefiles"
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

[Emil Velikov]
 - Move parser within main()
 - Use --outdir instead of -o
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-28 01:24:41 +01:00
Emil Velikov
3bbbb63801 automake: r600: radeonsi: correctly manage libamd_common.la linking
Since both r600 and radeonsi use code from libamd_common they need to
static link it. At the same time, adding a common library to LIB_DEPS is
fragile [can lean to multiple symbol definitions] and non-obvious - I
had to do a double-take how things work atm.

So follow the libradeon.la approach and put common libraries in
TARGET_RADEON_COMMON

Fixes: 936f5407a7 ("gallium/radeon: Add libamd_common.a to TARGET_LIB_DEPS also for r600")
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
2017-02-28 10:55:46 +00:00
Emil Velikov
8af447d6f0 glx/tests: automake: add dispatch-index-check to the tarball
Otherwise we'll fail at `make distcheck'

Fixes: 3cc33e7640 ("glx: add GLXdispatchIndex sort check")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-28 16:18:27 +00:00
Emil Velikov
3935690d58 automake: anv: add missing include $(top_srcdir)/src/vulkan/util
Otherwise we'll fail to find the header and `make distcheck` will bail.

Fixes: e9dcb17962 ("vulkan/util: Add generator for enum_to_str functions")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-28 14:08:17 +00:00
Samuel Iglesias Gonsálvez
0dddad5b1b i965/fs: emit MOV_INDIRECT with the source with the right register type
This was hiding bugs as it retyped the source to destination's type.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-03-01 06:50:35 +01:00
Samuel Iglesias Gonsálvez
d8122128bc i965/fs: fix source type when emitting MOV_INDIRECT to read ICP handles
When generating the MOV INDIRECT instruction, the source type is ignored
and it is set to destination's type. However, this is going to change in a
later patch, so we need to explicitly set the proper source type.

brw_vec8_grf() creates an float type's fs_reg by default, when the
ICP handle is actually unsigned. This patch fixes these cases before
applying the aforementioned patch.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-03-01 06:50:35 +01:00
Samuel Iglesias Gonsálvez
56266df7ed i965/fs: fix indirect load DF uniforms on BSW/BXT
The lowered BSW/BXT indirect move instructions had incorrect
source types, which luckily wasn't causing incorrect assembly to be
generated due to the bug fixed in the next patch, but would have
confused the remaining back-end IR infrastructure due to the mismatch
between the IR source types and the emitted machine code.

v2:
- Improve commit log (Curro)
- Fix read_size (Curro)
- Fix DF uniform array detection in assign_constant_locations() when
  it is acceded with 32-bit MOV_INDIRECTs in BSW/BXT.

v3:
- Move changes in assign_constant_locations() to other patch.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-03-01 06:50:35 +01:00
Samuel Iglesias Gonsálvez
a497ab6838 i965/fs: detect different bit size accesses to uniforms to push them in proper locations
Previously, if we had accesses with different sizes to the same uniform, we might not
push it aligned with the bigger one. This is a problem in BSW/BXT when we access
an array of DF uniform with both direct and indirect addressing because for the latter
we use 32-bit MOV INDIRECT instructions. However this problem can happen with other
generations and bitsizes.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-03-01 06:50:29 +01:00
Samuel Iglesias Gonsálvez
7427425247 i965/fs: mark last DF uniform array element as 64 bit live one
This bug can make that we don't detect the end of a contiguous area
correctly and push larger areas than the real ones.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-03-01 06:50:10 +01:00
Dave Airlie
e66be3d3bb radv: fix txs for sampler buffers
I messed this up when I wrote it, this fixes:
dEQP-VK.memory.pipeline_barrier.*uniform_texel_buffer.*

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-01 08:02:24 +10:00
Marek Olšák
8c838730d0 amd/common: fix ASICREV_IS_POLARIS11_M for Polaris12
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-02-28 21:44:30 +01:00
Bas Nieuwenhuizen
6e9fb1de7f radv: Don't allocate space for unused immutable samplers.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-02-28 20:48:18 +01:00
Bas Nieuwenhuizen
137b06b437 radv/ac: Use constants for immutable samplers.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-02-28 20:48:14 +01:00
Bas Nieuwenhuizen
500e6e40f6 radv: Detect if all immutable samplers for a binding are equal.
We can then use constants for indexed loads.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-02-28 20:48:10 +01:00
Bas Nieuwenhuizen
dd2a0c7aef radv: Store the immutable samplers as uint32_t[4].
So we don't need to know about radv_sampler in ac_nir_to_llvm.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-02-28 20:46:02 +01:00
Brendan King
884f65e185 egl/dri3: implement query surface hook
This is a DRI3 version of a change made for DRI2
(4d6d4f939e, "egl/dri2: implement query surface hook"),
that fixed failures in dEQP-EGL.functional.resize.surface_size.grow
and dEQP-EGL.functional.resize.surface_size.shrink.

Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Mark Janes <mark.a.janes@intel.com>
Cc: Chad Versace <chadversary@chromium.org>
Signed-off-by: Brendan King <Brendan.King@imgtec.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
2017-02-28 10:11:42 +00:00
Michel Dänzer
936f5407a7 gallium/radeon: Add libamd_common.a to TARGET_LIB_DEPS also for r600
Fixes build failure with --enable-opencl --enable-xvmc:

make[4]: Entering directory '/home/daenzer/src/mesa-git/mesa/build-amd64/src/gallium/targets/xvmc'
  CXXLD    libXvMCgallium.la
../../../../src/gallium/drivers/r600/.libs/libr600.a(evergreen_compute.o): In function `evergreen_create_compute_state':
/home/daenzer/src/mesa-git/mesa/build-amd64/src/gallium/drivers/r600/../../../../../src/gallium/drivers/r600/evergreen_compute.c:254: undefined reference to `ac_elf_read'
../../../../src/gallium/drivers/r600/.libs/libr600.a(evergreen_compute.o): In function `r600_shader_binary_read_config':
/home/daenzer/src/mesa-git/mesa/build-amd64/src/gallium/drivers/r600/../../../../../src/gallium/drivers/r600/evergreen_compute.c:189: undefined reference to `ac_shader_binary_config_start'
/home/daenzer/src/mesa-git/mesa/build-amd64/src/gallium/drivers/r600/../../../../../src/gallium/drivers/r600/evergreen_compute.c:189: undefined reference to `ac_shader_binary_config_start'
collect2: error: ld returned 1 exit status
Makefile:760: recipe for target 'libXvMCgallium.la' failed

Fixes: dc4c551a34 ("radeon/ac: switch from radeon_elf_read() to ac_elf_read()")
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-02-28 16:35:21 +09:00
Kenneth Graunke
b8cd78eaa1 i965: Move intel_resolve_map.[ch] from i965_compiler_FILES to i965_FILES
I have no idea why these were part of the compiler files.  They're
miptree related code, and the compiler doesn't appear to use them.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-02-27 22:56:59 -08:00
Timothy Arceri
4d0d81379e gallium/r600: fix r600 build when OpenCL is enabled
Fixes build regression caused by d90bf4ef3e
2017-02-28 15:42:18 +11:00
Timothy Arceri
d90bf4ef3e radeon: remove unused radeon_elf_util.{c,h}
We now use the shared code in AMD common instead.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-02-28 13:20:31 +11:00
Timothy Arceri
503fb134e8 radeon/ac: switch to ac_shader_binary_config_start()
For radeonsi we could probably switch to
ac_shader_binary_read_config(). However the functions have
diverged so just share this helper for now.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-02-28 13:20:31 +11:00
Timothy Arceri
f0aaa4b3a4 radeon/ac: make ac_shader_binary_config_start() available externally
The read config functions are different for r600 and radeonsi so
we can't just share the one in amd common. So just share this
instead.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-02-28 13:20:31 +11:00
Timothy Arceri
dc4c551a34 radeon/ac: switch from radeon_elf_read() to ac_elf_read()
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-02-28 13:20:31 +11:00
Timothy Arceri
69a687189e radeon/ac: switch from radeon_shader_binary to ac_shader_binary
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-02-28 13:20:31 +11:00
Timothy Arceri
affc8314cb radeon/ac: add llvm_ir_string to ac_shader_binary struct
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-02-28 13:20:31 +11:00
Kenneth Graunke
63d1ebca3a ralloc: Delete autofree handling.
There was exactly one user of this, and I just removed it.

It also accessed an implicit global context, with no locking.  This
meant that it was only safe if all callers of ralloc_autofree_context()
held the same lock...which is a pretty terrible thing for a utility
library to impose.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-02-27 15:46:12 -08:00
Kenneth Graunke
aa8bb9fc15 compiler: Free types in _mesa_glsl_release_types() rather than autofree.
Instead of using ralloc_autofree_context() to install an atexit()
handler to ralloc_free(glsl_type::mem_ctx), we can simply free them
from _mesa_glsl_release_types().

This is effectively the same, because _mesa_glsl_release_types() is
called from _mesa_destroy_shader_compiler(), which is called from Mesa's
one_time_fini() function, which Mesa installs as an atexit() handler.

The one advantage here is that it ensures the built-in functions are
destroyed before the types.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-02-27 15:46:12 -08:00
Jan Vesely
010fecb853 clover: Dump linked binary to a different file
this allows to pass the generated files directly to llc or bugpoint

v2: add atomic counter ID
v3: remove extra scope operator, constify

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-02-27 16:11:48 -05:00
Dave Airlie
800b82ea13 radv: fix depth format in blit2d.
For blitting we need to use the depth or stencil format, never
the combined.

This fixes:
dEQP-VK.texture.shadow.2d.nearest.less_or_equal_d32_sfloat_s8_uint
and a few others.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-28 06:11:54 +10:00
Dave Airlie
1121ce4525 radv/formats: add fast clear for 8-bit signed ints.
These formats are used by some CTS tests, may as well fill them in.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-28 06:11:50 +10:00
Samuel Pitoiset
ec623f77eb mesa/main: refactor sampler parameter error codepath
This is similar to what we do in the texture error codepath.
While we are at it, update the specification comment with
latest GL 4.5 spec.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-02-27 19:42:23 +01:00
Samuel Pitoiset
e69fd0b43c glsl: reject samplers not declared as uniform/function params earlier
This improves consistency with image variables and atomic
counters which are already rejected the same way.

Note that opaque variables can't be treated as l-values, which
means only the 'in' function parameter is allowed.

v2: rewrite commit message

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)
2017-02-27 19:42:00 +01:00
Samuel Pitoiset
08a052966f glsl: use is_sampler() anywhere it's possible
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-02-27 19:41:14 +01:00
Samuel Pitoiset
e12f4edf9c glsl: use is_image() anywhere it's possible
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-02-27 19:41:11 +01:00
Samuel Pitoiset
46562a062b glsl: add missing blend_support qualifier in validate_flags()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
2017-02-27 19:40:12 +01:00
Samuel Pitoiset
87ee1729d0 glsl: use an enum for AMD_conservative_depth layout qualifiers
The main idea behind this is to free some bits in the flags.q
struct because currently all 64-bits are used and we can't
add more layout qualifiers without reaching a static assert.

In order to do that (mainly for ARB_bindless_texture), use an
enumeration for the AMD_conservative_depth layout qualifiers
because it's forbidden to declare more than one depth qualifier
for gl_FragDepth.

Note that ast_type_qualifier::merge_qualifier() will prevent
using duplicate layout qualifiers by returning a compile-time
error.

No piglit regressions found (including compiler tests) with
RX480 on RadeonSI.

v2: use a switch case

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Andres Gomez <agomez@igalia.com> (v1)
2017-02-27 19:39:37 +01:00
Samuel Pitoiset
de2727925a glsl: add has_shader_image_load_store()
Preliminary work for ARB_bindless_texture which can interact
with ARB_shader_image_load_store.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-02-27 19:33:10 +01:00
Samuel Pitoiset
ea8086861f drirc: add force_glsl_version=440 for The Culling
This game uses GLSL 430 but the interpolation qualifiers in
some shaders don't match, which ends up in a link error. GLSL
440 spec removed this restriction, force it.

This fixes the following link error, as well as serious
rendering problems.

error: vertex shader output `out_TEXCOORD1' specifies noperspective
interpolation qualifier, but fragment shader input specifies no
interpolation qualifier

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-02-27 19:32:55 +01:00
Jason Ekstrand
76c8327e6e anv: Bump advertised version to 1.0.42
We've been following the spec changes.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-02-27 09:44:46 -08:00
Jason Ekstrand
54dd42eb94 vulkan: Update registry and headers to 1.0.42
This brings in a bunch of new extensions
2017-02-27 09:44:45 -08:00
Elie TOURNIER
082d5b1aee nir: Delete unused arg in get_iteration
nir_const_value is not needed in get_iteration

Signed-off-by: Elie Tournier <tournier.elie@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-02-27 14:35:16 +00:00
Eric Engestrom
077879cf5e docs: fix a few typos
Noticed a couple, found the rest using vimspell.

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-27 14:15:10 +00:00
Grazvydas Ignotas
7f268cf12b gallium/u_queue: set num_threads correctly if not all threads start
If i-th thread could not be created it means we have i threads,
not i+1, because we start from 0.

Fixes: 404d0d5 "gallium/u_queue: add an option to have multiple worker threads"
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-02-27 14:49:46 +01:00
Grazvydas Ignotas
9936121935 gallium/u_queue: fix a crash with atexit handlers
Commit 4aea8fe ("gallium/u_queue: fix random crashes when the app calls
exit()") added a atexit handler which calls
util_queue_killall_and_wait() for each queue to stop the threads.
However the app is also free to use atexit handlers to clean up things,
leading to util_queue_destroy() call which will also call
util_queue_killall_and_wait() for the same queue again, causing threads
being joined twice, and that is undefined. This happens with libglut,
for example. A simple fix is to just set num_threads to 0 as there are
no more valid threads after util_queue_killall_and_wait() returns.

Fixes: 4aea8fe "gallium/u_queue: fix random crashes when the app calls exit()"
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-02-27 14:49:15 +01:00
Bas Nieuwenhuizen
43d833ae97 radv: Use correct size for availability flag.
Per spec, VK_QUERY_RESULT_64_BIT specifies the integer size and the
availability flag is an integer. We apparently handled this correctly
already for the copy to buffer case.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
2017-02-27 01:33:10 +01:00
Bas Nieuwenhuizen
8ea34a98c0 radv: Only use PKT3_OCCLUSION_QUERY when it doesn't hang.
PKT3_OCCLUSION_QUERY hangs when used in a nested IB. This only
calls it when in a primary command buffer and we change
GetQueryPoolResults to not need it. CmdCopyQueryPoolResults
still needs it so we break that behavior for secondary command buffers.
However, that would hang already and using an unitialized value is
better than a hang.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
2017-02-27 01:33:10 +01:00
Bas Nieuwenhuizen
bb878db7eb radv: Reset emitted compute pipeline when calling secondary cmd buffer.
Otherwise if the new compute pipeline is the same as the last used
pipeline before the call, we don't emit it again.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
2017-02-27 01:33:10 +01:00
Dave Airlie
15f47027ad radv: add support for NV_dedicated_allocation
This adds initial support for NV_dedicated_allocation, then
uses it for the wsi image/memory allocation paths internally
in the driver.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-27 00:22:51 +00:00
Andres Rodriguez
35189d3279 radv/winsys: fix freeing imported memory.
This bo->fd wasn't setting some stuff correctly that could
lead to crashes for anything using this path later.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-27 00:22:39 +00:00
Dave Airlie
f695735ed6 vulkan/wsi/radv: add initial prime support (v1.1)
This is a complete rewrite of my previous rfc patches.

This adds the ability to present to a different GPU that rendering
using a driver side operation that can copy from the tiled to
linear shared image.

This does prime support completely in the swapchain present code,
and each queue has a precreated command buffer for each image
and for the each queue family. This means presenting should work
on graphics and compute queues and transfer in the future.

v1.1: initialise needs_linear_copy in swapchain.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-27 05:42:16 +10:00
Bas Nieuwenhuizen
336b05c49a radv/ac: Add integer->integer casts.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-02-26 19:59:27 +01:00
Eric Engestrom
5b5ffb795f check: add support for running test as standalone
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
2017-02-26 13:39:45 +00:00
Eric Engestrom
cd35a119ad check: make any failure fatal
Previously, only the last error code was returned.
Using `set -e` makes the script quit on any unhandled error.

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
2017-02-26 13:39:43 +00:00
Eric Engestrom
a1e5e55989 check: mark two tests are requiring bash
Requirement was removed just before pushing, but it's actually needed
for heredocs (`<<<`).

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
2017-02-26 13:39:12 +00:00
Mike Lothian
47c49f6190 st/nine: Drop USER_INDEX_BUFFERS check
This fixes 4a883966c1 where the
PIPE_CAP was removed.

Now USER_INDEX_BUFFERS are always enabled remove the check and only
check for cmst_active directly.

v2: Axel pointed out the code was still needed when cmst was inactive,
    Rebase on master too
v3: Drop struct member user_ibufs also && fixup shortlog (Edward).
v4: Fix negation
v5: Use the right variable name csmt != cmst

Fixes: 4a883966c1 ("gallium: remove PIPE_CAP_USER_INDEX_BUFFERS")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99953
Reported-and-tested-by: Vinson Lee <vlee@freedesktop.org> (v1)
Cc: Marek Olšák <marek.olsak@amd.com>
Cc: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Mike Lothian <mike@fireburn.co.uk>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-02-25 23:20:18 +11:00
Constantine Charlamov
abb1c645c4 st/nine: make use of common uploaders v4
Make use of common uploaders that landed recently to Mesa

v2: fixed formatting, broken due to thunderbird configuration

v3: per Axel comment: added a comment into NineDevice9_DrawPrimitiveUP

v4: per Axel comment: changed style of the comment
2017-02-25 09:31:10 +01:00
Timothy Arceri
6b4bb24acf compiler: style clean-ups in blob.h
Reviewed-by: Elie Tournier <tournier.elie@gmail.com>
2017-02-25 13:30:28 +11:00
Brian Paul
fcf466383a svga: fix MSVC build error after PIPE_CAP_USER_INDEX_BUFFERS removal
Need to specify the zero for the struct initializer.  My earlier test
of the patch series was with MinGW, not MSVC.

Trivial.
2017-02-24 19:07:10 -07:00
Eric Anholt
292c24ddac vc4: Lazily emit our FS/VS input loads.
This reduces register pressure in both types of shaders, by reordering the
input loads from the var->data.driver_location order to whatever order
they appear first in the NIR shader.  These instructions aren't
reorderable at our QIR scheduling level because the FS takes two in
lockstep to do an interpolation, and the VS takes multiple read
instructions in a row to get a whole vec4-level attribute read.

shader-db impact:
total instructions in shared programs: 76666 -> 76590 (-0.10%)
instructions in affected programs:     42945 -> 42869 (-0.18%)
total max temps in shared programs: 9395 -> 9208 (-1.99%)
max temps in affected programs:     2951 -> 2764 (-6.34%)

Some programs get their max temps hurt, depending on the order that the
load_input intrinsics appear, because we end up being unable to copy
propagate an older VPM read into its only use.
2017-02-24 17:01:29 -08:00
Eric Anholt
f06915d7b7 vc4: Refactor the load_input code out of the intrinsic code.
It's going gain most of ntq_setup_inputs(), so simplify it first.
2017-02-24 16:31:54 -08:00
Eric Anholt
84a304eb96 vc4: Track the last block we emitted at the top level.
This will be used for delaying our VPM reads (which must be unconditional)
until just before they're used.
2017-02-24 16:31:54 -08:00
Eric Anholt
99d4203ad5 vc4: Emit max number of temps in the shader-db output.
We need to be paying attention to optimization's impact on this -- even if
we reduce instruction count, increasing max temps in general is likely to
cause us to fail to register allocate on some shaders, which means that
those won't run at all.
2017-02-24 16:31:54 -08:00
Vinson Lee
30a4b25efe util/disk_cache: Use backward compatible st_mtime.
Fix Mac OS X build error.

  CC       libmesautil_la-disk_cache.lo
In file included from disk_cache.c:46:
./disk_cache.h:57:20: error: no member named 'st_mtim' in 'struct stat'
   *timestamp = st.st_mtim.tv_sec;
                ~~ ^

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99918
Fixes: 207e3a6e4b ("util/radv: move *_get_function_timestamp() to utils")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-24 16:06:40 -08:00
Vinson Lee
c3f9540a0c glsl: Fix missing-braces warning.
CXX    glsl/ast_to_hir.lo
glsl/ast_to_hir.cpp: In member function 'virtual ir_rvalue* ast_declarator_list::hir(exec_list*, _mesa_glsl_parse_state*)':
glsl/ast_to_hir.cpp:4846:42: warning: missing braces around initializer for 'unsigned int [16]' [-Wmissing-braces]

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Andres Gomez <agomez@igalia.com>
2017-02-24 16:04:06 -08:00
Marek Olšák
c7878b0167 ac: silence a warning
trivial
2017-02-25 00:16:38 +01:00
Marek Olšák
35915af6c9 radeonsi: fix broken tessellation on Carrizo and Stoney
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99850

Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2017-02-25 00:03:09 +01:00
Marek Olšák
e027935a79 st/mesa: don't update unrelated states in non-draw calls such as Clear
If a VAO isn't bound and u_vbuf isn't enabled because of the Core profile,
we'll get user vertex buffers in drivers if we update vertex buffers
in glClear. So don't do that.

This fixes a regression since disabling u_vbuf for Core profiles.

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-02-25 00:03:09 +01:00
Marek Olšák
cc2f92b09f st/mesa: set blend state for PBO readbacks
v2: restore the state

Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-02-25 00:03:09 +01:00
Marek Olšák
a40b76143d st/mesa: reset sample_mask, min_sample, and render_condition for PBO ops
Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-02-25 00:03:09 +01:00
Marek Olšák
1a36bea445 st/mesa: don't check st->vp in update_clip
The clip state is updated before VS, so it can be NULL for the first draw
call. Just remove the unnecessary dependency on st->vp.

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-02-25 00:03:09 +01:00
Marek Olšák
d17b8d08a3 trace: remove pipe_resource wrapping
Not needed. ddebug does the same thing. The limitation is that drivers
can only use pipe_resource::screen through pipe_resource_reference.

This unbreaks trace, because pipe_context uploaders aren't wrapped,
so trace doesn't understand buffers returned by them.

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-02-25 00:03:09 +01:00
Marek Olšák
4a883966c1 gallium: remove PIPE_CAP_USER_INDEX_BUFFERS
all drivers support it

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>  (VMware driver only)
2017-02-25 00:03:09 +01:00
Marek Olšák
4700f409fb st/mesa: assume all drivers support user index buffers
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>  (VMware driver only)
2017-02-25 00:03:09 +01:00
Marek Olšák
e78ccee933 svga: implement user index buffers
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>  (VMware driver only)
2017-02-25 00:03:09 +01:00
Marek Olšák
7fff5b77f1 freedreno: add support for user index buffers
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-02-25 00:03:09 +01:00
Marek Olšák
19c51e072b etnaviv: add support for user index buffers
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-02-25 00:03:09 +01:00
Marek Olšák
f139b6fb4f gallium/util: add new helpers for user index buffer uploading
v3: split from the etnaviv patch; fix new_ib.buffer leak

Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>  (VMware driver only)
2017-02-25 00:03:09 +01:00
Elie TOURNIER
b10197e3a4 nir: delete magic number
Signed-off-by: Elie Tournier <tournier.elie@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-02-24 13:02:24 -08:00
Roland Scheidegger
c3a94d9195 gallium/util: (trivial) fix util_clear_render_target
the format of the rt can be different than the one of the texture, so must
propagate the format explicitly to the helper. Broken since
3f9c5d6244 (but unused by st/mesa).
2017-02-24 20:39:56 +01:00
Emil Velikov
9833488974 util: automake: add sha1/README to the tarball
Suggested-by: Andreas Boll <andreas.boll.dev@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-24 17:38:16 +00:00
Emil Velikov
6854716f37 mapi: remove unused mapi.[ch]
The final user of it was st/vega.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
2017-02-24 17:37:02 +00:00
Emil Velikov
93369aa928 blorp: automake: add TODO to the tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
2017-02-24 17:37:00 +00:00
Emil Velikov
ab6fa871ef anv: automake: add TODO to the tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
2017-02-24 17:36:59 +00:00
Emil Velikov
aa63b7fa16 vc4: automake: add the kernel/README to the tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
2017-02-24 17:36:57 +00:00
Emil Velikov
f64a7c74c3 nir: automake: add the README to the tarball
Similar to other accompanying documentation we have in-tree.
For example glsl/README.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
2017-02-24 17:36:45 +00:00
Emil Velikov
e3ad2d40db radv/entrypoints: Only generate entrypoints for supported features
This changes the way radv_entrypoints_gen.py works from generating a
table containing every single entrypoint in the XML to just the ones
that we actually need.  There's no reason for us to burn entrypoint
table space on a bunch of NV extensions we never plan to implement.

RADV implements VK_AMD_draw_indirect_count, so add that to the list.

Port of 114c281e70
"and/entrypoints: Only generate entrypoints for supported features"

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2017-02-24 17:36:25 +00:00
Robert Bragg
d1bb7895b9 main/performance_query: s/GLboolean/bool/
Ideally would have caught these when adding the interface but this just
switches a few return types for the INTEL_performance_query backend
interface to bool instead of GLboolean.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-24 17:16:11 +00:00
Eric Engestrom
1534fc6d10 eglapi: replace linear entrypoint search with binary search
Tested with dEQP-EGL.functional.get_proc_address.*

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-24 17:00:50 +00:00
Eric Engestrom
d25dea0c68 egl: make sure entrypoints list is always sorted
Starting with the next commit, badly sorting this list will break the
eglGetProcAddress().

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-24 17:00:50 +00:00
Eric Engestrom
557f3181bf egl: distribute all tests
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-24 17:00:50 +00:00
Eric Engestrom
f92fd4d7a8 eglapi: move entrypoints list out to its own file
This will allow us to make sure the list is always sorted in the next
commit.

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-24 17:00:50 +00:00
Eric Engestrom
2b3cd82e18 eglapi: sort entrypoints list
Let's make that comment true.
If will also be necessary in a couple commits (using bsearch).

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-24 17:00:50 +00:00
Eric Engestrom
3b69c4a8e8 eglapi: use macro to map entrypoints to functions
As of the last 3 commits, there's a function for each entrypoint.

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-24 17:00:50 +00:00
Eric Engestrom
66d5ec5f3f eglapi: add entrypoint for eglClientWaitSyncKHR
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-24 17:00:50 +00:00
Eric Engestrom
b7f6f3b3e5 eglapi: add entrypoint for eglDestroySyncKHR
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-24 17:00:50 +00:00
Eric Engestrom
df7fa30aec eglapi: add entrypoint for eglDestroyImageKHR
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-24 17:00:50 +00:00
Thomas Hellstrom
7b82efe4ee st/va: Fix up YV12 to NV12 putImage conversion
Use the utility u_copy_nv12_from_yv12 to implement this similarly to
how it's been done in the VPAU state tracker. The old code mixed up
planes and fields and didn't correctly handle video surfaces in
interlaced format.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
2017-02-24 16:44:34 +01:00
Thomas Hellstrom
3a418322ec st/vdpau: Provide YV12 to NV12 putBits conversion v2
mplayer likes putting YV12 data, and if there is a buffer format mismatch,
the vdpau state tracker would try to reallocate the video surface as an
YV12 surface. A virtual driver doesn't like reallocating and doesn't like YV12
surfaces, so if we can't support YV12, try an YV12 to NV12 conversion
instead.

Also advertize that we actually can do the getBits and putBits conversion.

v2: A previous version of this patch prioritized conversion before
reallocating. This has been changed to prioritize reallocating in this version.

Cc: Christian König <christian.koenig@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
2017-02-24 16:44:33 +01:00
Leo Liu
5398d006de configure.ac: check require_basic_egl only if egl enabled
Otherwise the configuration fails when building independant libs
like vdpau, vaapi or omx

Fixes: 1ac40173c2 ("configure.ac: simplify EGL requirements for
drivers dependent on EGL")

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
2017-02-24 09:48:47 -05:00
Eric Engestrom
3cc33e7640 glx: add GLXdispatchIndex sort check
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-24 14:44:58 +00:00
Lars Hamre
caf4252a01 docs: update features.txt for GL_ARB_clear_texture with llvmpipe and softpipe
Signed-off-by: Lars Hamre <chemecse@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-02-24 15:41:26 +01:00
Lars Hamre
a876b50b20 softpipe: enable clear_texture with util_clear_texture
Passes all corresponding piglit tests.

Signed-off-by: Lars Hamre <chemecse@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-02-24 15:41:13 +01:00
Lars Hamre
12f2058b47 llvmpipe: enable clear_texture with util_clear_texture
Passes all corresponding piglit tests.

Signed-off-by: Lars Hamre <chemecse@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-02-24 15:40:57 +01:00
Lars Hamre
3f9c5d6244 gallium: implement util_clear_texture
v3: have util_clear_texture mirror the pipe function (Roland Scheidegger)
v2: rework util clear functions such that they operate on a resource
    instead of a surface (Roland Scheidegger)

Creates a util_clear_texture function for implementing the GL_ARB_clear_texture
in softpipe and llvmpipe.

Signed-off-by: Lars Hamre <chemecse@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-02-24 15:40:11 +01:00
Jerome Duval
62e27170a7 haiku/winsys: fix dt prototype args 2017-02-24 14:10:57 +00:00
Jerome Duval
40b0c8666c haiku: build fixes around debug defines 2017-02-24 14:10:57 +00:00
Dave Airlie
ccb70d6f53 radv: add sample mask output support
This adds support to write to sample mask from the fragment shader.

We can optimise this later like radeonsi.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-24 10:31:53 +10:00
Dave Airlie
8282c5c771 radv/ac: refactor our fmask sample index fixup.
This refactors out the sample index fixup between
txf and image load.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-24 10:31:49 +10:00
Dave Airlie
5e9ead0fa2 radv: fetch sample index via fmask for image coord as well.
This follows the txf_ms code, I can't figure out why amdgpu-pro
doesn't do this in their shaders, they must know someone we don't.

This fixes:
dEQP-VK.pipeline.multisample_shader_builtin.sample_id.*

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-24 10:31:44 +10:00
Dave Airlie
bdcbe7c76b radv: add sample mask input support
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-24 10:31:35 +10:00
Dave Airlie
58c97a0791 radv: enable location at sample when persample is forced.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-24 10:31:30 +10:00
Dave Airlie
fc430c391b radv: fix interpolation at wrong place for offset interp
The code was interpolating at the offset from the sample,
not the offset from the center. Also fix for persample interpolation
modes we should force the pixel center to be at the sample.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-24 10:31:19 +10:00
George Kyriazis
dcac48bfee swr: fix index buffers with non-zero indices
Fix issue with index buffers that do not contain a 0 index.  0 index
can be a non-valid index if the (copied) vertex buffers are a subset of the
user's (which happens because we only copy the range between min & max).
Core will use an index passed in from the driver to replace invalid indices.

Only do this for calls that contain non-zero indices, to minimize performance

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>

cost.
2017-02-23 16:36:18 -06:00
George Kyriazis
669d8f626f swr: add fetch shader cache
For now, the cache key is all of FETCH_COMPILE_STATE.

Use new/delete for swr_vertex_element_state, since we have to call the
constructors/destructors of the struct elements.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-02-23 16:36:13 -06:00
Timothy Arceri
987d8037ca st/mesa: free shader cache buffer on fallback
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
2017-02-24 09:01:59 +11:00
Timothy Arceri
c24d0aaa9a st/mesa: fix crash in shader cache cased by race condition
If a thread doesn't load GLSL IR from cache but does load TGSI
from cache (that was created by another thread) than it will
crash due to expecting gl_program_parameter_list to have been
restored from the GLSL IR cache and not be null.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
2017-02-24 09:01:59 +11:00
Jason Ekstrand
261092f7d4 anv: Enable MSAA compression
This just enables basic MSAA compression (no fast clears) for all
multisampled surfaces.  This improves the framerate of the Sascha
"multisampling" demo by 76% on my Sky Lake laptop.  Running Talos on
medium settings with 8x MSAA, this improves the framerate in the
benchmark by 80%.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-02-23 12:10:42 -08:00
Jason Ekstrand
42b10b175d anv/blorp/clear_subpass: Only set surface clear color for fast clears
Not all clear colors are valid.  In particular, on Broadwell and
earlier, only 0/1 colors are allowed in surface state.  No CTS tests are
affected outright by this because, apparently, the CTS coverage for
different clear colors is pretty terrible.  However, when multisample
compression is enabled, we do hit it with CTS tests and this commit
prevents regressions when enabling MCS on Broadwell and earlier.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-02-23 12:10:42 -08:00
Pohjolainen, Topi
042cc201f2 intel/isl: Apply render target alignment constraints for MCS
v2: Instead of having the same block in isl_gen7,8,9.c add it
    once into isl.c::isl_choose_image_alignment_el() instead.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-02-23 12:10:42 -08:00
Lionel Landwerlin
34e29b2ebd intel/isl: add MCS width constraint 16 samples
v3 (Jason Ekstrand): Add a comment explaining why

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-02-23 12:10:42 -08:00
Jason Ekstrand
3885375195 intel/isl: Return surface creation success from aux helpers
The isl_surf_init call that each of these helpers make can, in theory,
fail.  We should propagate that up to the caller rather than just
silently ignoring it.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-02-23 12:10:42 -08:00
Kenneth Graunke
e6e8475b0f glsl: Raise a link error for non-SSO ES programs with a TES but no TCS.
OpenGL allows the TCS to be missing and supplies an implicit passthrough
shader, but OpenGL ES does not (see section 7.3 of the ES 3.2 spec,
cited above in the code).

One open question is how to handle this for ARB_ES3_2_compatibility.
This patch raises the link error for all ES shading language programs,
but it might make sense to base it on the API.  The approach taken in
this patch is more restrictive, but should still allow any valid ES
programs to work in GL.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Andres Gomez <agomez@igalia.com>
2017-02-23 11:07:06 -08:00
Samuel Iglesias Gonsálvez
a9c488f285 isl/state: fix assert on raw buffer surface state minimum size
From IVB PRM, SURFACE_STATE::Height:

"For typed buffer and structured buffer surfaces, the number of
 entries in the buffer ranges from 1 to 2^27 . For raw buffer
 surfaces, the number of entries in the buffer is the number of bytes
 which can range from 1 to 2^30."

The minimum value is 1, according to the spec. The spec quote
was already added into the code by 028f6d8317.

Fixes crashing tests under:

dEQP-VK.robustness.buffer_access.*

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-02-23 11:46:47 +01:00
Iago Toral Quiroga
42b9057447 glsl: enable early_fragment_tests implicitly with post_depth_coverage
From ARB_post_depth_coverage:

   "This extension allows the fragment shader to control whether values in
    gl_SampleMaskIn[] reflect the coverage after application of the early
    depth and stencil tests.  This feature can be enabled with the following
    layout qualifier in the fragment shader:

       layout(post_depth_coverage) in;

    Use of this feature implicitly enables early fragment tests."

And a bit later it also adds:

   "early_fragment_tests" requests that fragment tests be performed before
    fragment shader execution, as described in section 15.2.4 "Early Fragment
    Tests" of the OpenGL Specification. If neither this nor post_depth_coverage
    are declared, per-fragment tests will be performed after fragment shader
    execution."

Fixes:
GL45-CTS.post_depth_coverage_tests.PostDepthSampleMask

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-02-23 11:21:44 +01:00
Samuel Iglesias Gonsálvez
6ca4347c82 glsl: refactor get_variable_being_redeclared() to return always an ir_variable pointer
It will return the current variable ('var') or the earlier declaration ('earlier') in
case of redeclaration of that variable.

In order to distinguish between both, 'is_redeclaration' boolean will indicate in which
case we are.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-02-23 06:56:45 +01:00
Samuel Iglesias Gonsálvez
a73a618933 glsl: fix heap-use-after-free in ast_declarator_list::hir()
The get_variable_being_redeclared() function can free 'var' because
a re-declaration of an unsized array variable can establish the size, so
we set the array type to the 'earlier' declaration and free 'var' as it is
not needed anymore.

However, the same 'var' is referenced later in ast_declarator_list::hir().

This patch fixes it by picking the ir_variable_mode from the proper
ir_variable.

This error was detected by Address Sanitizer.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Suggested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99677
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2017-02-23 06:56:16 +01:00
Charmaine Lee
043883647a st/wgl: flush with ST_FLUSH_WAIT before releasing shared contexts
Before releasing a shared context, flush the context
with ST_FLUSH_WAIT to make sure all commands are executed.
This ensures that rendering to any shared resources is completed
before they will be referenced by another context.

Fixes an intermittent flickering with Photoshop. (VMware bug# 1779340)

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-02-18 09:36:42 -08:00
Charmaine Lee
d793b54c4e st: add ST_FLUSH_WAIT to st_context_flush()
When st_context_flush() is called with ST_FLUSH_WAIT,
the function will return after the fence is completed.

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-02-18 09:36:42 -08:00
Dave Airlie
b71e6538a8 radv/ac: handle gs->copy shader clip distances.
This fixes up the clip distance passing between the geometry
shader and the copy shader. It packs the clip and cull distances
into one or two consecutive slots, and avoids wasting space and
make sure the gs output and copy shader input agree on where
things are stored.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-23 15:31:41 +10:00
Dave Airlie
bec584ec0e radv/ac: pass clips properly from vertex->geometry shader stages.
This works out the geometry shader clip/cull inputs separately
to the outputs, and uses that information to read from the ES->GS
ring buffer. It stores the clip/cull distances packed into one
or two slots. It fixes the es output emission and gs input
reading to match.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-23 15:31:37 +10:00
Dave Airlie
c2cfb54f13 radv/ac: rename num clips/cull to output clips/culls
As geom shaders can have different ones on entry and exit.

also move to uint8_t as these are never that big.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-23 15:31:10 +10:00
Dave Airlie
c2ed2685fd vulkan/wsi: move image count to shared structure.
For prime support I need to access this, so move it in advance.

[airlied: fix int->uint32_t]

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-23 15:30:32 +10:00
Timothy Arceri
4711e54336 radeon: fix r600 builds when old version of llvm is present
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-02-23 14:05:55 +11:00
Dylan Baker
fb26e6c0d4 vulkan: Fix gen_enum_to_str in out of tree builds
In some configurations the util directory is created when building out
of tree, but not others. This patch ensures that it's created.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-and-Tested-by: Mike Lothian <mike@fireburn.co.uk>
2017-02-22 17:08:52 -08:00
Jason Ekstrand
1bd0e9ca33 anv/Makefile: Gather all the genX files into one place
While we're here, we also fix the alphabetization of the list of
genx_* files.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-22 15:07:18 -08:00
Timothy Arceri
2f3290ac28 r600/radeonsi: enable glsl/tgsi on-disk cache
For gpu generations that use LLVM we create a timestamp string
containing both the LLVM and Mesa build times, otherwise we just
use the Mesa build time.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-23 09:20:22 +11:00
Timothy Arceri
27cecafefd st/mesa: get on-disk shader cache
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-23 09:20:22 +11:00
Timothy Arceri
8239eef2f7 ddebug/rbug/trace: add get_disk_shader_cache() to pass-throughs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-23 09:20:22 +11:00
Timothy Arceri
4be98ed5fd gallium: add get_disk_shader_cache() callback
V2: Provide more detail in callback description and add description to
    screen.rst

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-23 09:20:22 +11:00
Timothy Arceri
9f506d817e st/mesa: implement a tgsi on-disk shader cache
Implements a tgsi cache for the OpenGL state tracker.

V2: add support for compute shaders

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-23 09:20:22 +11:00
Timothy Arceri
b9de1c2e02 st/mesa: add sha1 field to st program structs
This will be used to share the sha1 computed by the tgsi load
function with the tgsi write function.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-23 09:20:22 +11:00
Timothy Arceri
0d5130bdd0 st/mesa: move set_prog_affected_state_flags() to st_program.c
We want to use this in the new tgsi shader cache so we move it here
and make it available externally.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-23 09:20:22 +11:00
Timothy Arceri
d258055c8b util/disk_cache: fix bug with deleting old cache dirs
If there was more than a single directory in the .cache/mesa dir
then it would only remove one (or none) of the directories.

Apparently Valgrind was also reporting:
Conditional jump or move depends on uninitialised value

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-02-23 09:20:22 +11:00
Dylan Baker
8e03250fcf vulkan: Combine wsi and util makefiles
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-02-22 13:12:02 -08:00
Dylan Baker
e9dcb17962 vulkan/util: Add generator for enum_to_str functions
This adds a python generator to produce enum_to_str functions for
Vulkan from the vk.xml API description. It supports extensions as well
as core API features, and the generator works with both python2 and
python3.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-02-22 13:12:02 -08:00
Thomas Hellstrom
bda59f6e41 Revert "st/vdpau: Fix multithreading"
This reverts commit f1e5dfbe3c.

For a detailed discussion see
https://lists.freedesktop.org/archives/mesa-dev/2017-February/145283.html

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
2017-02-22 21:50:15 +01:00
Nayan Deshmukh
b8861911c5 vl: u_upload_alloc might fail to allocate buffer in bicubic filter
Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-02-22 21:49:19 +01:00
Marek Olšák
7ce8adad43 gallium: reorder fields in pipe_draw_info
sizeof(struct pipe_draw_info) = 104 -> 88

Also, vertices_per_patch is switched to ubyte, because it can't be more
than 32.

Seemed-reasonable-to: Roland Scheidegger
2017-02-22 20:36:40 +01:00
Marek Olšák
3b04566bba gallium/hud: handle a thread switch for API-thread-busy monitoring
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-22 20:26:39 +01:00
Marek Olšák
31e7ba7124 gallium/hud: prevent an infinite loop
v2: use UINT64_MAX / 11

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-22 20:26:39 +01:00
Marek Olšák
24847dd1b5 gallium/u_queue: isolate util_queue_fence implementation
it's cleaner this way.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-22 20:26:39 +01:00
Marek Olšák
4aea8fe7e0 gallium/u_queue: fix random crashes when the app calls exit()
This fixes:
    vdpauinfo: ../lib/CodeGen/TargetPassConfig.cpp:579: virtual void
    llvm::TargetPassConfig::addMachinePasses(): Assertion `TPI && IPI &&
    "Pass ID not registered!"' failed.

v2: use list_head, switch the call order in destroy

Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-22 20:26:39 +01:00
Robert Bragg
a96c9564e3 i965: Implement INTEL_performance_query backend
This adds a bare-bones backend for the INTEL_performance_query extension
that exposes pipeline statistics.

Although this could be considered redundant given that the same
statistics are already available via query objects, they are a simple
starting point for this extension and it's expected to be convenient for
tools wanting to have a single go to api to introspect what performance
counters are available, along with names, descriptions and semantic/data
types.

This code is derived from Kenneth Graunke's work, temporarily removed
while the frontend and backend interface were reworked.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-02-22 19:16:21 +00:00
Robert Bragg
0e7464f0a9 mesa: Model INTEL perf query backend after query obj BE
Instead of using the same backend interface as AMD_performance_monitor
this defines a dedicated INTEL_performance_query interface that is
modelled more on the ARB_query_buffer_object interface (considering the
similarity of the extensions) with the addition of vfuncs for
initializing and enumerating query and counter info.

Compared to the previous backend, some notable differences are:

- The backend is free to represent counters using whatever data
  structures are optimal/convenient since queries and counters are
  enumerated via an iterator api instead of declaring them using
  structures directly shared with the frontend.

  This is also done to help us support the full range of data and
  semantic types available with INTEL_performance_query which is awkward
  while using a structure shared with the AMD_performance_monitor
  backend since neither extension's types are a subset of the other.

- The backend must support waiting for a query instead of the frontend
  simply using glFinish().

- Objects go through 'Active' and 'Ready' states consistent with the
  query object backend (hopefully making them more familiar). There is
  no 'Ended' state (which used to show that a query has ended at least
  once for a given object). There is a new 'Used' state, set when a
  query is first begun which implies that we are expecting to get
  results back for the object at some point. There's no equivalent to
  the 'EverBound' state since the spec doesn't require there to be a
  limbo state between generating IDs and associating them with an object
  on query Begin.

The INTEL_performance_query and AMD_performance_monitor extensions are
now completely orthogonal within Mesa main (though a driver could
optionally choose to implement both extensions within a unified backend
if that were convenient for the sake of sharing state/code).

v2: (Samuel Pitoiset)
- init PerfQuery.NumQueries in frontend
- s/return_string/output_clipped_string/
- s/backed/backend/ typo
- remove redundant *bytesWritten = 0
v3:
- Add InitPerfQueryInfo for lazy probing of available queries
v4:
- Clean up some internal usage of GL typedefs (Ken)

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-02-22 14:07:09 +00:00
Robert Bragg
d83a33a9de mesa: Separate INTEL_performance_query frontend
To allow the backend interfaces for AMD_performance_monitor and
INTEL_performance_query to evolve independently based on the more
specific requirements of each extension this starts by separating
the frontends of these extensions.

Even though there wasn't much tying these frontends together, this
separation intentionally copies what few helpers/utilities that were
shared between the two extensions, avoiding any re-factoring specific to
INTEL_performance_query so that the evolution will be easier to follow
later.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-02-22 12:12:27 +00:00
Thomas Hellstrom
ccc8720cf7 gallium/vl: Simplify the matrix filter fragment shader
It looks like it was partly copied from the median filter fragment shader
and unnecessesarily saved a lot of temporary values.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-02-22 10:22:17 +01:00
Thomas Hellstrom
f1e5dfbe3c st/vdpau: Fix multithreading
The vdpau state tracker allows multiple threads access to the same gallium
context simultaneously. We can fix this either by locking the same mutex
each time the context is used or by using a different gallium context for
each mutex domain. Here we do the latter, although I'm not sure that's really
the best option.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Acked-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-02-22 10:20:37 +01:00
Thomas Hellstrom
bcc9fd378d gallium/vl: Parameter substitution in the csc matrix computation
Makes the code significantly more readable.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-02-22 10:20:07 +01:00
Thomas Hellstrom
4c3fe3257d gallium/vl: Simplify usage of full range matrices
When looking at the full range matrices, it becomes obvious that the difference
between the standard matrices and the full range matrices is that the full
range matrices are multiplied by 1.164. Together with offsetting the y value
with -16/255, this will scale and offset RGB with the desired quantities.

However, the standard SMPTE 240M matrix seems to differ a bit since the
U and V coefficients are only multiplied with 1.138 to get the full range
matrix. This would actually alter the color somewhat so I figure that's an
error. The full range matrix is consistent with Nvidia's VDPAU implementation.

We can also incorporate the ybias in the brightness simplifying the
calculation somewhat.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-02-22 10:19:27 +01:00
Thomas Hellstrom
f01e947cdb gallium/vl Fix brightness matrix description
The brightness matrix doesn't actually match the procamp matrix and
what's calculated in vl_csc_get_matrix.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-02-22 10:18:30 +01:00
Thomas Hellstrom
ec8139e50c gallium/vl: Don't map vertex buffers on creation
It will cause multiple simultaneous maps of the same vertex buffer and
flushed-while-mapped warnings.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-02-22 10:17:51 +01:00
Thomas Hellstrom
f2872bf8c3 gallium/vl: Add sampler views to video filter fragment shaders
Needed for at least the svga driver.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-02-22 10:17:07 +01:00
Thomas Hellstrom
53b4584555 gallium/vl: declare sampler views in compositor shaders
The svga driver relies on the existence of these sampler views.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-02-22 10:15:16 +01:00
Brian Paul
b87ef9e606 util: fix MSVC build issue in disk_cache.h
Windows doesn't have dlfcn.h.  Protect the code in question
with #if ENABLE_SHADER_CACHE test.  And fix indentation.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2017-02-21 20:54:46 -07:00
Dave Airlie
40e0dbf96c radv: fix typo in the subpass barrier patch.
Fixes: dbb0eaccc radv: handle subpass cache flushes

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-22 02:22:30 +00:00
Rafael Antognolli
d71e1f32c6 i965/gen6+: Enable arb_transform_feedback_overflow_query.
This extension adds new query types which can be used to detect overflow
of transform feedback buffers. The new query types are also accepted by
conditional rendering commands.

v3:
    - s/gen7+/gen6+/ in the relnotes (Jordan Justen)

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-02-21 16:28:32 -08:00
Rafael Antognolli
924a1b90aa i965: Add support for xfb overflow query on conditional render.
Enable the use of a transform feedback overflow query with
glBeginConditionalRender. The render commands will only execute if the
query is true (i.e. if there was an overflow).

Use ARB_conditional_render_inverted to change this behavior.

v4:
    - reuse MI_MATH calcs from hsw_queryob (Kenneth)
    - fallback to software conditional rendering when MI_MATH is not
      available (Kenneth)

v5:
    - check query->Target (Kenneth)

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-02-21 16:28:32 -08:00
Rafael Antognolli
d03ec496ee i965: Add support for xfb overflow on query buffer objects.
Enable getting the results of a transform feedback overflow query with a
buffer object.

v4:
    - hsw_overflow_result_to_gpr0 a public function, so it can be used
      by conditional render. (Kenneth)
    - fix typo grp0/gpr0 (Kenneth)
    - rename load_gen_written_data_to_regs to
      load_overflow_data_to_cs_gprs (Kenneth)

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-02-21 16:28:32 -08:00
Rafael Antognolli
5933ec86fd i965: add plumbing for ARB_transform_feedback_overflow_query.
When querying for transform feedback overflow on one or all of the
streams, store information about number of generated and written
primitives. Then check whether generated == written.

v2:
    - use only SO_PRIM_STORAGE_NEEDED, do not fallback to
      CL_INVOCATION_COUNT. (Kenneth)

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-02-21 16:28:32 -08:00
Rafael Antognolli
a80ebff1b9 mesa: Track transform feedback overflow query objects.
Also update checks on conditional rendering.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-02-21 16:28:31 -08:00
Rafael Antognolli
273bab26af mesa: Add types for ARB_transform_feedback_oveflow_query.
Add some basic types and storage for the queries of this extension.

v2:
    - update date of extension (Kenneth)

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-02-21 16:28:31 -08:00
Eric Engestrom
89af6bf2cb gallium/docs: use imgmath instead of pngmath
WARNING: sphinx.ext.pngmath has been deprecated. Please use
	sphinx.ext.imgmath instead.

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-02-22 00:01:08 +00:00
Eric Engestrom
d88a0dffe3 gallium/docs: fix section title formatting
src/gallium/docs/source/tgsi.rst:3488: WARNING: Title underline too short.

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-02-22 00:01:01 +00:00
Eric Engestrom
5aa7fa2bbf gallium/docs: add missing newlines
Without these, mathjax considers these as the continuation of the
previous line.

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-02-22 00:00:57 +00:00
Eric Engestrom
3ae77c912e gallium/docs: add missing math formatting
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-02-22 00:00:51 +00:00
Eric Engestrom
3a0d2c54cf gallium/docs: fix sublist formatting
src/gallium/docs/source/context.rst:95: ERROR: Unexpected indentation.

Sub lists need to be surrounded by a blank line.

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-02-22 00:00:38 +00:00
Timothy Arceri
0441e6bc8b util/disk_cache: create timestamp and gpu_id dirs when MESA_GLSL_CACHE_DIR is used
The make check test is also updated to make sure these dirs are created.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-22 08:40:14 +11:00
Timothy Arceri
207e3a6e4b util/radv: move *_get_function_timestamp() to utils
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-22 08:40:00 +11:00
Kenneth Graunke
ed6b47f435 docs: Update features.txt and relnotes for GL_ARB_transform_feedback2 2017-02-21 12:38:13 -08:00
Kenneth Graunke
0a7b252c5b i965: Enable ARB_transform_feedback2 on Sandybridge.
The only feature over and above ES 3.0 is DrawTransformFeedback().

We already have to do the whole SOL_NUM_PRIMS_WRITTEN counter dance in
order to compute the SVBI value for ResumeTransformFeedback(), at which
point our existing GetTransformFeedbackVertexCount() implementation will
do the trick (though with a stall to CPU map the buffer).

Someday, we could probably implement DrawTransformFeedback() more
efficiently, using the "Load Internal Vertex Count" feature of
3DSTATE_SVB_INDEX and the 3DPRIMITIVE indirect vertex count bit.

Rumor has it this allows people to use WebGL 2.0 on Sandybridge.

Note that we don't need pipelined register writes like Gen7+ because
we use the 3DSTATE_SVB_INDEX command rather than MI_LOAD_REGISTER_MEM.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99842
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-02-21 12:38:13 -08:00
Kenneth Graunke
0235757422 i965: Properly reset SVBI counters on ResumeTransformFeedback().
This fixes Piglit's ARB_transform_feedback2/change-objects-while-paused
GLES 3.0 test.  When resuming the transform feedback object, we need to
reset the SVBI counters so we continue writing at the correct point in
the buffer.

Instead of SO_WRITE_OFFSET counters (with a DWord offset), we have the
Streamed Vertex Buffer Index (SVBI) counters, which contain a count of
vertices emitted.

Unfortunately, there's no straightforward way to store the current SVBI
counter values to a buffer.  They're not available in a register.  You
can use a bit in the 3DSTATE_SVB_INDEX packet to copy them to another
internal counter which 3DPRIMITIVE can use...but there's no good way to
extract that either.

So, once again, we use SO_NUM_PRIMS_WRITTEN to calculate the vertex
numbers.  Thankfully, we can reuse most of the existing Gen7+ code.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-02-21 12:38:13 -08:00
Kenneth Graunke
eb0331382a i965: Save max_index in brw_transform_feedback_object.
I'm going to need this in a new Resume hook shortly.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-02-21 12:38:13 -08:00
Kenneth Graunke
8513090cd7 i965: Update brw_save_primitives_written_counters for pre-Gen7.
Sandybridge and earlier only have a single counter.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-02-21 12:38:13 -08:00
Kenneth Graunke
42a4f91820 i965: Use ctx->Const.MaxVertexStreams rather than BRW_XFB_MAX_STREAMS.
This way on Sandybridge we'll only do 1 stream worth of math, since
we only have one SO_NUM_PRIMS_WRITTEN counter.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-02-21 12:38:13 -08:00
Kenneth Graunke
2af5f0caad i965: Move some code from gen7_sol_state.c to gen6_sol.c.
I plan to use these functions on Sandybridge soon.  I changed the prefix
on a couple of functions to "brw" instead of "gen7" as in theory they
should be usable all the way back to G45.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-02-21 12:38:13 -08:00
Kenneth Graunke
bf8dd21191 i965: Drop dead Gen8+ code from Gen7/sometimes-HSW driver hooks.
These driver hooks are not used when MI_MATH and MI_LOAD_REGISTER_REG
are supported, which Gen8+ can always do.  So this code is dead.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-02-21 12:38:13 -08:00
Marek Olšák
96cbc1ca29 vbo: kill primitive restart lowering in glDrawArrays
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-02-21 21:28:02 +01:00
Marek Olšák
63c462226e radeonsi: fix issues with monolithic shaders
R600_DEBUG=mono has had no effect since:

    commit 1fabb29717
    Author: Marek Olšák <marek.olsak@amd.com>
    Date:   Tue Feb 14 22:08:32 2017 +0100

    radeonsi: have separate LS and ES main shader parts in the shader selector

Also, this assertion was failing:
    si_state_shaders.c:1307: si_shader_select_with_key: Assertion
    `!shader->is_optimized' failed.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-21 21:27:23 +01:00
Marek Olšák
52581606c2 radeonsi: set no-signed-zeros-fp-math
Recommended by Matt Arsenault.

46757 shaders in 28742 tests
Totals:
SGPRS: 2068851 -> 2066907 (-0.09 %)
VGPRS: 1604056 -> 1602676 (-0.09 %)
Spilled SGPRs: 1402 -> 1382 (-1.43 %)
Spilled VGPRs: 113 -> 113 (0.00 %)
Private memory VGPRs: 1332 -> 1332 (0.00 %)
Scratch size: 3224 -> 3188 (-1.12 %) dwords per thread
Code Size: 58815520 -> 58716788 (-0.17 %) bytes
LDS: 1162 -> 1162 (0.00 %) blocks
Max Waves: 354616 -> 354905 (0.08 %)
Wait states: 0 -> 0 (0.00 %)

Totals from affected shaders:
SGPRS: 786452 -> 784508 (-0.25 %)
VGPRS: 530000 -> 528620 (-0.26 %)
Spilled SGPRs: 958 -> 938 (-2.09 %)
Spilled VGPRs: 85 -> 85 (0.00 %)
Private memory VGPRs: 636 -> 636 (0.00 %)
Scratch size: 1880 -> 1844 (-1.91 %) dwords per thread
Code Size: 26349936 -> 26251204 (-0.37 %) bytes
LDS: 304 -> 304 (0.00 %) blocks
Max Waves: 108962 -> 109251 (0.27 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-21 21:27:23 +01:00
Marek Olšák
fd3e73f54e gallivm: add no-signed-zeros-fp-math option to lp_create_builder (v2)
v2: define lp_float_mode

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-21 21:27:23 +01:00
Marek Olšák
84e72f2962 radeonsi: skip TESSINNER/OUTER offchip stores if TES doesn't read them
We were unconditionally storing these outputs, sometimes even one component
at a time, but apps never read them in TES.

Move the TESSINNER/OUTER buffer stores into the TCS epilog where we can
easily disable them on demand.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-21 21:27:23 +01:00
Marek Olšák
d633e23192 radeonsi: skip LDS stores in TCS if there are no LDS output reads
This removes a lot of useless LDS stores.

A few games read TESSINNER/OUTER, but not any other outputs. Most games
don't read any outputs.

The only app doing LDS output reads is UE4 Lightsroom Interior.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-21 21:27:23 +01:00
Marek Olšák
58af0a5385 tgsi/scan: add basic info about tessellation OUT and IN uses
not all of them will be used immediately

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-21 21:27:23 +01:00
Jason Ekstrand
f31ed6d0cd anv: Take a device parameter in anv_state_flush
This allows the helper to check for llc instead of having to do it
manually at all the call sites.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-02-21 12:26:35 -08:00
Jason Ekstrand
f408971deb anv: Pull all clflushing into a clflush_range helper
All this cache line address calculation stuff is tricky.  Let's not
duplicate it more places than we have to.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-02-21 12:26:35 -08:00
Jason Ekstrand
16b187c8bb anv: Remove the unused state_pool_emit macro
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-02-21 12:26:35 -08:00
Jason Ekstrand
f9d7d27d6d anv: Rename clflush_range and state_clflush
It's a bit shorter and easier to work with.  Also, we're about to add a
helper called clflush which does the clflush but without any memory
fencing.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-02-21 12:26:35 -08:00
Jason Ekstrand
075ed20614 intel/blorp: Explicitly flush all allocated state
Found by inspection.  However, I expect it fixes real bugs when using
blorp from Vulkan on little-core platforms.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-02-21 12:26:35 -08:00
Jason Ekstrand
b6b03329af anv: Put everything about queries in genX_query.c
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-02-21 12:26:35 -08:00
Jason Ekstrand
965fad0e8b anv/Makefile: alphabetize
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-02-21 12:26:35 -08:00
Jason Ekstrand
40087bcb51 anv/query: Perform CmdResetQueryPool on the GPU
This fixes a some rendering corruption in The Talos Principle

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-02-21 12:26:35 -08:00
Jason Ekstrand
dc9abd0e6b genxml: Make MI_STORE_DATA_IMM more consistent
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-02-21 12:26:35 -08:00
Jason Ekstrand
3788cd3239 anv/query: clflush the bo map on non-LLC platforms
Found by inspection

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-02-21 12:26:35 -08:00
Jason Ekstrand
8582ab2d6e anv: Add an invalidate_range helper
This is similar to clflush_range except that it puts the mfence on the
other side to ensure caches are flushed prior to reading.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-02-21 12:26:35 -08:00
Christian Gmeiner
e8d600710c etnaviv: remove number of pixel pipes validation
This validation was added before the etnaviv drm driver landed in
the linux kernel. Due some pre-merge API changes we had to fix-up
this value but with a mainline kernel this is not a problem anymore.

Lets remove that validation which also gets rid of problem caught
by Coverity, reported to me by imirkin.

Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-02-21 21:14:35 +01:00
Christian Gmeiner
a0b16a0890 etnaviv: move pctx initialisation to avoid a null dereference
In case ctx->stream == NULL the fail label gets executed where
pctx gets dereferenced - too bad pctx is NULL in that case.

Caught by Coverity, reported to me by imirkin.

Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-02-21 21:14:27 +01:00
Christian Gmeiner
f709096d0e etnaviv: add missing fallthrough annotation
Caught by Coverity, reported to me by imirkin.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-02-21 21:14:01 +01:00
Emil Velikov
383e8e2d5d docs/releasing.html: reword "distro breaking changes" hunk
v2: s/rare/rarely/ (Eric)

Suggested-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-02-21 18:39:40 +00:00
Emil Velikov
8b79f0ed08 radv: make radv_resolve_entrypoint static
Used only within the generated source file.

Fixes: 12301c5418 ("radv: drop the RADV_CALL macro.")
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2017-02-21 18:31:16 +00:00
Emil Velikov
320561bd83 radv: remove unused radv_dispatch_table dtable
Fixes: 12301c5418 ("radv: drop the RADV_CALL macro.")
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2017-02-21 18:31:14 +00:00
Emil Velikov
9807e9dea6 anv: remove unused anv_dispatch_table dtable
Fixes: 4c9dec80ed ("anv: Get rid of the ANV_CALL macro")
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2017-02-21 18:31:04 +00:00
Emil Velikov
aa5baf1d50 i915: remove extern "C" guards
None of this code is used in C++ context.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-02-21 18:29:43 +00:00
Emil Velikov
0e74f390d9 i915: remove 'virtual' and extern C workarounds
Analogous to previous commit.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-02-21 18:29:41 +00:00
Emil Velikov
3ea07d2be9 i965: remove 'virtual' and extern C workarounds
The headers are properly annotated thus we don't need these.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-02-21 18:29:38 +00:00
Emil Velikov
8481914681 i965: add extern C notation in headers
Otherwise symbols wont be annotated with C linkage and we'll fail at
link time.

Currently this is worked around by wrapping the header inclusion itself.
The latter in itself fragile and not recommended.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-02-21 18:29:28 +00:00
Emil Velikov
dafc325f42 gallium: do not #include foo.h within extern C {}
Analogous to previous commit.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-02-21 18:29:25 +00:00
Emil Velikov
e4f971c85f nir: do not #include util/debug.h within extern C {}
It's a problem waiting to happen. Individual headers should be annotated
if needed.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-02-21 18:29:17 +00:00
Emil Velikov
7fcbb1a902 glsl: resolve extern C workarounds/hacks
Do not wrap header inclusion in extern C since it can cause issues.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-02-21 18:29:10 +00:00
Emil Velikov
a177a13033 st/mesa: move extern C wrappers where applicable
Namely, after the include directives. The headers are properly annotated
so keeping things as-is is only asking for trouble.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-02-21 18:29:07 +00:00
Emil Velikov
94b88c1c75 mesa/tests: remove unneeded extern C { #include foo } hack
The header itself (enums.h) is already properly annotated.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-02-21 18:29:01 +00:00
Emil Velikov
d5db27706c mesa: remove unneeded extern C {} wrapper
compiler.h defines a few mesa specific macros which are not C specific.
This allows us to avoid buggy extern C { #include $system_header }
constructs.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-02-21 18:28:59 +00:00
Emil Velikov
1451bcb125 mesa: annotate functions for C linkage
i.e. add extern C {} in program/symbol_table.h

It will allow us remove a workaround we have elsewhere in the code.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-02-21 18:28:55 +00:00
Emil Velikov
e776e0385c anv: remove unneeded extern C notation
Analogous to previous commit - never used in any C++ code.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-02-21 18:28:18 +00:00
Emil Velikov
944620bc0e radv: remove unneeded extern C notation
Header is never #include(d) by a C++ source.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-02-21 18:28:15 +00:00
Rhys Kidd
4bf9862747 glsl/tests: Add UINT64 and INT64 types
glsl/tests/uniform_initializer_utils.cpp:83:14: warning: enumeration value ‘GLSL_TYPE_UINT64’ not handled in switch [-Wswitch]
       switch (type->base_type) {
              ^
glsl/tests/uniform_initializer_utils.cpp:83:14: warning: enumeration value ‘GLSL_TYPE_INT64’ not handled in switch [-Wswitch]

Fixes: 8ce53d4a2f ("glsl: Add basic ARB_gpu_shader_int64 types")
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Elie Tournier <tournier.elie@gmail.com>
2017-02-21 18:03:14 +00:00
Eric Engestrom
6181ab9d77 docs: fix gamma correction link
That link has been dead for 15 years...
We could link to Archive.org [1] to get the last time this page existed,
but I feel like Wikipedia is a better choice.

[1] http://web.archive.org/web/20021211151318/http://www.inforamp.net/~poynton/notes/colour_and_gamma/GammaFAQ.html

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-21 14:54:10 +00:00
Eric Engestrom
b347bbb63b docs: add link to gallium doc
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-21 14:43:29 +00:00
Nicolai Hähnle
066a117be7 radeonsi: fix UINT/SINT clamping for 10-bit formats on <= CIK
The same PS epilog workaround as for 8-bit integer formats is required,
since the CB doesn't do clamping.

Fixes GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels*.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-02-21 10:45:13 +01:00
Nicolai Hähnle
6a1d9684f4 radeonsi: handle MultiDrawIndirect in si_get_draw_start_count
Also handle the GL_ARB_indirect_parameters case where the count itself
is in a buffer.

Use transfers rather than mapping the buffers directly. This anticipates
the possibility that the buffers are sparse (once ARB_sparse_buffer is
implemented), in which case they cannot be mapped directly.

Fixes GL45-CTS.gtf43.GL3Tests.multi_draw_indirect.multi_draw_indirect_type
on <= CIK.

v2:
- unmap the indirect buffer correctly
- handle the corner case where we have indirect draws, but all of them
  have count 0.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-02-21 10:45:02 +01:00
Nicolai Hähnle
550125e1e7 winsys/amdgpu: reduce max_alloc_size based on GTT limits
Allocating huge buffers in VRAM is not a problem, but when those buffers
start being migrated, the kernel runs into errors because it cannot split
those buffer up for moving through GTT.

This should fix intermittent failures of
GL45-CTS.texture_buffer.texture_buffer_max_size

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-02-21 10:43:38 +01:00
Bas Nieuwenhuizen
8cff852ae2 radv: Don't flush at the start of a command buffer.
The preamble flushes now and the rest is the responsibility of the app.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-02-21 09:20:03 +01:00
Bas Nieuwenhuizen
5241fb0ffb radv: Flush in the initial preamble CS.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-02-21 09:19:58 +01:00
Bas Nieuwenhuizen
c121739c47 radv: Special case the initial preamble.
For flushing we don't want to flush every third IB.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-02-21 09:19:53 +01:00
Bas Nieuwenhuizen
eac790811b radv: Split emitting the cache flush out.
So that we can use it without a cmd_buffer.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-02-21 09:19:45 +01:00
Bas Nieuwenhuizen
b6e0df2edd radv: Free empty_cs on device destruction.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-02-21 09:18:50 +01:00
Ben Skeggs
8f4483b609 nvc0: use PascalB for most Pascal boards
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2017-02-21 10:01:16 +10:00
Dave Airlie
6dbb0eaccc radv: handle subpass cache flushes
This splits out the cache flush bit setting code
dependent on the src/dest access flags.

It then calls it from the subpass barrier code.

It also marks a TODO to remove the aggressive CS/PS
flushes at some point.

This fixes a bunch of the
dEQP-VK.renderpass.attachment_allocation.input_output.*
tests.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-21 09:48:37 +10:00
Grazvydas Ignotas
66d1cb587a r300g: only allow byteswapped formats on big endian
They cause regressions on little endian.

Fixes: 172bfdaa9e ("r300g: add support for PIPE_FORMAT_x8R8G8B8_*")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98869
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-02-21 00:37:02 +01:00
Timothy Arceri
87687afb94 mesa: remove unused variable warning in release builds
This assert might have made sense before but we no longer use
gl_linked_shader here. Unless the caller has really done something
crazy this assert is fairly useless.

We also do some small tidy ups in this change.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-02-21 08:46:04 +11:00
Emil Velikov
a40ebe73a1 docs/submittingpatches.html: document the Fixes tag
Provide information and an example.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-20 18:21:22 +00:00
Emil Velikov
9e4248b206 docs/submittingpatches.html: remove version tag for nominations
The version tag used to nominate has bitten even experienced mesa
developers. Not to mention that it deviates from the one used in the
kernel leading to further confusion.

Simplify things and omit it all together.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-20 18:21:22 +00:00
Emil Velikov
f9cdfa33c2 docs/submittingpatches.html: add #backports section
Provide information about merge conflicts resolution and sending
backports.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-20 18:21:22 +00:00
Emil Velikov
d7e0ff0e2b docs/submittingpatches.html: rework the #criteria section
Reword the section to focus on what is allowed, using a more brief, yet
descriptive wording.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-20 18:21:22 +00:00
Emil Velikov
af9a4d9005 travis: bring the scons build on par with AppVeyor
Namely, always build with LLVM and run the check target.

Cc: Rhys Kidd <rhyskidd@gmail.com>
Cc: Eric Anholt <eric@anholt.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-20 18:21:22 +00:00
Ben Crocker
3f1b6ef2aa gallivm: Reenable PPC VSX (v3)
Reenable the PPC64LE Vector-Scalar Extension for LLVM versions >= 3.8.1,
now that LLVM bug 26775 and its corollary, 25503, are fixed.

Amendment: remove extraneous spaces in macro def & invocations.

We would prefer a runtime check, e.g. via an LLVMQueryString
(analogous to glGetString, eglQueryString) or LLVMGetVersion API,
but no such API exists at this time.

Signed-off-by: Ben Crocker <bcrocker@redhat.com>
[Emil Velikov: remove LLVM_VERSION macro]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-20 18:21:22 +00:00
Ben Crocker
b934aae364 gallivm: Override getHostCPUName() "generic" w/ "pwr8" (v4)
If llvm::sys::getHostCPUName() returns "generic", override
it with "pwr8" (on PPC64LE).

This is a work-around for a bug in LLVM: a table entry for "POWER8NVL"
is missing, resulting in (big-endian) "generic" being returned on
little-endian Power8NVL systems.  The result is that code that
attempts to load the least significant 32 bits of a 64-bit quantity in
memory loads the wrong half.

This omission should be fixed in the next version of LLVM (4.0),
but this work-around should be left in place in case some
future version of POWER<n> also ends up unrepresented in LLVM's table.

This workaround fixes failures in the Piglit arb_gpu_shader_fp64 conversion
tests on POWER8NVL processors.

(V4: add similar comment in the code.)

Signed-off-by: Ben Crocker <bcrocker@redhat.com>
Cc: 12.0 13.0 17.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-20 18:21:22 +00:00
Ben Crocker
a8e9c630f3 gallivm: Improve debug output (V2)
Improve debug output from gallivm_compile_module and
lp_build_create_jit_compiler_for_module, printing the
-mcpu and -mattr options passed to LLC.

V2: enclose MAttrs debug_printf block and llc -mcpu debug_printf
in "if (gallivm_debug & <flags>)..."

Signed-off-by: Ben Crocker <bcrocker@redhat.com>
Cc: 12.0 13.0 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com> (v2)
[Emil Velikov: rebase]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-20 18:21:22 +00:00
Marek Olšák
e8c2a05662 gallium/u_suballoc: update comments
as requested by Brian. Trivial.
2017-02-20 18:04:27 +01:00
Jonathan Gray
a042465c21 util/build-id: define ElfW and NT_GNU_BUILD_ID if needed
Define ElfW() and NT_GNU_BUILD_ID if needed as these defines are not
present on at least OpenBSD and FreeBSD.  Fixes the build on OpenBSD.

Fixes: d4fa083e11 ("util: Add utility build-id code.")
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-20 16:39:24 +00:00
Mauro Rossi
41b5620492 android: define HAVE_DL_ITERATE_PHDR for build-id code
Required due to d4fa083 "util: Add utility build-id code."
to avoid following build error and warnings:

external/mesa/src/intel/vulkan/anv_device.c:60:32: error: incompatible integer to pointer conversion initializing 'const struct build_id_note *' with an expression of type 'int' [-Werror,-Wint-conversion]
   const struct build_id_note *note = build_id_find_nhdr("libvulkan_intel.so");
                               ^      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
external/mesa/src/intel/vulkan/anv_device.c:64:19: warning: implicit declaration of function 'build_id_length' is invalid in C99 [-Wimplicit-function-declaration]
   unsigned len = build_id_length(note);
                  ^
external/mesa/src/intel/vulkan/anv_device.c:68:4: warning: implicit declaration of function 'build_id_read' is invalid in C99 [-Wimplicit-function-declaration]
   build_id_read(note, uuid, VK_UUID_SIZE);
   ^
3 warnings and 1 error generated.
[ 40% 1438/3588] target  C: libmesa_vulkan_common_32 <= external/mesa/src/intel/vulkan/anv_image.c
ninja: build stopped: subcommand failed.
build/core/ninja.mk:148: recipe for target 'ninja_wrapper' failed
make: *** [ninja_wrapper] Error 1

Fixes: d4fa083e11 ("util: Add utility build-id code.")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-20 16:33:03 +00:00
Mauro Rossi
9e3d66c1e5 android: glsl: build shader cache sources
Fixes the following building errors:

external/mesa/src/compiler/glsl/linker.cpp:4642: error: undefined reference
 to 'shader_cache_read_program_metadata(gl_context*, gl_shader_program*)'
external/mesa/src/mesa/program/ir_to_mesa.cpp:3135: error: undefined reference
 to 'shader_cache_write_program_metadata(gl_context*, gl_shader_program*)'
clang++: error: linker command failed with exit code 1
...
external/mesa/src/mesa/program/ir_to_mesa.cpp:3135: error: undefined reference
 to 'shader_cache_write_program_metadata(gl_context*, gl_shader_program*)'
external/mesa/src/compiler/glsl/linker.cpp:4642: error: undefined reference
 to 'shader_cache_read_program_metadata(gl_context*, gl_shader_program*)'
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
ninja: build stopped: subcommand failed.
build/core/ninja.mk:148: recipe for target 'ninja_wrapper' failed
make: *** [ninja_wrapper] Error 1

Fixes: 9f8dc3bf03 ("utils: build sha1/disk cache only with
Android/Autoconf")
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-20 16:30:37 +00:00
Mauro Rossi
933988901a android: radeonsi: fix sid_table.h generated header include path
generated-sources-dir-for macro replaces intermediates-dir-for
and LOCAL_MODULE_CLASS is defined as required by new macro,
in order to avoid the following building error:

external/mesa/src/gallium/drivers/radeonsi/si_debug.c:29:10: fatal error: 'sid_tables.h' file not found
         ^
1 error generated.

Fixes: 730574c58e ("android: ac/debug: move sid_tables.h generation and
IB decode to amd/common")
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-20 16:23:13 +00:00
Emil Velikov
920b4d537f docs: add news item and link release notes for 13.0.5
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-20 11:56:39 +00:00
Emil Velikov
85acb42522 docs: add sha256 checksums for 13.0.5
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 112e75f51b)
2017-02-20 11:55:10 +00:00
Emil Velikov
2b06e91ded docs: add release notes for 13.0.5
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 71f3ff57fa)
2017-02-20 11:55:09 +00:00
Dave Airlie
0a44a680ff vulkan/wsi/x11: add support to detect if we can support rendering (v3)
This adds support to radv_GetPhysicalDeviceXlibPresentationSupportKHR
and radv_GetPhysicalDeviceXcbPresentationSupportKHR to check if the
local device file descriptor is compatible with the descriptor
retrieved from the X server via DRI3.

This will stop radv binding to an X server until we have prime
support in place. Hopefully apps use this API before trying
to render things.

v2: drop unneeded function, don't leak memory. (jekstrand)
v3: also check in surface_get_support callback.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-20 12:53:52 +10:00
Dave Airlie
1f6376935b Revert "radv: detect command buffers that do no work and drop them (v2)"
This just keeps popping up minor problems and regressions we should
revisit in a more sustainable manner later.

This also reverts:
Revert "radv: query cmds should mark a cmd buffer as having draws."
Revert "radv: also fixup event emission to not get culled."

This reverts commit d1640e7932.
This reverts commit 8b47b97215.
This reverts commit b4b19afebe.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-20 09:00:40 +10:00
Bas Nieuwenhuizen
81b2379664 radv: Handle VK_REMAINING_ARRAY_LAYERS in fast clear eliminate.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-02-19 20:58:06 +01:00
Marek Olšák
c8ef512398 gallium/u_index_modify: don't add PIPE_TRANSFER_UNSYNCHRONIZED unconditionally
It's OK for r300g (because r300g can't write to buffers via the GPU), but
not later hardware. This issue was spotted randomly.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-19 17:16:26 +01:00
Marek Olšák
a264fee624 radeonsi: fix UNSIGNED_BYTE index buffer fallback with non-zero start (v2)
start can only be non-zero with MultiDrawElements, which is unlikely
to occur with UNSIGNED_BYTE indices.

v2: Also fix the util_shorten_ubyte_elts_to_userptr call.
    Tested with the new piglit.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-19 17:16:26 +01:00
Dave Airlie
9aec76aca3 radv: handle layered fast clears.
This iterates the fast clear flush across the layers in the
specified range.

It also moves the compute resolve flush into the function
and builds the range in there.

This fixes:
dEQP-VK.geometry.layered.* regressions since fast clears.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-02-19 20:30:01 +10:00
Dave Airlie
efc89edf5a radv: pass subresourceRange by pointer.
This struct is 5 dwords, we should really just pass a pointer
to it.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-19 20:28:22 +10:00
Dave Airlie
2b3c490e23 radv: fix typo in a2b10g10r10 fast clear calculation.
This fixes:
dEQP-VK.renderpass.formats.a2b10g10r10_unorm_pack32*
regressions.

Fixes:
f22836dbdd radv: Add CPU color packing for VK_FORMAT_A2B10G10R10_UNORM_PACK32.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-02-19 20:27:28 +10:00
Bas Nieuwenhuizen
c7fcaf2314 radv: Invert ring SGPR check.
I assume this wants to check if all pipelines use the same SGPR for
the rings.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-02-19 10:13:11 +01:00
Bas Nieuwenhuizen
e12cf3f9bf radv: Clamp framebuffer dimensions to min. attachment dimensions.
Even though the preferred stance is not to fix incorrect applications
via the driver, this prevents some nasty GPU hangs.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-02-19 10:13:01 +01:00
Marek Olšák
ad019bf5c6 gallium: remove TGSI_OPCODE_CLAMP
Not used and not widely supported. Use MIN+MAX instead.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 02:58:43 +01:00
Marek Olšák
675ef9c0c7 ac/llvm: use min+max instead of AMDGPU.clamp on LLVM 5.0
It selects v_med3_f32, which has the same rate & size.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 02:58:43 +01:00
Marek Olšák
660b55e6d9 radeonsi: stop using TGSI_OPCODE_CLAMP by moving it amd/common
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 02:58:43 +01:00
Marek Olšák
73d1c8c686 tgsi/lowering: stop using TGSI_OPCODE_CLAMP
v2: do it correctly

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 02:58:43 +01:00
Marek Olšák
1d1b769561 st/mesa: stop using TGSI_OPCODE_CLAMP
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 02:58:43 +01:00
Marek Olšák
45240ce598 radeonsi: use R600_RESOURCE_FLAG_UNMAPPABLE where it's desirable
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 01:22:08 +01:00
Marek Olšák
a41587433c gallium/radeon: add R600_RESOURCE_FLAG_UNMAPPABLE
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 01:22:08 +01:00
Marek Olšák
9434421213 gallium/radeon: change r600_aligned_buffer_create to take flags, not bind
All call sites set bind = 0. The next commit will use this.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 01:22:08 +01:00
Marek Olšák
ac6007460a radeonsi: upload constants into VRAM instead of GTT
This lowers lgkm wait cycles by 30% on VI and normal conditions.
The might be a measurable improvement when CE is disabled (radeon)
or under L2 thrashing.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 01:22:08 +01:00
Marek Olšák
a550fbb510 gallium/radeon: use TCC line size as alignment in other places
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 01:22:08 +01:00
Marek Olšák
791e8ce04a radeonsi: use a clever alignment for index buffer uploads
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 01:22:08 +01:00
Marek Olšák
d6c8c26851 radeonsi: use a clever alignment for descriptor uploads
Non-VBO descriptors won't be smaller than the cache line, so simply use
the cache line size.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 01:22:08 +01:00
Marek Olšák
6b73aafceb radeonsi: use a clever alignment for constant buffer uploads
This results in a very tiny decrease in lgkm wait cycles.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 01:22:08 +01:00
Marek Olšák
620aded541 radeonsi: move index buffer flushing into a non-upload indexed case
The other codepaths don't need this.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 01:22:08 +01:00
Marek Olšák
22b8a773e1 radeonsi: use SI_MAX_ATTRIBS where it should be used
for consistency; no change in behavior

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 01:22:08 +01:00
Marek Olšák
054f853035 radeonsi: sort members of si_shader_key::part
and improve some comments

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 01:22:08 +01:00
Marek Olšák
1fabb29717 radeonsi: have separate LS and ES main shader parts in the shader selector
This might reduce the on-demand compilation if the initial VS/LS/ES
determination is wrong.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 01:22:08 +01:00
Marek Olšák
a02117ba6e radeonsi: don't compile pure monolithic shaders asynchronously
there is no point, we have to wait anyway.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 01:22:08 +01:00
Marek Olšák
9b91e0b54c radeonsi: allow unaligned vertex buffer offsets and strides on CIK-VI
So that we can disable u_vbuf for GL core profiles.

This is a v2 of the previous VI-only patch.
It requires SH_MEM_CONFIG.ALIGNMENT_MODE = UNALIGNED on CIK-VI.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 01:22:08 +01:00
Marek Olšák
2fb021b620 radeonsi: remove the fix_size3 workaround
not needed with the shader fallback

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 01:22:08 +01:00
Marek Olšák
dbd38f2a92 radeonsi: add a workaround for clamping unaligned RGB 8 & 16-bit vertex loads
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 01:22:08 +01:00
Marek Olšák
41a2157a68 radeonsi: make fix_fetch an array of uint8_t
so that we can add 3-component fallbacks.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 01:22:08 +01:00
Marek Olšák
f246ae1ee9 vl: fix a buffer leak in the bicubic filter by using an uploader
there's no error checking, because the previous code didn't do it either.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 01:22:08 +01:00
Marek Olšák
c8d84801b7 gallium/hud: create files after graphs are created to get final names
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
2017-02-18 01:22:08 +01:00
Marek Olšák
22c34bbc55 gallium/u_suballoc: allow setting pipe_resource::flags
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 01:22:08 +01:00
Marek Olšák
edf6bcf6c6 gallium/u_suballoc: use clear_buffer if available
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 01:22:08 +01:00
Marek Olšák
02cd8b20d1 gallium/util: correctly unref a buffer in u_prim_restart
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 01:22:08 +01:00
Marek Olšák
42297c862f gallium/util: remove unused u_index_modify helpers
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 01:22:08 +01:00
Marek Olšák
7f0bf00dc9 gallium/util: remove unused helper util_draw_texquad
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-18 01:22:08 +01:00
Marek Olšák
b5b0936677 gallium/docs: remove documentation of non-existent instructions
trivial
2017-02-18 01:22:08 +01:00
Jason Ekstrand
5f02c2a054 anv/TODO: Check off Storage Image Without Format
The code for this landed a few days ago.
2017-02-17 14:18:34 -08:00
Marek Olšák
edd23e0606 ac/llvm: fix various findMSB bugs
sffbh needs to be suffixed with ".i32"

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-02-18 06:24:32 +10:00
Jose Maria Casanova Crespo
429f112a11 glsl: link error if unsized array not-last in ssbo
If an unsized declared array is not the last in an SSBO
and an implicit size can not be defined on linking time,
the linker should raise an error instead of reaching
an assertion on GL.

This reverts part of commit 3da08e1664
getting back to the behavior of commit 5b2675093e

The original patch was correct for GLES that should produce
a compile-time error but the linker error is still necessary
in desktop GL.

Fixes the following piglit tests:
tests/spec/arb_shader_storage_buffer_object/non_integral_size_array_member.shader_test
tests/spec/arb_shader_storage_buffer_object/unsized_array_member.shader_test

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
2017-02-17 15:49:16 +02:00
Lionel Landwerlin
a0ac118398 i965/fs: fix uninitialized memory access
Found while running shader-db under valgrind.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-02-17 10:07:56 +00:00
Timothy Arceri
62c90492ef glsl: disable on disk shader cache when running as another user
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-17 20:21:22 +11:00
Alejandro Piñeiro
966ddd5d3d mesa/formatquery: use consistent local function names
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-02-17 10:17:54 +01:00
Bas Nieuwenhuizen
d5bf4c7394 radv: Use different allocator for descriptor set vram.
This one only keeps allocated memory in the list, and list nodes
in the descriptor sets. Thsi doesn't need messing around with
max_sets, and we get automatic merging of free regions.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-02-17 09:28:23 +01:00
Bas Nieuwenhuizen
f448701622 radv: Never try to create more than max_sets descriptor sets.
We only use the freed ones after all free space has been used. If
the app only allocates small descriptor sets, we might go over
max_sets before the memory is full.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
CC: <mesa-stable@lists.freedesktop.org>
Fixes: f4e499ec79
2017-02-17 09:28:14 +01:00
Samuel Iglesias Gonsálvez
fccbad73ef i965/fs: fix 32-bit data type to int64 conversion on BSW/BXT
The 32-bit to 64-bit conversions need to have the 32-bit
data source elements aligned to 64-bit but only with doubles as
destination type.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99660
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-02-17 06:50:22 +01:00
Timothy Arceri
172c48cc15 glsl: fix scons builds with shader cache
For now its disabled for scons so wrap glsl cache calls in a
define conditional.
2017-02-17 16:31:47 +11:00
Timothy Arceri
a2bf0954fb util/disk_cache: fix typo in function stub 2017-02-17 15:54:00 +11:00
Jason Ekstrand
b073811617 i965/fs: Remove hand-coded 64-bit packing optimizations
The optimization in unpack_64 is clearly subsumed with the opt_algebraic
optimizations in the previous commit.  The pack optimization may not be
quite handled by opt_algebraic but opt_algebraic should get the really
bad cases.  Also, it's been broken since it was merged and we've never
noticed so it must not be doing anything.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-02-16 17:28:03 -08:00
Jason Ekstrand
70e86a3f2d nir/algebraic: Optimize 64bit pack/unpack
This reduces the instruction count in some fp64 and int64 piglit tests

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-02-16 17:28:03 -08:00
Jason Ekstrand
e10f522cd7 nir: Rename lower_double_pack to lower_64bit_pack
There's nothing "double" about it other than, perhaps, the fact that it
packs two 32-bit values.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-02-16 17:28:03 -08:00
Jason Ekstrand
161d3e81be nir: Combine the int and double [un]pack opcodes
NIR is a typeless IR and the two opcodes, when considered bitwise, do
exactly the same thing.  There's no reason to have two versions.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-02-16 17:28:03 -08:00
Jason Ekstrand
a4393bd97f i965/fs: Fix the inline nir_op_pack_double optimization
We can only do the optimization if the source *is* SSA.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-02-16 17:28:03 -08:00
George Kyriazis
e2abe80bee swr: remove unneeded extern "C"
the guards have been added to the header files that needed them.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-02-16 18:22:27 -06:00
George Kyriazis
d4b4a511f6 gallium: add extern "C" guards
Added extern "C" __cplusplus guards on headers that did not have them.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-02-16 18:22:27 -06:00
Timothy Arceri
a3ab09f90f util/disk_cache: check cache exists before calling munmap()
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-17 11:18:43 +11:00
Timothy Arceri
512c046edd util/disk_cache: add support for removing old versions of the cache
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-17 11:18:43 +11:00
Timothy Arceri
3342ce452c util/disk_cache: allow drivers to pass a directory structure
In order to avoid costly fallback recompiles when cache items are
created with an old version of Mesa or for a different gpu on the
same system we want to create directories that look like this:

./{TIMESTAMP}_{LLVM_TIMESTAMP}/{GPU_ID}

Note: The disk cache util will take a single timestamp string, it is
up to the backend to concatenate the llvm string with the mesa string
if applicable.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-17 11:18:43 +11:00
Timothy Arceri
87009681a5 mesa: remove cache creation from _mesa_initialize_context()
We will change the way we create the cache directory in the following
patches.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-17 11:18:43 +11:00
Timothy Arceri
6602d0401c st/mesa/glsl: build string of dri options and use as input to building sha for shaders
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-17 11:18:43 +11:00
Timothy Arceri
ed61530121 glsl: reserve parameter storage on cache restore
Since we know how big the list will be we can allocate the storage
upfront.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-17 11:18:43 +11:00
Timothy Arceri
1183eb487f glsl: don't try to load/store buffer object values in the cache
Also add an assert to catch buffer overflows.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-17 11:18:43 +11:00
Timothy Arceri
cad1a9bfde glsl: don't reprocess or clear UBOs on cache fallback
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-17 11:18:43 +11:00
Timothy Arceri
01d1e5a7ad glsl: skip more uniform initialisation when doing fallback linking
We already pull these values from the metadata cache so no need to
recreate them.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-17 11:18:43 +11:00
Timothy Arceri
794f7326bc glsl: don't lose uniform values when falling back to full compile
Here we skip the recreation of uniform storage if we are relinking
after a cache miss. This is improtant because uniform values may
have already been set by the application and we don't want to reset
them.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-17 11:18:43 +11:00
Timothy Arceri
0e9991f957 glsl: don't reference shader prog data during cache fallback
We already have a reference.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-17 11:18:43 +11:00
Timothy Arceri
2f19accc5e mesa/glsl: add cache_fallback flag to gl_shader_program_data
This will allow us to skip certain things when falling back to
a full recompile on a cache miss such as avoiding reinitialising
uniforms.

In this change we use it to avoid reading the program metadata
from the cache and skipping linking during a fallback.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-17 11:18:43 +11:00
Timothy Arceri
e3adde023b glsl: add api and glsl version to hash generation for shaders
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-17 11:18:43 +11:00
Timothy Arceri
dc0c0c176d glsl: cache uniform values
These may be lowered constant arrays or uniform values that we set before linking
so we need to cache the actual uniform values.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-17 11:18:43 +11:00
Timothy Arceri
49f3439089 glsl: make uniform values helper available for use elsewhere
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-17 11:18:43 +11:00
Timothy Arceri
bb16cf805d glsl: cache some more image metadata
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-17 11:18:43 +11:00
Timothy Arceri
a3ff840d05 glsl: add support for caching atomic buffers
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-17 11:18:42 +11:00
Timothy Arceri
3d15d814c0 glsl: add shader cache support for buffer blocks
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-17 11:18:42 +11:00
Timothy Arceri
6761259958 glsl: store subroutine remap table in shader cache
V2: use new helpers to store/restore table entries.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-17 11:18:42 +11:00
Timothy Arceri
787535fb11 glsl: add support for caching subroutines
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-17 11:18:42 +11:00
Timothy Arceri
0057de58f9 glsl: add support for caching shaders with xfb qualifiers
For now this disables the shader cache when transform feedback is
enabled via the GL API as we don't currently allow for it when
generating the sha for the shader.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-17 11:18:42 +11:00
Timothy Arceri
3bbfee3cd3 glsl: add shader cache support for samplers
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-17 11:18:42 +11:00
Timothy Arceri
c4cff5f402 glsl: add basic support for resource list to shader cache
This initially adds support for simple uniforms and varyings.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-17 11:18:42 +11:00
Timothy Arceri
3c45d8f464 glsl: fix uniform remap table cache when explicit locations used
V2: don't store pointers use an enum instead to flag what should be
restored. Also do the work in a helper that we will later use for
the subroutine remap table.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-17 11:18:42 +11:00
Carl Worth
a01973a784 glsl: Serialize three additional hash tables with program metadata
The three additional tables are AttributeBindings, FragDataBindings,
and FragDataIndexBindings.

The first table (AttributeBindings) was identified as missing by
trying to test the shader cache with a program that called
glGetAttribLocation.

Many thanks to Tapani Pälli <tapani.palli@intel.com>, as it was review
of related work that he had done previously that pointed me to the
necessity to also save and restore FragDataBindings and
FragDataIndexBindings.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-17 11:18:42 +11:00
Timothy Arceri
e5bb4a0b0f glsl: use correct shader source in case of cache fallback
The scenario is:

glShaderSource
glCompileShader <-- deferred due to cache hit of shader

glShaderSource <-- with new source code

glAttachShader
glLinkProgram <-- no cache hit for program

At this point we need to compile the original source when we
fallback.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-17 11:18:42 +11:00
Timothy Arceri
8771940682 glsl: make use of on disk shader cache
The hash key for glsl metadata is a hash of the hashes of each GLSL
source string.

This commit uses the put_key/get_key support in the cache put the SHA-1
hash of the source string for each successfully compiled shader into the
cache. This allows for early, optimistic returns from glCompileShader
(if the identical source string had been successfully compiled in the past),
in the hope that the final, linked shader will be found in the cache.

This is based on the intial patch by Carl.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-17 11:18:42 +11:00
Timothy Arceri
34ca0fce22 glsl: add initial implementation of shader cache
This uses disk_cache.c to write out a serialization of various
state that's required in order to successfully load and use a
binary written out by a drivers backend, this state is referred to as
"metadata" throughout the implementation.

This initial version is intended to work with all stages beside
compute.

This patch is based on the initial work done by Carl.

V2: extend the file's doxygen comment to cover some of the
design decisions.

V3:
- skip cache for fixed function shaders
- add int64 support
- fix glsl IR program parameter caching/restore and cache the
  parameter values which are used by gallium backends.
- use new link status enum

V4:
- add compute program support

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-17 11:18:42 +11:00
Dave Airlie
b0232d98e9 radeonsi: use shared emit_umsb helper.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-16 22:57:16 +00:00
Dave Airlie
ebed22ec67 radv/ac: use shared umsb helper.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-16 22:57:16 +00:00
Dave Airlie
0ec66b9969 radeon/ac: add emit umsb shared code.
Since we shared imsb, makes sense to share umsb.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-16 22:57:16 +00:00
Dave Airlie
4617ad07e0 radeon/ac: use llvm.amdgcn.sffbh intrinsic instead of AMDGPU.flbit.i32
Use the newer intrinsic.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-16 22:57:16 +00:00
Dave Airlie
e933331cd7 radeonsi: use shared emit imsb code.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-16 22:57:15 +00:00
Dave Airlie
fb15a1e9dd radv/ac: use shader imsb emission code.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-16 22:57:15 +00:00
Dave Airlie
cae1ff1a4b radeon/ac: add ac_emit_imsb helper.
We want to use a different intrinsic on newer llvm, so move this
code to a shared area.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-16 22:57:15 +00:00
Emil Velikov
40bf7ba023 egl: _eglFilterArray's filter is always non-null
Drop the extra handling and assert() if things change in the future.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-02-16 15:27:20 +00:00
Emil Velikov
b8ae2fe3e6 docs: add hyperlink to the releasing documentation
Other files such as xlibdriver.html and versions.html explicitly left
out, for now.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-02-16 15:25:02 +00:00
Emil Velikov
cadf174866 util/disk_cache: do not allow space in MESA_GLSL_CACHE_MAX_SIZE
No other env var used in mesa allows for space in the variable contents.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-02-16 15:22:17 +00:00
Emil Velikov
350e8e821f configure.ac: remove unneeded trailing semicolon
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-02-16 15:17:52 +00:00
Emil Velikov
78c747e820 r100: use correct libdrm_radeon macro
Remove local definition of RADEON_INFO_TILE_CONFIG and use the correct
macro provided by libdrm_radeon RADEON_INFO_TILING_CONFIG.

Latter was present as of libdrm 2.4.22, sirca 2010.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-16 15:17:52 +00:00
Emil Velikov
c8f1f2dc2d winsys/radeon: remove fall-back defines
Provided by libdrm as of last commit.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-16 15:17:52 +00:00
Emil Velikov
f3637b3a1e configure.ac: bump LIBDRM_RADEON requirement to 2.4.71
Such that we can remove all the local fall-back definitions and use the
official UABI ones.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-16 15:17:52 +00:00
Emil Velikov
389478c4e9 bin/get-fixes-pick-list.sh: add new script
The script parses the "Fixes" tags and nominates respective commit if
applicable.

Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-02-16 15:17:52 +00:00
Emil Velikov
f1b0b75099 bin/get-pick-list.sh: remove ancient way of nominating patches
The old way of nominating patches [NOTE: .*[Cc]andidate] was
deprecated and has been unused for approx. 3 years.

Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-02-16 15:17:52 +00:00
Emil Velikov
d6b1d11d4f bin/get-pick-list.sh: limit `git grep ...' only as needed
Analogous to previous commit.

Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-02-16 15:17:52 +00:00
Emil Velikov
d292f12d94 bin/get-typod-pick-list.sh: limit `git grep ...' to only as needed
The currently used range HEAD..origin/master is far too broad. It looks
for nominations within the already_landed list (branchpoint..HEAD).

Similarly we look for already_landed whiting the [possible] nominations
Rand branchpoint..origin/master.

Improve things by limiting the look ups to the branch point.

Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-02-16 15:17:52 +00:00
Emil Velikov
71e00d62ed bin/get-extra-pick-list: rework to use already_picked list
Currently we loop (git log --grep) to check if the fix has landed. We
can simplify and make things faster by storing the already_picked list
and grep ping through it.

Slim down the message while we're here.

Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-02-16 15:17:52 +00:00
Emil Velikov
cb1947eac7 bin/get-extra-pick-list: use git merge-base to get the branchpoint
Since mesa development history is linear and the only diversion is at
the branchpoint. Thus we can drop the ad-hoc parsing and use git
merge-base to retrieve it.

Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-02-16 15:17:51 +00:00
Emil Velikov
1c0a536a72 docs: provide some tips where to obtain Mesa binaries
Mention the generic channels (PPA, Corp, other) as well as give a couple
of examples. Even if the latter became out of date the former should a
be good guide.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-02-16 15:17:51 +00:00
Emil Velikov
99266ec3ce docs/submittingpatches: assorted grammar fixes
Cc: Ben Crocker <bcrocker@redhat.com>
Suggested-by: Ben Crocker <bcrocker@redhat.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-16 15:17:51 +00:00
Emil Velikov
e280a6bc8a docs/releasing: update the website section
Things are automated via git hooks.

Cc: Brian Paul <brianp@vmware.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
---
Guys, let me know when things are in place.
2017-02-16 15:17:51 +00:00
Emil Velikov
652e367d5f docs/releasing: tweak the glxinfo/glxgear/etc. command lines
Print only the information needed. Namely:
*info: the DRI module picked and the vendor/renderer strings
*gears: everything but the "...configuration file..." line(s)

v2: (Eric) Use "2>&1 |" over "|&", properly escape &.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-16 15:17:51 +00:00
Emil Velikov
f9b18d5acc docs/releasing: build test the scons/mingw build
We had multiple cases in the past where files used only by the
Scons/MinGW/Windows build were missing.

Avoid such instances and add a step to catch them early.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-02-16 15:17:51 +00:00
Dave Airlie
03f4982c68 nir: handle some 64-bit integer conversions
These are enough for the spir-v generator to handle UConvert
and SConvert operations, and fix the 4 tests in CTS.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-16 14:13:21 +10:00
Dave Airlie
adb9555794 nir: handle 64-bit integer types in glsl->nir type conversion.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-16 14:13:14 +10:00
Dave Airlie
14167080e2 spirv: handle SpvOpUConvert in proper place.
This was falling into the quantizetof16 path.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-16 14:11:59 +10:00
Dave Airlie
2d0b145902 spirv: add support for Int64 capability
This just adds the support at the spirv->nir level for the Int64
cap.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-16 14:11:13 +10:00
Dave Airlie
48ebdbecc5 spirv/nir: add support for int64
This adds the spirv->nir conversion for int64 types.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-16 14:11:05 +10:00
Dave Airlie
7593f2ac1b nir/types: add C accessors for 64-bit integer types.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-16 14:10:45 +10:00
Dave Airlie
b292e662fc radv: add fast color clear for b10g11r11
This is used in DOOM, so provide the fast clear path for it.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-16 14:09:15 +10:00
Timothy Arceri
e6506b3cd2 mesa: retain gl_shader_programs after glDeleteProgram if they are in use
Fixes regressions from c505d6d852.

Switching from using gl_shader_program to gl_program for the pipline
objects CurrentProgram array meant we were freeing gl_shader_programs
immediately after glDeleteProgram was called, but the spec states
the program should only get deleted once it is no longer in use.

To work around this we add a new ReferencedPrograms array to track
gl_shader_programs in use.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-02-16 15:01:41 +11:00
Timothy Arceri
300900516d mesa: remove tabs in dri xmlconfig.c
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-16 14:47:13 +11:00
Timothy Arceri
703b592f7a mesa: style fixes for dri xmlconfig.c
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-16 14:47:13 +11:00
Chris Wilson
ed442ee39b i965: Do not use purged bo after calling glObjectUnpurgeable
If the buffer has been freed by the kernel under memory pressure, it is
invalid to try and access the backing storage for that buffer in the
future - the backing storage is not recreated automatically. As such we
need to mark the GL object as being freed for unretained buffers and so
recreate the object on next use.

Futhermore from the GL_APPLE_object_purgeable:

    "In contrast, by calling ObjectUnpurgeableAPPLE with an <option> of
    UNDEFINED_APPLE, the application is indicating that it intends to
    recreate the contents of the storage from scratch.  Further, the
    application is is stating that it would like the GL to do only the
    minimal amount of work set PURGEABLE_APPLE to FALSE.   If
    ObjectUnpurgeableAPPLE is called with the <option> set to
    UNDEFINED_APPLE, then ObjectUnpurgeableAPPLE will return the value
    UNDEFINED_APPLE."

we must always report GL_UNDEFINED_APPLE when called with
glObjectUnpurgeable(GL_UNDEFINED_APPLE).

Testcase: piglit/object_purgeable-api-*
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-02-15 17:00:42 -08:00
Matt Turner
a1891da7c8 Revert "i915: Always enable GL 2.0 support."
This partially reverts commit 97217a40f9.
It leaves ES 2.0 support in place per Ian's suggestion, because ES 2.0
is designed to work on hardware like i915.

Chrome only uses the GPU if you have GL >= 2.0, and using i915 (and
prog_execute) actually hurt performance compared with the software
paths.
2017-02-15 14:52:27 -08:00
Matt Turner
656e30b686 anv: Use build-id for pipeline cache UUID.
The --build-id=... ld flag has been present since binutils-2.18,
released 28 Aug 2007.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-02-15 13:59:51 -08:00
Matt Turner
d4fa083e11 util: Add utility build-id code.
Provides the ability to read the .note.gnu.build-id section of ELF
binaries, which is inserted by the --build-id=... flag to ld.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-02-15 13:59:51 -08:00
Bas Nieuwenhuizen
4e6095ff61 radv: Add support for shaderStorageImageReadWithoutFormat.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-02-15 21:18:21 +01:00
Bas Nieuwenhuizen
501a4c0d73 spirv: Add support for SpvCapabilityStorageImageReadWithoutFormat.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-02-15 21:18:18 +01:00
Bas Nieuwenhuizen
53873697e4 radv: Add support for shaderStorageImageWriteWithoutFormat.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-02-15 21:18:13 +01:00
Eduardo Lima Mitev
633c959fae getteximage: Return correct error value when texure object is not found
glGetTextureSubImage() and glGetCompressedTextureSubImage() are currently
returning INVALID_OPERATION error when the passed texture argument does not
correspond to an existing texture object. However, the error should be
INVALID_VALUE instead. From OpenGL 4.5 spec PDF, section '8.11. Texture
Queries', page 236:

    "An INVALID_VALUE error is generated if texture is not the name of
     an existing texture object."

Same wording applies to the compressed version.

The INVALID_OPERATION error is coming from the call to
_mesa_lookup_texture_err(). This patch uses _mesa_lookup_texture() instead
and emits the correct error in the caller.

Fixes: GL45-CTS.get_texture_sub_image.errors_test

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-15 19:37:21 +01:00
Jason Ekstrand
a9a517f530 util: Fix a typo in Makefile.sources
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-02-15 10:27:42 -08:00
Lionel Landwerlin
569231c55e i965: define default allow_higher_compat_version value
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 9d16f3903e ("driconf: add allow_higher_compat_version option")
2017-02-15 17:03:31 +00:00
Samuel Pitoiset
124d9dd57f drirc: add allow_higher_compat_version for Tropico 5
v2: s/force_compat_profile/allow_higher_compat_version

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-15 16:15:54 +01:00
Samuel Pitoiset
76c6d85cbd drirc: add allow_higher_compat_version for Crookz - The Big Heist
v2: s/force_compat_profile/allow_higher_compat_version

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-15 16:15:54 +01:00
Samuel Pitoiset
34d587abc2 drirc: add allow_higher_compat_version for Worms WMD
v2: s/force_compat_profile/allow_higher_compat_version

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-15 16:15:54 +01:00
Samuel Pitoiset
9d16f3903e driconf: add allow_higher_compat_version option
Mesa currently doesn't allow to create 3.1+ compatibility profiles
mainly because various features are unimplemented and bugs can
happen.

However, some buggy apps request a compat profile without using
any old features unimplemented in mesa, and they fail to start.

This option should help some games to run but it's not enough
for all (eg. Dying Light).

v2: - s/force_compat_profile/allow_higher_compat_version

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-15 16:15:32 +01:00
Marek Olšák
d1fae627fa gallium/radeon: add a HUD query for monitoring the CS thread activity
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-15 14:35:52 +01:00
Lionel Landwerlin
0fcb92c17d anv: wsi: report presentation error per image request
vkQueuePresentKHR() takes VkPresentInfoKHR pointer and includes a
pResults fields which must holds the results of all the images
requested to be presented. Currently we're not filling this field.

Also as a side effect we probably want to go through all the images
rather than stopping on the first error.

This commit also makes the QueuePresentKHR() implementation return the
first error encountered.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
2017-02-15 11:43:05 +00:00
Eric Engestrom
fc9b119013 egl: remove duplicate 0 assignment
The memset on the line before already takes care of this.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-02-15 08:57:05 +00:00
Hans de Goede
4c66f529a8 glx/glvnd: Fix GLXdispatchIndex sorting
Commit 8bca8d89ef ("glx/glvnd: Fix dispatch function names and indices")
fixed the sorting of the array initializers in g_glxglvnddispatchfuncs.c
because FindGLXFunction's binary search needs these to be sorted
alphabetically.

That commit also mostly fixed the sorting of the DI_foo defines in
g_glxglvnddispatchindices.h, which is what actually matters as the
arrays are initialized using "[DI_foo] = glXfoo," but a small error
crept in which at least causes glXGetVisualFromFBConfigSGIX to not
resolve, breaking games such as "The Binding of Isaac: Rebirth" and
"Crypt of the NecroDancer" from Steam not working and possible causes
other problems too.

This commit fixes the last of the sorting errors, fixing these mentioned
games not working.

Fixes: 8bca8d89ef ("glx/glvnd: Fix dispatch function names and indices")
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Cc: Adam Jackson <ajax@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-02-15 09:55:57 +01:00
Dave Airlie
b4b19afebe radv: also fixup event emission to not get culled.
This is possibly a bad idea, I might have to consider a better one.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-15 00:36:30 +00:00
Jason Ekstrand
bfbb362601 anv: Use vk_foreach_struct for handling extension structs
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-02-14 16:15:39 -08:00
Jason Ekstrand
f76584e7b7 util: Add helpers for iterating over Vulkan extension structs
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-02-14 16:15:39 -08:00
Dave Airlie
d1640e7932 radv: query cmds should mark a cmd buffer as having draws.
This fixes a regression with the remove non-draw cmd buffers in
queries.

Fixes: 8b47b97215 radv: detect command buffers that do no work and drop them (v2)
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-15 00:02:33 +00:00
Kenneth Graunke
a3e4fa5495 glsl: Handle packed_type == ivec4[] in lower_packed_varyings().
For GS input arrays, we may turn a packed_type of ivec4 into an
array of ivec4s.  We still want flat qualification.

Found by inspection.  Not known to help anything.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-02-14 14:47:40 -08:00
Jason Ekstrand
f434a60a53 anv: Implement the Skylake stencil PMA optimization
Unfortunately, this doesn't substantially improve the performance of any
known apps.  With Dota 2 on my Sky Lake gt4, it seems help by somewhere
between 0% and 1% but there's enough noise that it's hard to get a clear
picture.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-02-14 14:18:55 -08:00
Jason Ekstrand
d665c51eea genxml: Add the CACHE_MODE_0 register on gen9
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-02-14 14:18:55 -08:00
Jason Ekstrand
028e1137e6 anv/pipeline: Be smarter about depth/stencil state
It's a bit hard to measure because it almost gets lost in the noise,
but this seemed to help Dota 2 by a percent or two on my Broadwell
GT3e desktop.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-02-14 14:18:55 -08:00
Jason Ekstrand
215fed7318 anv/pipeline: Make a copy of VkPipelineDepthStencilStateCreateinfo
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-02-14 14:18:55 -08:00
Jason Ekstrand
e8d52dab48 anv: Add support for the PMA fix on Broadwell
This helps Dota 2 on Broadwell by 8-9%.  I also hacked up the driver and
used the Sascha "shadowmapping" demo to get some results.  Setting
uses_kill to true dropped the framerate on the demo by 25-30%.  Enabling
the PMA fix brought it back up to around 90% of the original framerate.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-02-14 14:18:55 -08:00
Jason Ekstrand
62bba4ba2d genxml: Add the CACHE_MODE_1 register on gen8
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-02-14 14:18:55 -08:00
Jason Ekstrand
6ce8592836 anv: Disable stencil writes when both write masks are zero
Vulkan doesn't have a stencilWriteEnable bit like it does for depth.
Instead, you have a stencil mask.  Since the stencil mask is handled as
dynamic state, we have to handle it later during command buffer
construction.  This, combined with a later commit, seems to help Dota2
on my Broadwell GT3e desktop by a couple percent because it allows the
hardware to move the depth and stencil writes to early in more cases.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-02-14 14:18:55 -08:00
Jason Ekstrand
114c281e70 anv/entrypoints: Only generate entrypoints for supported features
This changes the way anv_entrypoints_gen.py works from generating a
table containing every single entrypoint in the XML to just the ones
that we actually need.  There's no reason for us to burn entrypoint
table space on a bunch of NV extensions we never plan to implement.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-02-14 14:18:55 -08:00
Connor Abbott
6319bfc2a6 anv: fix Get*MemoryRequirements for !LLC
Even though we supported both coherent and non-coherent memory types, we
effectively forced apps to use the coherent types by accident. Found by
inspection, only compile tested.

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
2017-02-14 13:05:44 -08:00
Marek Olšák
b5eb38f071 radeonsi: implement uploading zero-stride vertex attribs
This is the only kind of user buffer we can get with the GL core profile.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-14 22:04:35 +01:00
Marek Olšák
b8f3b00742 gallium/radeon: include SDMA in the GPU load query
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-14 21:47:51 +01:00
Marek Olšák
579ffe81f1 gallium/hud: add monitoring of API thread busy status
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-14 21:47:51 +01:00
Marek Olšák
626e4ef18f gallium/u_queue: add util_queue_get_thread_time_nano
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-14 21:47:51 +01:00
Marek Olšák
6c61a8bfc6 gallium/os: add per-thread time clock queries
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-14 21:47:51 +01:00
Marek Olšák
5d19b503af st/mesa: tell u_vbuf that GL core doesn't have user VBOs
I think this only affects radeonsi - VI, because all other drivers using
u_vbuf probably don't support GL_DOUBLE, so they won't be affected by this.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-14 21:47:51 +01:00
Marek Olšák
e0f95ddd3e gallium: let state trackers tell u_vbuf whether user VBOs are possible
This can affect whether u_vbuf will be enabled or not.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-14 21:47:51 +01:00
Marek Olšák
0561b3c75a vdpau: skip vlVdpOutputSurfacePutBitsNative with a zero-area rectangle
This prevents errors:
"EE r600_texture.c:1571 r600_texture_transfer_map - failed to create
 temporary texture to hold untiled copy"

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99542

Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-02-14 21:47:51 +01:00
Marek Olšák
c196efcf03 gallium/radeon: add an assertion to texture_transfer_map for app bugs
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Kai Wasserbäch <kai@dev.carbon-project.org>
2017-02-14 21:47:51 +01:00
Marek Olšák
4c36553a46 radeonsi: implement legacy GL_DOUBLE vertex formats
so that we can disable u_vbuf for GL core profiles.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-14 21:47:51 +01:00
Marek Olšák
2c8ee2e825 radeonsi: clean up si_get_param
has_streamout is always true

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-14 21:47:51 +01:00
Marek Olšák
4fe1fd4df4 gallium/hud: don't use user vertex buffers
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-14 21:46:16 +01:00
Marek Olšák
00d170a5c3 gallium/hud: call u_upload_alloc only once
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-14 21:46:16 +01:00
Marek Olšák
5699c8a2f7 gallium/u_upload_mgr: remove deprecated function u_upload_buffer
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Charmaine Lee <charmainel@vmware.com>
2017-02-14 21:46:16 +01:00
Marek Olšák
2ca3548eb9 gallium/radeon: remove the internal u_upload_mgr pointer
also remove the BIND flags

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Tested-by: Charmaine Lee <charmainel@vmware.com>
2017-02-14 21:46:16 +01:00
Marek Olšák
1e20112abd st/mesa: use the common uploader (v2)
v2: use const_uploader

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> (v1)
Tested-by: Charmaine Lee <charmainel@vmware.com>
2017-02-14 21:46:16 +01:00
Marek Olšák
d3de8e1096 gallium/vl: use the common uploader
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Tested-by: Charmaine Lee <charmainel@vmware.com>
2017-02-14 21:46:16 +01:00
Marek Olšák
b1dc347822 gallium/vbuf: use the common uploader
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Tested-by: Charmaine Lee <charmainel@vmware.com>
2017-02-14 21:46:16 +01:00
Marek Olšák
5fe5321633 gallium/blitter: use the common uploader
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Tested-by: Charmaine Lee <charmainel@vmware.com>
2017-02-14 21:46:16 +01:00
Marek Olšák
8a84585951 gallium/primconvert: use the common uploader
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Tested-by: Charmaine Lee <charmainel@vmware.com>
2017-02-14 21:46:16 +01:00
Marek Olšák
9f78ec39e9 gallium/hud: use the common uploader
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Tested-by: Charmaine Lee <charmainel@vmware.com>
2017-02-14 21:46:16 +01:00
Marek Olšák
55ad59d2b7 gallium: set pipe_context uploaders in drivers (v3)
Notes:
- make sure the default size is large enough to handle all state trackers
- pipe wrappers don't receive transfer calls from stream_uploader, because
  pipe_context::stream_uploader points directly to the underlying driver's
  stream_uploader (to keep it simple for now)

v2: add error handling to nv50, nvc0, noop
v3: set const_uploader

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> (v1)
Tested-by: Charmaine Lee <charmainel@vmware.com>
2017-02-14 21:46:16 +01:00
Marek Olšák
998396c32e gallium/u_upload_mgr: add a helper that creates the default uploader
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Tested-by: Charmaine Lee <charmainel@vmware.com>
2017-02-14 21:46:16 +01:00
Marek Olšák
d71bc0d741 gallium: add common uploaders into pipe_context (v2)
For lower memory usage and more efficient updates of the buffer residency
list. (e.g. if drivers keep seeing the same buffer for many consecutive
"add" calls, the calls can be turned into no-ops trivially)

v2: add const_uploader, add documentation

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Tested-by: Charmaine Lee <charmainel@vmware.com>
2017-02-14 21:46:16 +01:00
Dave Airlie
3360dbe0c1 radv: fixup IA_MULTI_VGT_PARAM handling.
This ports the remains of the workarounds from radeonsi for
the non-TESS cases. It should provide equivalent workarounds
for hawaii and bonarie.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-14 20:29:19 +00:00
Dave Airlie
a465eae38f radv: fix warning since using common gs emit code
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-14 20:02:13 +00:00
Dave Airlie
09bf5491c4 radv: adopt some init config workarounds from radeonsi.
Just one bonaire fix.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-15 05:02:33 +10:00
Dave Airlie
eea562f875 radv: re-enable init gfx state on CIK.
Once the color alignment was fixed this works fine now.

Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-15 05:02:29 +10:00
Dave Airlie
5e988ac61f radv: align the initial state command buffer.
This just adds the padding to align this to an 8 dword boundary.

Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-15 05:02:21 +10:00
Dave Airlie
0f1a4220a6 radv: fix cik macroModeIndex.
This just a CIK fix ported from radeonsi.

Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-15 05:02:13 +10:00
Dave Airlie
06ffd29925 radv: change base aligmment for allocated memory.
On some CIK (Hawaii) this needs to be at least 64k, I'm not 100% sure
it doesn't need to be 128k.

This was causing fast clear eliminate to overwrite the previous buffer,
which since my gfx init code, was the indirect buffer.

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=99692
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-15 04:59:57 +10:00
Alex Smith
924a8cbb40 anv: Add support for shaderStorageImageWriteWithoutFormat
This allows shaders to write to storage images declared with unknown
format if they are decorated with NonReadable ("writeonly" in GLSL).

Previously an image view would always use a lowered format for its
surface state, however when a shader declares a write-only image, we
should use the real format. Since we don't know at view creation time
whether it will be used with only write-only images in shaders, create
two surface states using both the original format and the lowered
format. When emitting the binding table, choose between the states
based on whether the image is declared write-only in the shader.

Tested on both Sascha Willems' computeshader sample (with the original
shaders and ones modified to declare images writeonly and omit their
format qualifiers) and on our own shaders for which we need support
for this.

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-02-14 08:16:52 -08:00
Alex Smith
94d48b7f9f spirv: Add support for SpvCapabilityStorageImageWriteWithoutFormat
Allow that capability if the driver indicates that it is supported, and
flag whether images are read-only/write-only in the nir_variable (based
on the NonReadable and NonWritable decorations), which drivers may need
to implement this.

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-02-14 08:16:52 -08:00
Iago Toral Quiroga
5c6eaa1421 nir/spirv: do not require a format with images that are not sampled
As soon as we support shaderStorageImageWriteWithoutFormat we can see
write-only images (sampled == 2) that don't have a format specified.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-02-14 08:16:52 -08:00
Jason Ekstrand
2c30918581 anv/apply_pipeline_layout: Set image.write_only to false
This makes our driver robust to changes in spirv_to_nir which would set
this flag on the variable.  Right now, our driver relies on spirv_to_nir
*not* setting var->data.image.write_only for correctness.  Any patch
which implements the shaderStorageImageWriteWithoutFormat will need to
effectively revert this commit.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-02-14 08:16:45 -08:00
Jason Ekstrand
f8dfe9b826 intel/isl: Add format metadata for typed reads/writes
This adds two columns to the format table as well as two helpers for
determining whether or not a given format is supported for typed reads
and writes.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-02-14 07:50:13 -08:00
Jason Ekstrand
0ef14cdc98 anv/cmd_buffer: Return a VkResult from verify_cmd_parser
This fixes a "statement with no effect" compiler warning

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-02-14 07:50:13 -08:00
Ilia Mirkin
956556b3c3 nvc0: disable linked tsc mode in compute launch descriptor
Empirically, this makes things work. Presumably this was originally
copied from the blob, which does make use of linked tsc mode.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99532
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2017-02-13 20:10:53 -05:00
Anuj Phogat
5e2909e732 mesa: Add EXT_frag_depth bits and enable it on all drivers
Passes the newly added piglit test for this extension on i965.

V2: Fix comments by Ilia.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-02-13 16:08:40 -08:00
Dave Airlie
b3b4114a0f radeonsi: use common sendmsg emission function.
This just ports radeonsi to use the sendmsg common code.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-14 00:03:22 +00:00
Dave Airlie
e3324e0c60 radv/ac: use sendmsg emission interface.
This uses the common code to emit the correct intrinsic.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-14 00:03:18 +00:00
Dave Airlie
f32955be43 radeon/ac/llvm: add support for sendmsg emission
This lets us use the new intrinsic on the correct
version of llvm.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-14 00:02:50 +00:00
Dave Airlie
f77d2871ac radv: disable gfx init on CIK for now
Luzipher on irc report this hangs his Hawaii, disable for now
until I get time to debug.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-14 08:01:39 +10:00
Dave Airlie
69fc7a2c82 tgsi: fix memory leak in tgsi sanity check
This just fixes this without repeating the code.

Reported-by: Li Qiang
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-14 08:00:30 +10:00
Dave Airlie
62fef3e159 radv/ac: use common interp code for new intrinsics
This uses the common fs interp code to use the new
llvm intrinsics so llvm can drop the old ones.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-14 07:48:01 +10:00
Dave Airlie
592069c1fb radv: use indirect buffer for initial gfx state.
This puts the common gfx state for the device into an
indirect buffer, and just calls out to it, on CIK and above.

This is taken from what radeonsi does.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-13 20:02:45 +00:00
Dave Airlie
b26253b34d radv: start splitting init config up
This is just prep work for the following patch to use
a common gfx init indirect buffer.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-13 20:02:34 +00:00
Dave Airlie
604e562e5b radv: don't pass physical device to si_init_ fns.
This is just a trivial cleanup.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-13 20:02:06 +00:00
Dave Airlie
8b47b97215 radv: detect command buffers that do no work and drop them (v2)
If a buffer is just full of flushes we flush things on command
buffer submission, so don't bother submitting these.

This will reduce some CPU overhead on dota2, which submits a fair
few command streams that don't end up drawing anything.

v2: reorganise loop to count first then malloc,
rename some vars (Bas)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-13 20:00:28 +00:00
Jason Ekstrand
d49d275c41 anv/blorp: Don't sanitize the swizzle for blorp_clear
BLORP is now smart enough to handle any swizzle (even those that contain
ZERO or ONE) in a reasonable manner.  Just let BLORP handle it.  This
fixes the following Vulkan CTS tests on Haswell:

 - dEQP-VK.api.image_clearing.clear_color_image.1d_b4g4r4a4_unorm_pack16
 - dEQP-VK.api.image_clearing.clear_color_image.2d_b4g4r4a4_unorm_pack16
 - dEQP-VK.api.image_clearing.clear_color_image.3d_b4g4r4a4_unorm_pack16

Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
2017-02-13 09:24:49 -08:00
Jason Ekstrand
e233db6e93 intel/blorp: Swizzle clear colors on the CPU
It's trivial to swizzle clear colors on the CPU, easily deals with the
hardware restrictions for render target swizzles, and makes swizzled
clears work on all hardware as opposed to just HSW+.

Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
2017-02-13 09:24:43 -08:00
Emil Velikov
bd1c61261f docs: add news item and link release notes for 17.0.0
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-13 12:05:34 +00:00
Emil Velikov
437b6a136e docs: add sha256 checksums for 17.0.0
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 80b41d9899)
2017-02-13 12:02:58 +00:00
Emil Velikov
2343b8a262 docs: Update 17.0.0 release notes
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 683462e680)
2017-02-13 12:02:56 +00:00
Emil Velikov
20ccff56a0 st/xlib: remove always true ifdef GLX_EXTENSION guards
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Matt Turner <mattst88@gmail.com>
2017-02-13 10:15:02 +00:00
Emil Velikov
884fd1262f xlib: remove always true ifdef GLX_EXTENSION guards
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Matt Turner <mattst88@gmail.com>
2017-02-13 10:14:40 +00:00
Emil Velikov
261d5e4c6d glx: remove always true XDAMAGE_1_1_INTERFACE guard
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-02-13 10:14:32 +00:00
Emil Velikov
87f485e957 scons: check for libXdamage 1.1 or later
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-02-13 10:14:23 +00:00
Emil Velikov
43b09ee0b2 configure.ac: check for libXdamage 1.1 or later
Released back in 2007 so it should not be an issue for anyone building
from git.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-02-13 10:14:06 +00:00
Emil Velikov
bfac8d1749 glx: remove DRI2DriverPrimeShift compile guards
DRI2DriverPrimeShift was added in dri2proto-2.8, which we now require
as of the previous commit.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-02-13 10:13:46 +00:00
Emil Velikov
a1662d0dab vl: remove DRI2DriverPrimeShift compile guards
DRI2DriverPrimeShift was added in dri2proto-2.8, which we now require as
of the previous commit.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-02-13 10:13:29 +00:00
Emil Velikov
cd1ebd8aba scons: add missing dri2proto requirement
Noticed while skimming through, although admittedly there's many other
dependencies that are not tracked by the scons build.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-02-13 10:13:24 +00:00
Emil Velikov
6689cc0392 configure.ac: dump dri2proto requirement to 2.8
dri2proto 2.8 was released 4+ years ago, so it must be of no surprise
for anyone building mesa from git.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-02-13 10:12:56 +00:00
Emil Velikov
404a5ca088 glx: remove always true ifdef guards
The two symbols referenced were introduced with v2.2 and 2.3 of
the dri2proto package and we require dri2proto >= 2.6.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-02-13 10:12:36 +00:00
Emil Velikov
4f080b46a8 winsys/intel: remove unused winsys - ilo was its only user
Cc: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-13 10:09:52 +00:00
Emil Velikov
6ffddba33b configure.ac: do not use deprecated macros - AC_HELP_STRING AC_ERROR
Replace with AS_HELP_STRING and AC_MSG_ERROR respectively, as spotted by
autoupdate.

Note that the suggested AC_CANONICAL_SYSTEM > AC_CANONICAL_TARGET change
is not addressed here since that requires very extensive testing.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-13 10:09:45 +00:00
Timothy Arceri
0cbde643eb util/disk_cache: correctly use stat(3)
I forgot to error check stat() and also I wasn't using the subdir in
is_two_character_sub_directory().

Fixes: d7b3707c61 "util/disk_cache: use stat() to check if entry is a directory"
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-13 10:01:12 +00:00
Michel Dänzer
0f53404565 configure.ac: Drop LLVM compiler flags more radically
Drop all -m*, -W*, -O*, -g* and -f* flags, with the exception of
-fno-rtti, which must be used if it's part of the llvm-config --cxxflags
output. We don't want LLVM to dictate the flags we use, and it can even
cause build failures, e.g. if LLVM and Mesa are built with different
compilers.

While we're at it, eat any whitespace preceding dropped flags as well.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-13 16:07:37 +09:00
Kenneth Graunke
57dc6d80a0 glsl: Drop resize-to-MaxPatchVertices hack.
TCS and TES inputs without an array size are implicitly sized to
gl_MaxPatchVertices.  But TCS outputs are apparently not:

   "If no size is specified, it will be taken from the output patch size
    (gl_VerticesOut) declared in the shader."

Fixes dEQP-GLES31.functional.program_interface_query.program_output.
array_size.separable_tess_ctrl.var.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2017-02-12 21:09:25 -08:00
Kenneth Graunke
1fad070f96 mesa: Ignore per-vertex array size in SSO pipeline validation.
We were already unwrapping types when the producer was a non-array
stage and the consumer was an arrayed-stage...but we ought to unwrap
both ends for TCS -> TES matching too.

This will allow us to drop the "resize to gl_MaxPatchVertices" check
shortly, which breaks some things.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2017-02-12 21:09:23 -08:00
Kenneth Graunke
e99df398f1 glsl: Update a comment about link errors for TCS && !TES.
OpenGL ES actually has spec text to prohibit this.  It's just OpenGL
that's confusing.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2017-02-12 21:09:21 -08:00
Kenneth Graunke
365afbdaef mesa: Do a draw time check for TES && !TCS in ES 3.x.
ES 3.x requires both TCS and TES to be present.  We already checked
the TCS && !TES case above, so we just have to check !TCS && TES here.

Note that this is allowed in OpenGL, just not ES.

This fixes a subcase of:
dEQP-GLES31.functional.debug.negative_coverage.*.tessellation.single_tessellation_stage

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2017-02-12 21:09:19 -08:00
Kenneth Graunke
05a56893aa mesa: Do (TCS && !TES) draw time validation in ES as well.
Now that we have OES_tessellation_shader, the same situation can occur
in ES too, not just GL core profile.

Having a TCS but no TES may confuse drivers - i965 crashes, for example.

This prevents regressions in
ES31-CTS.core.tessellation_shader.single.xfb_captures_data_from_correct_stage
with some SSO pipeline validation changes I'm making.

v2: Add an ES spec citation (suggested by Alejandro)

Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2017-02-12 21:09:14 -08:00
Jason Ekstrand
c59d1ea51b i965/sampler_state: Set the "Base Mip Level" field on Sandy Bridge
Fixes two GL ES 3.0 CTS tests on Sandy Bridge:

ES3-CTS.functional.texture.mipmap.cube.base_level.linear_linear
ES3-CTS.functional.texture.mipmap.cube.base_level.linear_nearest

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
2017-02-12 17:56:32 -08:00
Jason Ekstrand
c4f8f395b2 i965/sampler_state: Pass texObj into update_sampler_state
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
2017-02-12 17:56:32 -08:00
Jason Ekstrand
9df3778016 i965/sampler_state: Clamp min/max LOD to 14 on gen7+
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
2017-02-12 17:56:32 -08:00
Ilia Mirkin
3970257cef st/mesa: don't pass compare mode for stencil-sampled textures
Fixes dEQP-GLES31.functional.stencil_texturing.misc.compare_mode_effect

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: mesa-stable@lists.freedesktop.org
2017-02-12 19:26:25 -05:00
Ilia Mirkin
3f8b886e73 nv50,nvc0: use alternate samplers for stencil
The blob uses these, and it fixes a bunch of dEQP stencil sampling tests
involving border colors. Probably the Z-based samplers work somehow
differently wrt border colors when using the stencil swizzle.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-02-12 18:22:17 -05:00
Bas Nieuwenhuizen
1811ccf125 radv: Fix radv_GetPhysicalDeviceQueueFamilyProperties2KHR.
The struct have different size, so the arrays have different stride.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-02-13 00:18:19 +01:00
Wladimir J. van der Laan
55e00c7cfe etnaviv: Set shader instruction area correctly for GC3000
- Use the same instruction area on GC3000 as the Vivante driver.
  This allows the same number of instructions on GC3000 as GC2000
  instead of half.

- Makes sure that the "PE to FE" stall before updating the shader code
  or constants is hit (which is conditional on vs_offset > 0x4000). This
  is necessary on GC3000 too, it increases stability.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-02-12 20:42:37 +01:00
Wladimir J. van der Laan
0fe60e4fcc etnaviv: Update hw header files
Update from etnaviv repository rnndb. This adds some newly
discovered state for GC3000 (and some GC2000) features.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-02-12 20:38:56 +01:00
Dave Airlie
f466d4dd6a radv: reduce CPU overhead merging bo lists.
Just noticed we do a fair bit of unneeded searching here.

Since we know that the buffers in a CS are unique already,
the first time we get any buffers, we can just memcpy those into
place, and when we are searching for subsequent CSes, we only
have to search up until where the previous unique buffers were.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-12 19:00:19 +00:00
Ilia Mirkin
48f04862c1 nvc0: set the render condition in the compute object
Fixes GL45-CTS.compute_shader.conditional-dispatching

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2017-02-11 21:06:52 -05:00
Ilia Mirkin
7e75f0913a gm107/ir: fix address offset bitfield for ATOMS
Fixes GL45-CTS.compute_shader.atomic-case1 on Maxwell

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2017-02-11 21:06:41 -05:00
Ilia Mirkin
b38aab50a0 nv50/ir: convert an ATOM.EXCH without a destination into a store
On SM35 there does not appear to be a way to emit a ATOM.EXCH with a
null destination. This should be functionally equivalent to a plain
store however, so just do that.

Fixes GL45-CTS.compute_shader.atomic-case2 on SM35.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-02-11 20:25:26 -05:00
Ilia Mirkin
2b0580123e nvc0: fix 64-bit integer query buffer writes
The former logic just plain didn't work at all. We need to write the
subsequent dword to the next buffer location.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-02-11 20:25:26 -05:00
Ilia Mirkin
399e267f0e nv50/ir: return a register when retrieving thread id sysval
We have logic to short-circuit such retrievals to zero. However "zero"
was an immediate, and some logic expected to get registers (to later be
propagated). Fix this by using loadImm.

Fixes GL45-CTS.gpu_shader5.images_array_indexing

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-02-11 20:25:26 -05:00
Ilia Mirkin
0d1edb01ec nv50/ir: add missing break after DSSG
Recently broken during int64 addition.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-02-11 17:21:55 -05:00
Christian Gmeiner
137ad879d5 etnaviv: shader-db traces
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>
2017-02-11 21:22:53 +01:00
Christian Gmeiner
7256ed3c79 etnaviv: keep track of emitted loops
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
2017-02-11 21:22:48 +01:00
Christian Gmeiner
5a3ea68895 etnaviv: wire up core pipe_debug_callback
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
2017-02-11 21:22:42 +01:00
Jose Maria Casanova Crespo
5bc222ebaf glsl: non-last member unsized array on SSBO must fail compilation on GLSL ES 3.1
From GLSL ES 3.10 spec, section 4.1.9 "Arrays":

"If an array is declared as the last member of a shader storage block
 and the size is not specified at compile-time, it is sized at run-time.
 In all other cases, arrays are sized only at compile-time."

In desktop GLSL it is allowed to have unsized-arrays that are
not last, as long as we can determine that they are implicitly
sized, which is detected at link-time.

With this patch Mesa reports a compilation error as glslang does with
the following shader:

buffer SSBO { vec4 data[]; vec4 moreData;};
void main (void)
{
}

Fixes:
dEQP-GLES31.functional.debug.negative_coverage.log.shader.compile_compute_shader
dEQP-GLES31.functional.debug.negative_coverage.callbacks.shader.compile_compute_shader
dEQP-GLES31.functional.debug.negative_coverage.get_error.shader.compile_compute_shader

Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-02-10 23:14:12 -08:00
Eric Anholt
0514b0bdc9 vc4: Enable glSampleMask() even when !rasterizer->multisample.
gallium's blitter expects that it can set the sample mask even when the
rasterizer doesn't have the flag on.

Between this and the previous test, 10 new ext_framebuffer_multisample
tests start passing.
2017-02-10 14:17:05 -08:00
Eric Anholt
5c86f119b9 vc4: Respect glSampleMask() even when we're not writing color.
gallium's quad-based blitter for copying MSAA depth textures expects to be
able to do 4 passes updating a sample at a time using glSampleMask, and
there's no color buffer bound when it's doing that.
2017-02-10 14:17:04 -08:00
Eric Anholt
30237193f5 vc4: Use the nir_builder helper for loading sample mask. 2017-02-10 14:17:04 -08:00
Eric Anholt
ce538a443d vc4: Use accurate 1/w in coordinate shader as well as vert shader.
We probably shouldn't be emitting different scaled viewport coordinates
between vertex and coord.
2017-02-10 14:17:04 -08:00
Eric Anholt
a0b6841838 vc4: Drop VS inputs to 8.
In the hardware we only get to declare 8 vertex elements (GLES2's
minimum), so we should be exposing that number here.  Fixes an assertion
failure in piglit texrect-many, at the expense of various GL 2.0-ish
minmax tests now complaining that our count is too low.
2017-02-10 14:17:04 -08:00
Eric Anholt
b230939303 vc4: Avoid emitting small immediates for UBO indirect load address guards.
The kernel will reject our shader if we emit one here, and having 4, 8, or
12 as the top end of our UBO clamp rare is enough that it's not worth
making the kernel let us.

Fixes piglit fs-const-array-of-struct and
fs-const-array-of-struct-of-array since recent GLSL linking changes made
us get this as an indirect load of a uniform, instead of a tempoary.

Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-02-10 14:17:04 -08:00
Timothy Arceri
d7b3707c61 util/disk_cache: use stat() to check if entry is a directory
d_type is not supported on all systems.

Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97967
2017-02-10 23:50:36 +11:00
Emil Velikov
463236bd31 st/nine: update configure options in the README
Cc: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-10 11:47:24 +00:00
Emil Velikov
b3b415609d configure.ac: supersede --enable-gallium-llvm over --enable-llvm
Currently we have extra (somewhat questionable) modularity, such that
one could build some parts with LLVM while others w/o.

That is extremely fragile, error prone and requires quite noticable
amount of code throughout.

Thus lets deprecate the gallium toggle in faviour of the generic one.
The former will throw a warning when set, and it will be overwritten by
the latter. This will allow gradual transition w/o breaking people's
scripts.

v2: Rebase, document in release notes.

Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de> (v1)
2017-02-10 11:47:24 +00:00
Emil Velikov
bdd6147e29 configure.ac: remove dummy radeon_gallium_llvm_check()
The extra function brings no added benefit as of earlier commit which
made llvm_require_version (as called by radeon_llvm_check) require LLVM
(--enable-gallium-llvm).

Fixes: 5f966a96af7 "configure.ac: Mandate --enable-gallium-llvm when
checking LLVM version"
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
2017-02-10 11:47:24 +00:00
Emil Velikov
d4840c0c26 configure.ac: correctly manage llvm auto-detection
Earlier refactoring commits changed from one, dare I say it, broken
behaviour to another. Namely:

Before, as you explicitly --enable-gallium-llvm your selection was
ignored when llvm-config was not present/detected.
Today, the "auto" heuristics enables gallium llvm regardless if you have
llvm/llvm-config available or not.

Rework the auto-detection to attribute for llvm's presence.

v2: Set enable_gallium_llvm=no when LLVM is not found.

Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
Reported-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-02-10 11:47:24 +00:00
Emil Velikov
ce65cc1f1f configure.ac: disable enable_gallium_llvm in the !x86 case
Already implicitly handled throughout, but keep it clear and disable
gallium-llvm. This change should be a no-op.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
2017-02-10 11:47:24 +00:00
Emil Velikov
4d8bb9cf8c configure.ac: set LLVM_{C, CXX, LD}FLAGS only as needed
Earlier refactoring commits started setting the above regardless if LLVM
is used or not. Move them to the respective section to restore the
original functionality.

Since we require the preprocessor flags (includes in particular) for the
header version parsing keep those as-is. They are not used outside of
configure.ac thus should not cause any side-effects.

As-is adding the C/CXXFLAGS can lead to build issues on when
cross-compiling.

Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reported-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
2017-02-10 11:47:24 +00:00
Emil Velikov
fc30992a54 Revert "configure.ac: Create correct LLVM_VERSION_INT with minor >= 10"
As stated in [1] by the LLVM devs, the new versioning scheme will not
deploy any minor version (i.e. it will always be zero). As such the
patch should not be needed.

This reverts commit 0e9a5be7e7.

[1] http://blog.llvm.org/2016/12/llvms-new-versioning-scheme.html
Reviewed-by: Tobias Droste <tdroste@gmx.de>
2017-02-10 11:47:24 +00:00
Emil Velikov
5e9f4a5f3f configure.ac: don't use == with test
Although it works, it's not the correct thing to do.

v2: Rebase
v3: Rebase

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de> (v1)
2017-02-10 11:47:23 +00:00
Emil Velikov
65ee9dff69 configure.ac: remove unused LLVM variables
LLVM_BINDIR is completely unused while others such as LLVM_LIBDIR are
used only internally. In the latter case there's no need to AC_SUBST it.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
2017-02-10 11:47:23 +00:00
Tobias Droste
143c566a81 configure.ac: Only define HAVE_LLVM if LLVM is used
Make sure that HAVE_LLVM compiler define is only set if LLVM is
actually used.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99010
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Tobias Droste <tdroste@gmx.de>
v2 [Emil] fold within the existing conditional
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-10 11:47:23 +00:00
Tobias Droste
04377cbdcf configure.ac: Rework MESA_LLVM and LLVM detection
Set FOUND_LLVM only when LLVM is present (checking for exact version/etc
is deferred) and use enable-gallium-llvm to indicate the global LLVM
status.

Renaming the latter is not appropriate for stable patches, so we'll
address it with a later commit.

Loosely based on work by Tobias.

v2: Check FOUND_LLVM if enable_gallium_llvm is set.

Cc: Dave Airlie <airlied@redhat.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
2017-02-10 11:47:23 +00:00
Emil Velikov
5869a7db75 configure.ac: move enable-gallium-llvm dependency with-gallium-drivers
... to where it's applicable.

Since we effectively made --enable-gallium-llvm mean --enable-llvm with
earlier commits, we need to move the requirement to guard the compnents
added for the LLVM draw.

Otherwise we'll error (as below) when building RADV w/o gallium drivers.

configure: error: --enable-gallium-llvm is required when building radv

v2: Don't remove but move the dependency (Tobias).

Cc: Dave Airlie <airlied@redhat.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
2017-02-10 11:47:23 +00:00
Emil Velikov
a66ffcd736 configure.ac: Mandate --enable-gallium-llvm when checking LLVM version
With this change we effectively require --enable-gallium-llvm when
building RADV. This should be perfectly safe since the gallium radeonsi
driver already explicitly requires it.

The "gallium" part in --enable-gallium-llvm is about to be removed soon
(not in stable), but until then make sure that things can build.

To reflect the requirement (as opposed to check previously) we rename
llvm_check_version_for to llvm_require_version

Cc: Dave Airlie <airlied@redhat.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
2017-02-10 11:47:23 +00:00
Emil Velikov
514a494415 configure.ac: Rename the gallium_require_llvm helper
Drop the gallium prefix since we're about it use it throughout the
configure.

Note we do want to check for enable_gallium_llvm check since (as
explicitly requested) the toggle should mean --enable-llvm. Latter of
which to be resolved with later patches.

Cc: Dave Airlie <airlied@redhat.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
2017-02-10 11:47:23 +00:00
Tobias Droste
f64d4d82bd configure.ac: Don't check LLVM version in require_llvm
This is actually not needed because the version is checked later.

Around line 2380
if test "x$enable_gallium_llvm" == "xyes"; then
    llvm_check_version_for $LLVM_REQUIRED_GALLIUM "gallium"
    llvm_add_default_components "gallium"
fi

Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Cc: Tobias Droste <tdroste@gmx.de>
Signed-off-by: Tobias Droste <tdroste@gmx.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v1)
v2: [Emil Velikov: rebase/respin series order]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-10 11:47:23 +00:00
Emil Velikov
38abcdba8a configure.ac: move AC_ARG_ENABLE([gallium-llvm] hunk further up
With next commits we'll require --enable-gallium-llvm (en route to a
greater good later on) for RADV. The latter is required to ensure that
as otherwise we'll fail to build.

Cc: Dave Airlie <airlied@redhat.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
2017-02-10 11:47:23 +00:00
Emil Velikov
3a7973fd15 configure.ac: remove unused AC_SUBST([MESA_LLVM])
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
2017-02-10 11:47:22 +00:00
Nicolai Hähnle
de6e6a347d loader: unconditionally include unistd.h and stdlib.h
Otherwise we would fail with "implicit declaration of function" geteuid
and getenv respectively.

To trigger (re)move the libdrm.pc file and use the following:

 $ ./autogen.sh --disable-egl --disable-gbm --disable-dri \
    --with-dri-drivers=swrast --with-gallium-drivers=swrast
 $ make

Cc: Vinson Lee <vlee@freedesktop.org>
Fixes: 3f462050c ("loader: Add an environment variable to override driver name choice.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99701
v2: [Emil: handle stdlib.h add commit message]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-10 11:47:12 +00:00
Emil Velikov
a04cb3f8a5 intel/blorp: do not return const data by get_px_size_sa()
Not much point in the const qualifier since we provide a copy to the
user. Resolves the following -Wignored-qualifiers warning.

src/intel/blorp/blorp_blit.c:1857:8: warning: 'const' type qualifier on
return type has no effect [-Wignored-qualifiers]

v2: keep const qualifier of local variable.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-02-10 11:47:12 +00:00
Marek Olšák
43a2ba1b7d gallium/radeon: use staging for texture read mappings from GTT WC
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-10 11:27:50 +01:00
Marek Olšák
dc7483f445 gallium/radeon: ignore the level parameter in buffer_transfer_map
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-10 11:27:50 +01:00
Marek Olšák
d86099df0a gallium/radeon: fix performance of buffer readbacks
We want cached GTT for all non-persistent read mappings.
Set level = 0 on purpose.

Use dma_copy, because resource_copy_region causes a failure in the PBO
read of piglit/getteximage-luminance.

If Rocket League used the READ flag, it should get cached GTT.

v2: mask out UNSYNCHRONIZED

Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-10 11:27:50 +01:00
Marek Olšák
24e3b06408 radeonsi: align vertex buffer descriptor list size for optimal prefetch
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-10 11:27:50 +01:00
Marek Olšák
3a534c5c7d radeonsi: align shader binaries to CP DMA alignment for optimal prefetch
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-10 11:27:50 +01:00
Marek Olšák
1a392a4377 radeonsi: move CP_DMA_ALIGNMENT definition
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-10 11:27:50 +01:00
Marek Olšák
4c288c73ea radeonsi: remove SI_CONTEXT_FLUSH_AND_INV_FRAMEBUFFER
not necessary

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-10 11:27:50 +01:00
Marek Olšák
65df38b191 radeonsi: remove separate CB/DB_META flush flags
not used separately

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-10 11:27:50 +01:00
Marek Olšák
8a2ae4153b radeonsi: reduce the number of FMASK input coordinates
Before:
  image_load v3, v[0:3] ...
After:
  image_load v3, v[0:1] ...

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-10 11:27:50 +01:00
Marek Olšák
28c06b3ceb radeonsi: write shader asm annotated with wave info into GPU hang reports
Note that the disassembly is written twice - first the unmodified compiler
output and then the wave-annotated output only if there are waves executing
the shader.

Sample output from a real GPU hang most likely caused by image_sample:

The number of active waves = 28

Pixel Shader - annotated disassembly:
    s_mov_b64 s[6:7], exec                                        ; BE86017E [PC=0x10f3e3800, off=0, size=4]
    s_wqm_b64 exec, exec                                          ; BEFE077E [PC=0x10f3e3804, off=4, size=4]
...
    image_sample v[7:9], v[0:1], s[12:19], s[20:23] dmask:0x7     ; F0800700 00A30700 [PC=0x10f3e3a94, off=660, size=8]
    s_buffer_load_dword s20, s[0:3], 0x50                         ; C0220500 00000050 [PC=0x10f3e3a9c, off=668, size=8]
    s_load_dwordx4 s[24:27], s[4:5], 0x170                        ; C00A0602 00000170 [PC=0x10f3e3aa4, off=676, size=8]
    s_load_dwordx8 s[12:19], s[4:5], 0x140                        ; C00E0302 00000140 [PC=0x10f3e3aac, off=684, size=8]
    s_buffer_load_dword s11, s[0:3], 0x5c                         ; C02202C0 0000005C [PC=0x10f3e3ab4, off=692, size=8]
    s_buffer_load_dword s21, s[0:3], 0x54                         ; C0220540 00000054 [PC=0x10f3e3abc, off=700, size=8]
    s_buffer_load_dword s22, s[0:3], 0x58                         ; C0220580 00000058 [PC=0x10f3e3ac4, off=708, size=8]
    s_waitcnt vmcnt(0)                                            ; BF8C0F70 [PC=0x10f3e3acc, off=716, size=4]
          ^ SE0 SH0 CU1 SIMD1 WAVE0  EXEC=aaaaaaa555aaaaaa  INST32=BF8C0F70
          ^ SE0 SH0 CU1 SIMD2 WAVE0  EXEC=aaaa85555555552a  INST32=BF8C0F70
          ^ SE0 SH0 CU1 SIMD3 WAVE0  EXEC=000000000000000a  INST32=BF8C0F70
          ^ SE0 SH0 CU6 SIMD1 WAVE0  EXEC=25a5a5aa82aaaaaa  INST32=BF8C0F70
          ^ SE0 SH0 CU6 SIMD3 WAVE0  EXEC=50aaaa8fffa55555  INST32=BF8C0F70
          ^ SE0 SH0 CU7 SIMD0 WAVE0  EXEC=5554aaaaaaa1a555  INST32=BF8C0F70
          ^ SE0 SH0 CU7 SIMD0 WAVE1  EXEC=aaaa5555ffffffff  INST32=BF8C0F70
          ^ SE0 SH0 CU7 SIMD1 WAVE0  EXEC=555557aaaaaaaaa5  INST32=BF8C0F70
          ^ SE0 SH0 CU7 SIMD3 WAVE0  EXEC=5555aaaaaaaaaa85  INST32=BF8C0F70
          ^ SE1 SH0 CU3 SIMD1 WAVE0  EXEC=aaaaaaaaaaaaaaaa  INST32=BF8C0F70
          ^ SE1 SH0 CU4 SIMD0 WAVE0  EXEC=aaaaaaaa5a5a5a5a  INST32=BF8C0F70
          ^ SE1 SH0 CU4 SIMD1 WAVE0  EXEC=aaaaaaa5a5a5a4a5  INST32=BF8C0F70
          ^ SE1 SH0 CU4 SIMD2 WAVE0  EXEC=5555555000000000  INST32=BF8C0F70
          ^ SE1 SH0 CU4 SIMD3 WAVE0  EXEC=aa555554155aaaaa  INST32=BF8C0F70
          ^ SE1 SH0 CU5 SIMD0 WAVE0  EXEC=55ffff55555555aa  INST32=BF8C0F70
          ^ SE1 SH0 CU5 SIMD1 WAVE0  EXEC=555555555aaaaaaa  INST32=BF8C0F70
          ^ SE1 SH0 CU5 SIMD2 WAVE0  EXEC=a0aaaaaaa8555555  INST32=BF8C0F70
          ^ SE1 SH0 CU5 SIMD3 WAVE0  EXEC=8aaaaaaaaaaaa555  INST32=BF8C0F70
          ^ SE1 SH0 CU6 SIMD0 WAVE0  EXEC=000000002aaaaaaa  INST32=BF8C0F70
          ^ SE2 SH0 CU1 SIMD0 WAVE0  EXEC=5aaaa5400aaaa15a  INST32=BF8C0F70
          ^ SE2 SH0 CU1 SIMD1 WAVE0  EXEC=00aaaaaaaa5555aa  INST32=BF8C0F70
          ^ SE2 SH0 CU1 SIMD2 WAVE0  EXEC=aa00005555554555  INST32=BF8C0F70
          ^ SE2 SH0 CU1 SIMD3 WAVE0  EXEC=aaaaaaa000000000  INST32=BF8C0F70
          ^ SE3 SH0 CU4 SIMD0 WAVE0  EXEC=5555aaaaaaaaaaaa  INST32=BF8C0F70
          ^ SE3 SH0 CU4 SIMD2 WAVE0  EXEC=ffaaaaaaaaaa5555  INST32=BF8C0F70
          ^ SE3 SH0 CU4 SIMD3 WAVE0  EXEC=aaaa55555555aa00  INST32=BF8C0F70
          ^ SE3 SH0 CU5 SIMD0 WAVE0  EXEC=00aaaaaaaaaaaa5a  INST32=BF8C0F70
          ^ SE3 SH0 CU5 SIMD1 WAVE0  EXEC=5a555555005555ff  INST32=BF8C0F70
    v_mul_f32_e32 v7, s6, v7                                      ; 0A0E0E06 [PC=0x10f3e3ad0, off=720, size=4]
...

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-10 11:27:50 +01:00
Marek Olšák
3de8c5a3c5 radeonsi: write wave information into GPU hang reports
UMR is our new debugging tool. It must have +s set for Mesa to use it
without root privileges:
  sudo chmod +s .../umr

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-10 11:27:50 +01:00
Marc-André Lureau
dc2d9b8da1 tgsi-dump: dump label if instruction has one
The instruction has an associated label when Instruction.Label == 1,
as can be seen in ureg_emit_label() or tgsi_build_full_instruction().

This fixes dump generating extra :0 labels on conditionals, and virgl
parsing more than the expected tokens and eventually reaching "Illegal
command buffer" (when parsing more than a safety margin of 10 we
currently have).

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-10 12:46:33 +10:00
Marc-André Lureau
bd1cab1168 tgsi: remove ureg_label_insn
Unused since commit 2897cb3dba.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-10 12:46:23 +10:00
Dave Airlie
e5a5d17d13 radv: handle queue submission with no cs but semaphores
It's legal to submit just semaphores with no command streams,
this patch fixes this case by emitting the empty cs, it also
handles the fence emission for this case better.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-09 23:45:33 +00:00
Timothy Arceri
a4086bb531 util/disk_cache: error check asprintf()
Fixes: f3d911463e "util/disk_cache: stop using ralloc_asprintf() unnecessarily"

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-02-10 09:25:32 +11:00
Timothy Arceri
41ad178b13 docs: add shader cache environment variables
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-02-10 09:22:52 +11:00
Ilia Mirkin
c95f821cb4 nvc0/ir: fix ubo max clamp, reset file index
We just increased the max UBO, so we should also increase the clamp that
we do for robustness. Similarly, as we're including the fileIndex in the
new indirect value, we should reset fileIndex to 0 so that it is not
added in a second time.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2017-02-09 15:50:58 -05:00
Ilia Mirkin
e4a698cb97 nv50/ir: always return 0 when trying to read thread id along unit dim
Many many many compute shaders only define a 1- or 2-dimensional block,
but then continue to use system values that take the full 3d into
account (like gl_LocalInvocationIndex, etc). So for the special case
that a dimension is exactly 1, we know that the thread id along that
axis will always be 0, so return it as such and allow constant folding
to fix things up.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-02-09 15:15:36 -05:00
Ilia Mirkin
1acdd62847 nvc0/ir: fix robustness guarantees for constbuf loads on kepler+ compute
Kepler and up unfortunately only support up to 8 constbufs. We work
around this by loading from constbufs as if they were storage buffers.
However we were not consistently applying limits to loads from these
buffers. Make sure to do the same thing we do for storage buffers.

Fixes GL45-CTS.robust_buffer_access_behavior.uniform_buffer

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2017-02-09 15:15:22 -05:00
Ilia Mirkin
59ca352fc5 nvc0: increase number of ubo binding points
Apparently GL 4.5 requires 14 of these (there's a "*" in the spec, but
it's unclear what it refers to). We need to expose an extra binding
point for the "program parameters", which means this must be 15. Remove
the last vestige of the "use c14 for immediates" idea.

Fixes GL45-CTS.shading_language_420pack.binding_uniform_block_array

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2017-02-09 15:15:08 -05:00
Ilia Mirkin
8a2d88e934 configure: add blurb about what the LIBDRM_*_REQUIRED stuff means
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-02-09 12:57:49 -05:00
Ilia Mirkin
1e4f5988ed nvc0: expose int64
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-02-09 12:57:49 -05:00
Ilia Mirkin
ab00a41a6e nvc0/ir: make it possible to have the flags def in def0
There's all kinds of logic that doesn't like there being holes in defs
or srcs lists. Avoid them. This also fixes the sched logic for maxwell.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-02-09 12:57:48 -05:00
Ilia Mirkin
61d7676df7 nvc0/ir: add support for 64-bit shift lowering on SM20/SM30
Unfortunately there is no SHF.L/SHF.R instruction pre-SM35. So we have
to do a bit more work to get the job done.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-02-09 12:57:48 -05:00
Ilia Mirkin
1aefd6159c nvc0/ir: add support for all the new int64 tgsi opcodes
A few thoughts:
 - Some of that LegalizeSSA logic should really live much earlier and be
   subject to the likes of DCE and other useful passes
 - Some of the "lowering" done in from_tgsi should be done later so that
   proper optimization might be done.

However this all works and the above can be improved upon later.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-02-09 12:57:48 -05:00
Pierre Moreau
009c54aa7a nv50/ir: Split 64-bit integer MAD/MUL operations
Hardware does not support 64-bit integers MAD and MUL operations, so we need
to transform them in 32-bit operations.

Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
2017-02-09 12:57:48 -05:00
Ilia Mirkin
22c705ea8c nvc0/ir: add a "high" subop for shifts, emit shf.l/shf.r for 64-bit
Note that this is not available for SM20/SM30.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-02-09 12:57:48 -05:00
Ilia Mirkin
2e986fa806 nvc0/ir: fix SET and SLCT emission
We were never emitting a .X flag for consuming condition code on SET,
and weren't emitting a signed type for SLCT comparison. Discovered while
working on int64 logic.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-02-09 12:57:48 -05:00
Ilia Mirkin
eac5099c11 nvc0/ir: add support for emitting partial min/max ops for int64
These operations allow you to compute min/max on arbitrary-width
integers, 32 bits at a time.

Note that the low/med ops implicitly set the condition code, and the
med/high ops implicitly consume it.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-02-09 12:57:48 -05:00
Ilia Mirkin
b090033087 gallium: add separate PIPE_CAP_INT64_DIVMOD
Nouveau does not currently have logic to implement this as a library
function. Even though such a library could be written, there's no big
advantage to do it that way for now given that int64 is a very uncommon
use-case. Allow a driver to expose INT64 without supporting division and
modulo operations.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-09 12:57:21 -05:00
Eric Engestrom
6a71a69a12 docs: improve the list of gl implementations
Suggested-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-02-09 15:45:08 +00:00
Eric Engestrom
8278f1ec35 docs: improve the list of implemented APIs
Suggested-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-02-09 15:44:51 +00:00
Matt Turner
d7a0486a9e glsl: Allow compatibility shaders with MESA_GL_VERSION_OVERRIDE=...
Previously if you used MESA_GL_VERSION_OVERRIDE=3.3COMPAT, Mesa exposed
an OpenGL 3.3 compatibility profile context (with various unimplemented
features and bugs), but still refused to compile shaders with

   #version 330 compatibility

This patch simply adds a small bit of plumbing to let that through.

Of course the same caveats apply: compatibility profile is still not
supported (and will not be supported), so there are no guarantees that
anything will work.

Tested-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-02-09 15:14:43 +00:00
Eric Engestrom
89b4176eb1 docs: reword sentence that my brain can't parse
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Elie Tournier <tournier.elie@gmail.com>
2017-02-09 13:04:16 +00:00
Eric Engestrom
30cf9ffb59 docs: https all the links \o/
Most of them already redirected to https anyway, so we might as well
avoid the redirection and the security implications by linking directly
to the right protocol.

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-02-09 11:28:15 +00:00
Eric Engestrom
2b0fe3cff7 docs: fix gallium wiki link in relnotes
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-02-09 11:28:10 +00:00
Eric Engestrom
9f8a6a5b79 docs: update 'thanks' for hosting
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-02-09 11:26:22 +00:00
Samuel Iglesias Gonsálvez
ca16f0a282 i965/fs: add support for int64 to bool conversion
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99660
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-02-09 10:18:34 +01:00
Samuel Iglesias Gonsálvez
824e1bb078 nir: add opcode to perform int64 to bool conversions
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99660
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-02-09 10:18:34 +01:00
Samuel Iglesias Gonsálvez
7ab26613db i965/fs: Add support for nir_op_[iu]2[iu]32
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99660
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-02-09 10:18:34 +01:00
Samuel Iglesias Gonsálvez
7b5834ff54 i965/fs: Add support for nir_op_[iu]642f
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99660
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-02-09 10:18:34 +01:00
Samuel Iglesias Gonsálvez
b115407d75 i965/fs: legalize [u]int64 to 32-bit data conversions in lower_d2x
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99660
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-02-09 10:18:34 +01:00
Jason Ekstrand
8734461c58 i965/fs: Add support for nir_op_[iu]642d
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-02-09 10:18:34 +01:00
Jason Ekstrand
91d2d26f33 i965: Allow int64 conversion operations in channel_expressions
This fixes 143 of the new piglit tests added by Nicolai

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-02-09 10:18:34 +01:00
Timothy Arceri
f3d911463e util/disk_cache: stop using ralloc_asprintf() unnecessarily
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-02-09 14:11:24 +11:00
Timothy Arceri
0bf21519b7 glsl: add param to force shader recompile
This will be used to skip checking the cache and force a recompile.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-02-09 12:22:56 +11:00
Timothy Arceri
4026b45bbc util: add a disk_cache_remove() function
This will be used to remove cache items created with old versions
of Mesa or other invalid cache items from the cache.

V2: rename stub function (cache_* funtions were renamed disk_cache_*)
in master.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-02-09 12:22:56 +11:00
Timothy Arceri
a3fd8bb8c5 st/mesa/i965: create link status enum
For the on-disk shader cache we want to be able to differentiate
between a program that was linked and one that was loaded from cache.

V2:
 - don't return the new enum directly to the application when queried,
   instead return GL_TRUE or GL_FALSE as required. Fixes google-chrome
   corruptions when using cache.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-02-09 12:22:56 +11:00
Brian Paul
ac5845453c docs: update intro.html to mention new APIs, etc
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2017-02-09 00:02:20 +00:00
Brian Paul
b2722a8970 docs: the site is now hosted by freedesktop.org
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2017-02-09 00:02:13 +00:00
Bas Nieuwenhuizen
f22836dbdd radv: Add CPU color packing for VK_FORMAT_A2B10G10R10_UNORM_PACK32.
For allowing fast color clears in the main render targets of dota2.

[airlied: fix clear_vals[1] as suggested by Andres.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-08 22:43:11 +00:00
Roland Scheidegger
f64d74aa19 mesa: (trivial) include <inttypes.h> for PRIx64 macros
Fixes a compile error with mingw.
2017-02-08 21:56:16 +01:00
Tim Rowley
c1aa444a3e swr: [rasterizer jitter] Pass LLVM-IR size into jitter
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-02-08 13:58:13 -06:00
Tim Rowley
e0a829d320 swr: [rasterizer core] Frontend SIMD16 WIP
Removed temporary scafolding in PA, widended the PA_STATE interface
for SIMD16, and implemented PA_STATE_CUT and PA_TESS for SIMD16.

PA_STATE_CUT and PA_TESS now work in SIMD16.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-02-08 13:58:06 -06:00
Tim Rowley
79174e52b5 swr: [rasterizer jitter] Disable unsafe FP optimizations in the jitter
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-02-08 13:58:00 -06:00
Tim Rowley
db599e316a swr: [rasterizer core] Frontend SIMD16 WIP
Widen simdvertex to SIMD16/simd16vertex in frontend for passing VS
attributes from VS to PA.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-02-08 13:57:52 -06:00
Tim Rowley
09c54cfd2d swr: [rasterizer jitter] Add DEBUGTRAP jit builder function
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-02-08 13:57:47 -06:00
Tim Rowley
b01f26e005 swr: [rasterizer jitter] Multisample blend jit fix
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-02-08 13:57:41 -06:00
Tim Rowley
8780706c62 swr: [rasterizer jitter] Change SimdVector representation to array
Make all SimdVectors in LLVM represented as simdscalar[4] rather
than a struct.

Fixes issues with promotion of values from i32 to i64 to match
register width.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-02-08 13:57:33 -06:00
Tim Rowley
d159b0bf34 swr: [rasterizer jitter] Fix issues with stream-out on llvm>=3.8
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-02-08 13:57:27 -06:00
Tim Rowley
8423ad437b swr: [rasterizer jitter] Adjust jitter header includes
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-02-08 13:57:20 -06:00
Tim Rowley
feecd7dcf5 swr: [rasterizer core] Frontend SIMD16 WIP
SIMD16 Primitive Assembly (PA) only supports TriList and RectList.

CUT_AWARE_PA, TESS, GS, and SO disabled in the SIMD16 front end.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-02-08 13:57:10 -06:00
Eric Engestrom
a618d6c3e9 docs: update package contents
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-02-08 12:00:28 -07:00
Eric Engestrom
06e40dc671 docs: fix unpacking instructions
File names were wrong, file formats were wrong, bunzip command was
wrong...

I also removed all but the simplest example; people who use pipes already
know how to untar, so let's simplify and remove potential confusion for
non-tech-savvy users.

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-02-08 12:00:24 -07:00
Eric Engestrom
d7e1a16f1a docs: remove dead 'beta' link
Release candidates haven't been in a 'beta' subdir in a long time, so let's
replace the dead link with an explanation instead.

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-02-08 12:00:19 -07:00
Eric Engestrom
5b10c362de docs: add a note about the new version scheme
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-02-08 12:00:14 -07:00
Bartosz Tomczyk
94262e5f5d r600/sb: Fix memory leak
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-02-08 17:36:05 +01:00
Timothy Arceri
90014d0766 mesa: use PRId64/PRIu64 when printing 64-bit ints
V2: actually use PRIu64

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-02-08 13:50:01 +11:00
Dave Airlie
c674f11e42 mesa/st: fix strict aliasing issue in int64 code.
This fixes the int64 code same as the double code.

Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-08 02:13:07 +00:00
Dave Airlie
30cff4f5f7 mesa/uniform: fix strict aliasing issues with int64 code.
This fixes these like the double version does.

Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-08 02:12:31 +00:00
Dave Airlie
6d5d6dad20 radv: handle dcc in explicit image resolve path. (v2)
We need to initialize dcc like we do in the subpass path.

v2: fix initial/final layouts
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-07 23:31:08 +00:00
Bas Nieuwenhuizen
0d1283850b radv: Enable fast clears by default.
Works for me on dota2 and talos now.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
2017-02-07 22:58:06 +01:00
Jason Ekstrand
1de3cd8a34 spirv: Add more asserts in vtn_vector_construct
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99465
2017-02-07 08:08:06 -08:00
Emil Velikov
25aa98c014 configure.ac: remove src/gallium/winsys/intel/drm/Makefile reference
Not wired up (not referenced in any SUBDIR), leading to `make distcheck'
failure.

Fixes: d77fa310ed "ilo: EOL drop unmaintained gallium drv from buildsys"
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-07 14:18:13 +00:00
Emil Velikov
73bce69938 docs: reword ilo removal note
Properly annotate <li> and keep the note analogous to all the previous
ones - OpenVG, st/egl, etc.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-07 14:18:12 +00:00
Boyan Ding
97495c428d configure.ac: Remove redundant libglvnd stanza
There were two "libglvnd configuration" section in the squashed commit
that added libglvnd support, while only one in the original libglvnd
branch. A following commit moves one of them downwards. Now remove the
upper "older" one and move GL_LIB name decision downwards after the new
libglvnd configuration section.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
2017-02-07 14:17:53 +00:00
Emil Velikov
bef4d74047 travis: use both cores for make/make check
The instance offers 2 cores, so use them to speed things up.

v2: Set MAKEFLAGS instead [Eric]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-02-07 11:14:10 +00:00
Emil Velikov
30267172c7 travis: add nearly all gallium drivers to the list
Note: we need the explicit --enable-freedreno for libdrm since the
latter is 'smart' and disables it if building on !arm platforms.

The radeonsi and swr are explicitly left out since they require
'too-recent' LLVM - 3.6

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-02-07 11:14:10 +00:00
Emil Velikov
96d86b18ee travis: correct libdrm required regex to also track libdrm itself
The current regex was tracking only the libdrm_foo packages, while with
recent changed we bumped only (and rightfully so) libdrm.

Fix the regex to track any libdrm package.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-02-07 11:14:10 +00:00
Emil Velikov
49f6408940 configure.ac: add swr to the gallium drivers list.
v2: Rebase on top of ILO removal.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-02-07 11:14:10 +00:00
Emil Velikov
9d5b681a11 configure.ac: list all the dri-drivers in the help string
It's unlikely that any of the additions come as a suprise to anyone
i915, nouveau, radeon, r200. Regardless, state clearly what's
available.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-02-07 11:14:09 +00:00
Marc Di Luzio
21efe2528c glsl: correct compute shader checks for memoryBarrier functions
As per the spec -
"The functions memoryBarrierShared() and groupMemoryBarrier() are
available only in compute shaders; the other functions are available
in all shader types."

Conform to this by adding another delegate to check for compute
shader support instead of only whether the current stage is compute

This allows some fragment shaders in Dirt Rally to compile

Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-02-06 21:12:33 -08:00
Li Qiang
83fb63d31d gallium/tgsi: fix oob access in parse instruction
When parsing texture instruction, it doesn't stop if the
'cur' is ',', the loop variable 'i' will also be increased
and be used to index the 'inst.TexOffsets' array. This can lead
an oob access issue. This patch avoid this.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Li Qiang <liq3ea@gmail.com>
2017-02-07 14:00:04 +10:00
Kenneth Graunke
ce8a63de6d Revert "i965: Disable guardband clipping in the smaller-than-viewport case."
This reverts commit 0bac2551e4.

Now that we position the guardband correctly (applying translations
in addition to scaling) and made it as large (or larger) than the
render target, this shouldn't be necessary.

Now we leave guardband clipping enabled 100% of the time, like the
Windows driver does.

Fixes GL45-CTS.gtf21.GL2FixedTests.clip.clip.  It tries to draw a
16384x64 rectangle, and it appears that some kind of numerical
imprecisions in the clipper result in some edge pixels going missing.
The Windows driver passes this test because of guardband clipping.

Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-02-06 17:40:14 -08:00
Kenneth Graunke
ece0e535a4 i965: Always scissor on Gen6-7.5 instead of disabling guardband.
Previously we disabled the guardband when the viewport was smaller than
the framebuffer on Gen6-7.5, to prevent portions of primitives from
being draw outside of the viewport.  On Gen8+, we relied on the viewport
extents test to effectively scissor this away for us.

We can simply always enable scissoring instead.  We already include the
viewport in the scissor rectangle, so this will effectively do the
viewport extents test for us.  (The only difference is that the scissor
rectangle doesn't support sub-pixel values.  I think that's okay.)

Given that the viewport extents test is essentially a second scissor,
and is enabled for basically all 3D drawing on Gen8+, it stands to
reason that scissoring is cheap.  Enabling the guardband reduces the
cost of clipping, which is expensive.

The Windows driver appears to never disable guardband clipping, and
appears to use scissoring in this case.  I don't know if they leave
it on universally though.

This fixes misrendering in Blender, where the "floor plane" grid lines
started rendering at wrong angles after I disabled XY clipping of line
primitives.  Enabling the guardband seems to solve the issue.

Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99339
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-02-06 17:40:14 -08:00
Jason Ekstrand
f3c068c5c8 i965: Use a better guardband calculation.
(Patch co-authored by Jason and Ken.)

We scaled the guardband based on the viewport size, but failed to
take into account the translation portion of the viewport transform.

This meant the guardband was always centered around the origin.
We want it to be centered around the screen-space drawing area,
which is the intersection of the viewport and the render target.

At best, getting this wrong would reduce the guardband's effectiveness
in some cases.  At worst, it might break things - objects outside of the
guardband are trivially rejected, so getting the guardband in the wrong
place and leaving guardband clipping enabled could cause problems.

v2: drop clamping of positive maximums.

Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-02-06 17:40:14 -08:00
Kenneth Graunke
89ad7f1be6 i965: Combine the Gen6 SF and Clip viewport atoms.
The next patch will make the guardband calculation dependent on the
transformation matrix.  Instead of computing it in both atoms, just
combine them into a single atom.

Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-02-06 17:40:14 -08:00
Dave Airlie
90ac2285f0 radv: pass FMASK alignment to application
As was done for dcc and cmask.

Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-07 10:42:01 +10:00
Bas Nieuwenhuizen
47ca0f537d radv: Pass DCC alignment to application.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
2017-02-07 01:19:22 +01:00
Bas Nieuwenhuizen
eb01b20cc4 radv: Pass CMASK alignment to application.
CMASK alignment can be greater than image data alignment, so pass
it to the app so that it knows what alignment to backing memory
should have.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-02-07 01:18:53 +01:00
Dave Airlie
a864ef7f48 radv/ac: avoid the fmask path when doing txs.
This fixes the vulkan samples deferredmultisampling test.

Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-06 22:57:52 +00:00
Bruce Cherniak
11d6f836d0 swr: [rasterizer core] Removed unused clip code.
Removed unused Clip() and FRUSTUM_CLIP_MASK define.

Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2017-02-06 16:30:50 -06:00
Bruce Cherniak
bf29495dcd swr: [rasterizer core] Remove dead code Clipper::ClipScalar()
Clipper::ClipScalar() is dead code and should be removed.  It is causing
an error with gcc-7 because it references a now defunct member.

v2: includes bugzilla reference, same code change

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99633
CC: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2017-02-06 16:27:53 -06:00
Eric Anholt
72e6d1f00a gallium: Remove vc4 simulator hack from loader infrastructure.
Now that there's MESA_LOADER_DRIVER_OVERRIDE for choosing the driver name
we load, we don't need this any more.

v2: Get the junk out of pipe_loader_drm.c, too.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v2)
2017-02-06 12:44:06 -08:00
Eric Anholt
3f462050c2 loader: Add an environment variable to override driver name choice.
My vc4 simulator has been implemented so far by having an entrypoint
claiming to be i965, which was a bit gross.  The simulator would be a lot
less special if we entered through the vc4 entrypoint like normal, so add
a loader environment variable to allow the i965 fd to probe as vc4.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-06 12:44:06 -08:00
Eric Anholt
61bb1a9795 targets: Use a macro to reduce cut and paste in driver setup.
All the replicated prototypes/function bodies obfuscated the interesting
logic of the file: the mapping from driver enable macros to entrypoints we
expose, and the way that the swrast entrypoints are special compared to
the DRM entrypoints.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-06 12:44:06 -08:00
Dave Airlie
13a28ff236 radeon/ac: move common llvm build functions to a separate file.
Suggested by Marek.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-07 05:46:35 +10:00
Nicolai Hähnle
8822f4dfb9 eglmesaext: add new enums for EGL_MESA_drm_image_formats
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-02-06 17:41:28 +01:00
Nicolai Hähnle
6b0d390184 docs: add EGL_MESA_drm_image_formats extension proposal 2017-02-06 17:41:10 +01:00
Nicolai Hähnle
7be0e602ed dri/common: clear the loaderPrivate pointer in driDestroyDrawable
The GLX specification says about glXDestroyPixmap:

    "The storage for the GLX pixmap will be freed when it is not current
     to any client."

We're not really following this language to the letter: some of the storage
is freed immediately (in particular, the dri3_drawable, which contains both
GLXDRIdrawable and loader_dri3_drawable). So we NULL out the pointers to
that freed storage; the previous patches added the corresponding NULL-pointer
checks.

This fixes memory corruption in piglit
./bin/glx-visuals-depth/stencil -pixmap -auto

Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-06 17:39:44 +01:00
Nicolai Hähnle
f446f3fb33 glx: guard swap-interval functions against destroyed drawables
The GLX specification says about glXDestroyPixmap:

    "The storage for the GLX pixmap will be freed when it is not current
     to any client."

So arguably, functions like glXSwapIntervalMESA can be called after
glXDestroyPixmap has been called for the currently bound GLXPixmap.
In that case, the GLXDRIDrawable no longer exists, and so we just skip
those calls.

Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-06 17:39:30 +01:00
Nicolai Hähnle
21ec35566b glx/dri3: guard in_current_context against a disappeared drawable
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-02-06 17:39:10 +01:00
Nicolai Hähnle
40c304fc06 glx/dri3: handle NULL pointers in loader-to-DRI3 drawable conversion
With a subsequent patch, we might see NULL loaderPrivates, e.g. when
a DRIdrawable is flushed whose corresponding GLXDRIdrawable was destroyed.
This resulted in a crash, since the loader vs. DRI3 drawable structures
have a non-zero offset.

Fixes glx-visuals-{depth,stencil} -pixmap

Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-06 17:39:01 +01:00
Juan A. Suarez Romero
02264bc6f9 anv/pipeline: set ThreadDispatchEnable conditionally
Set 3DSTATE_WM/ThreadDispatchEnable bit on/off based on the same
conditions as used in the GL version.

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-02-06 10:27:44 +01:00
Alejandro Piñeiro
dfb1b543f3 main/fboject: default_framebuffer allowed for GetFramebufferParameter
Before 4.5, the default framebuffer was not allowed for
GetFramebufferParameter, so it should return INVALID_OPERATION for any
call using the default framebuffer.

4.5 included new pnames, and some of them are allowed for the default
framebuffer. For the rest, INVALID_OPERATION. From OpenGL 4.5 spec,
section 9.2.3 "Framebuffer Object Queries:

   "An INVALID_OPERATION error is generated by GetFramebufferParameteriv
    if the default framebuffer is bound to target and pname is not one
    of the accepted values from table 23.73, other than
    SAMPLE_POSITION."

Fixes:
GL45-CTS.direct_state_access.framebuffers_get_parameter_errors

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-02-06 08:50:21 +01:00
Alejandro Piñeiro
0fb0c57b15 main/fbobject: implement new 4.5 pnames for GetFramebufferParameter
4.5 added new pnames allowed for GetFramebufferParameter, and
GetNamedFramebufferParameter.

From OpenGL 4.5 spec, section 9.2.3 "Framebuffer Object Queries" (quoting
the paragraph with only the new pnames, not all the supported):

   "pname may also be one of DOUBLEBUFFER,
    IMPLEMENTATION_COLOR_READ_FORMAT, IMPLEMENTATION_COLOR_READ_TYPE,
    SAMPLES, SAMPLE_BUFFERS, or STEREO, indicating the corresponding
    framebuffer-dependent state from table 23.73. Values of
    framebuffer-dependent state are identical to those that would be
    obtained were the framebuffer object bound and queried using the
    simple state queries in that table. These values may be queried
    from either a framebuffer object or a default framebuffer."

Fixes:
GL45-CTS.direct_state_access.framebuffers_get_parameters

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-02-06 08:50:21 +01:00
Alejandro Piñeiro
0cd2a4737e main/framebuffer: refactor _mesa_get_color_read_format/type
Current implementation returns the value for the currently bound read
framebuffer. GetNamedFramebufferParameteriv allows to get it for any
given framebuffer. GetFramebufferParameteriv would be also interested
on that method

It was refactored by allowing to pass a given framebuffer. If NULL is
passed, it used the currently bound framebuffer.

It also adds a call to _mesa_update_state. When used only by
GetIntegerv, this one was called as part of the extra checks defined
at get_hash. But now that the method is used by more methods, and the
update is needed, it makes sense (and it is safer) just calling it on
the method itself, instead of rely on the caller.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-02-06 08:50:21 +01:00
Dave Airlie
106a51440d radv: fix shared memory load/stores.
If we have an indirect index here we need to scale it by attribute slots
e.g. is this is vec2[256] then we get an indir_index in the 0.255 range
but the vec2 are aligned inside vec4 slots. So scale the indir index,
then extract the channels.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-03 19:53:03 +00:00
Dave Airlie
a1a8aef4c9 radv/ac: correctly size shared memory usage.
We count the number of slots used, but slots are vec4 sized,
so we have to scale by 16 not 4.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-03 19:52:13 +00:00
Dave Airlie
66463b7f75 radv: fix compute shared memory stores since 64-bit.
These regressed and caused doom to stop loading.

Fixes:
03724af26 radv/ac: Implement Float64 load/store var.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-03 19:51:52 +00:00
Brian Paul
023a9e3d92 docs: replace URL in features.txt
Replace unmaintained http://dri.freedesktop.org/wiki/MissingFunctionality
URL with http://mesamatrix.net/

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95460
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2017-02-03 12:02:38 -07:00
Brian Paul
2fac98f865 mesa: whitespace fixes in context.c
Remove trailing whitespace, replace tabs with spaces.  Trivial.
2017-02-03 11:48:25 -07:00
Nanley Chery
84dbf68378 anv/blorp: Disable resolves for transparent black clears
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-02-03 09:23:13 -08:00
Nanley Chery
93b819154f anv/cmd_buffer: Don't temporarily enable CCS_E within a render pass
Compressing a render target and decompressing it in the same
single-subpass render pass may waste bandwidth. While this may be
beneficial in some circumstances, it does not help in all. Reclaims
about 1.95% FPS for Dota 2 on some configurations.

v2 (Jason Ekstrand):
- Provide a more thorough comment
- Enable CCS_D for input attachments
v3 (Jason Ekstrand):
- Provide performance numbers

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-02-03 09:23:13 -08:00
Kenneth Graunke
3f064e9a40 mesa: Don't crash when destroying contexts created with no visual.
dEQP-EGL.functional.create_context.no_config tries to create a context
with no config, then immediately destroys it.  The drawbuffer is never
set up, so we can't dereference it asking if it's double buffered, or
we'll crash on a null pointer dereference.

Just bail early.

Applications using EGL_KHR_no_config_context could hit this.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-02-03 08:55:02 -08:00
Samuel Pitoiset
af303abcdb winsys/amdgpu: avoid potential segfault in amdgpu_bo_map()
cs can be NULL when it comes from r600_buffer_map_sync_with_rings()
to avoid doing the same checks. It was checked for write mappings
but not for read mappings.

Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-02-03 12:07:14 +01:00
Tapani Pälli
0a2dcd3a8a android: fix droid_create_image_from_prime_fd_yuv for YV12
Earlier changes introduced is_ycrcb flag which checks the component
order of u and v components. Condition for setting the flag was
incorrect, with ycrcb we are supposed to have cr before cb.

This patch (together with a fix in our gralloc) fixes corrupted
rendering from 'test-opengl-gl2_yuvtex' native test and corrupted
gallery thumbnail in application switcher on Android-IA.

Fixes: 51727b1cf5
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
2017-02-03 07:44:33 +02:00
Edward O'Callaghan
3879425917 ilo: EOL unmaintained older gallium intel driver
This is no longer actively maintained and is just
accumulating bitrot.

Signed-off-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Acked-by: Chia-I Wu <olvaffe@gmail.com>
2017-02-03 16:13:46 +11:00
Edward O'Callaghan
d77fa310ed ilo: EOL drop unmaintained gallium drv from buildsys
This is no longer actively maintained and is just
accumulating bitrot.

Signed-off-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Acked-by: Chia-I Wu <olvaffe@gmail.com>
2017-02-03 16:13:36 +11:00
Edward O'Callaghan
01b625ef1a ilo: EOL unplumb unmaintained gallium drv from winsys
This is no longer actively maintained and is just
accumulating bitrot.

Signed-off-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Acked-by: Chia-I Wu <olvaffe@gmail.com>
2017-02-03 16:13:32 +11:00
Ilia Mirkin
2b4eaabff0 configure: libdrm is a single package
The intent of the libdrm_$driver version limits has always been to not
burden the "other" drivers with updating their libdrm unless really
necessary. Unfortunately the configure script erroneously only checked
the driver-specific bit and not the generic bit of libdrm as well. Fix
this.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-02-02 20:36:09 -05:00
Ilia Mirkin
7d3f9ed71c st/mesa: MAX_VARYING is the max supported number of patch varyings, not min
This fixes
GL45-CTS.tessellation_shader.tessellation_shader_tessellation.max_in_out_attributes
on nouveau. We only support 30 patch varyings (as 2 vec4 slots end up
being used for tess level settings), but were getting 32 exposed.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-02-02 20:28:58 -05:00
Ilia Mirkin
e73f87fcbd vbo: process buffer binding state changes on draw when recording
The VBO module keeps track of any vbo buffers. It updates this list when
receiving an InvalidateState call, however this never happens when
recording draws right now. Make sure that we do all the usual state
updates when recording draws so that the VBO list may be kept up to
date.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99631
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-02-02 20:28:27 -05:00
Dave Airlie
6cc3c46f58 radv/ac: move to using shared emit_ddxy code.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-03 09:54:04 +10:00
Dave Airlie
c9a2fc3679 radeonsi/ac: move most of emit_ddxy to shared code.
We can reuse this in radv.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-03 09:54:04 +10:00
Dave Airlie
278d5ef70a radv/ac: use shared thread id code
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-03 09:54:04 +10:00
Dave Airlie
c5f0a56aeb radeonsi/ac: move get thread id to shared code.
radv will use this.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-03 09:54:04 +10:00
Dave Airlie
1c5c268a8a radv/ac: migrate to using shared code for some load/store stuff.
This migrates to the code shared with radeonsi.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-03 09:54:04 +10:00
Dave Airlie
b3c28942c7 radeonsi/ac: move tbuffer store and buffer load to shared code.
These are all reuseable by radv.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-03 09:54:04 +10:00
Dave Airlie
a9773311f6 radeonsi/ac: move a bunch of load/store related things to common code.
These are all shareable with radv, so start migrating them to the
common code.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-03 09:54:04 +10:00
Eduardo Lima Mitev
e198a64e35 texgetimage: Add check for the effective target to GetTextureSubImage
OpenGL 4.5 spec, section "8.11.4 Texture Image Queries", page 233 of
the PDF states:

    "An INVALID_OPERATION error is generated if texture is the name of a buffer
     or multisample texture."

This is currently not being checked and e.g a multisample texture image can
be passed down to the driver hook. On i965, it is crashing the driver with an
assertion:

intel_mipmap_tree.c:3125: intel_miptree_map: Assertion `mt->num_samples <= 1' failed.

v2: (Ilia Mirkin) Move the check from gettextimage_error_check() to
    GetTextureSubImage() and use the texObj target.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-02-03 00:43:46 +01:00
Marek Olšák
dfe111368d Revert "radeonsi: decrease the number of texture slots to 24"
This reverts commit bdd860e307.

Requested by a game developer.

Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-03 00:39:48 +01:00
Dave Airlie
b457f67495 configure.ac: explicitly require libdrm for dri classic drivers.
Although this might come from somewhere else require it explicitly.

Reviewed-by: Chad Versace <chadversary@chromium.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-03 09:38:15 +10:00
Jason Ekstrand
37a6f48ceb intel/isl: Add a better comment for format_supports_ccs_e
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-02-02 13:33:43 -08:00
Jason Ekstrand
45b3eb4dfc anv: Remove the finishme for CCS_E with storage images
The data port can't handle CCS at all so replace the finishme with
better comments.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-02-02 13:33:43 -08:00
Jason Ekstrand
fc9f0db8e3 intel/isl: Assert that we don't use CCS for storage images
I enabled CCS for storage images in the Vulkan driver and ran it through
the CTS.  It didn't result in any hangs but it demonstrated that the data
port cannot handle CCS.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-02-02 13:33:43 -08:00
Jason Ekstrand
7e6a9d9c4b intel/isl: Add a formats_are_ccs_e_compatible helper
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-02-02 13:33:43 -08:00
Jason Ekstrand
6142e3c07c intel/isl: Add a format_supports_ccs_d helper
Nothing uses this yet but it serves as a nice bit of documentation
that's relatively easy to find.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-02-02 13:33:43 -08:00
Jason Ekstrand
ab06fc6684 intel/isl: Rename supports_lossless_compression to supports_ccs_e
The term "lossless compression" could potentially mean multisample
color compression, single-sample color compression or HiZ because they
are all lossless.  The term CCS_E, however, has a very precise meaning;
in ISL and is only used to refer to single-sample color compression.
It's also much shorter which is nice.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-02-02 13:33:43 -08:00
Nanley Chery
043d92fef9 anv/pass: Store the depth-stencil attachment's last subpass index
Commit 968ffd6c86 stored the last subpass
index of all the attachments but that of the depth-stencil attachment.
This could cause depth buffers used in multiple subpasses not to be in
the requested final layout. Fix this error.

Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2017-02-02 10:36:14 -08:00
Nicolai Hähnle
a020cb3a72 gallium: turn PIPE_SHADER_CAP_DOUBLES into a screen capability
Make the cap consistent with PIPE_CAP_INT64.

Aside from the hypothetical case of using draw for vertex shaders (and
actually caring about doubles...), every implementation supports doubles
either nowhere or everywhere.

Also, st/mesa didn't even check the cap correctly in all supported
shader stages.

While at it, add a missing LLVM version check for 64-bit integers in
radeonsi. This is conservative: judging by the log, LLVM 3.8 might be
sufficient, but there are probably bugs that have been fixed since then.

v2: fix clover (Marek)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-02-02 16:53:42 +01:00
Plamena Manolova
96123dbad9 mesa: Enable EXT_compressed_ETC1_RGB8_sub_texture
Since we already have the functionality in place and games
like Game of Thrones seem to depend on this extension, I
think it makes sense to enable it by making it part of
the extension string even though it's still a draft:

https://www.khronos.org/registry/gles/extensions/EXT/EXT_compressed_ETC1_RGB8_sub_texture.txt

Note: OES_compressed_ETC1_RGB8_sub_texture seems to be listed
in gl2ext.h, but there's no documentation for it in the KHR
registry

Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-02-02 12:28:31 +00:00
Vinson Lee
6ee4665a77 configure: Only require libdrm 2.4.75 for intel.
Fixes: b8acb6b179 ("configure: Require libdrm >= 2.4.75")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Acked-by: Dave Airlie <airlied@redhat.com>
2017-02-02 13:10:00 +10:00
Lionel Landwerlin
7158255069 anv: enable VK_KHR_shader_draw_parameters
Enables 10 tests from:

   dEQP-VK.draw.shader_draw_parameters.*

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-02-02 01:33:16 +00:00
Lionel Landwerlin
9413e11869 anv: emit DrawID if needed
v2: use define for buffer ID (Jason)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-02-02 01:33:06 +00:00
Lionel Landwerlin
543d5db4e2 anv: always allocate a vertex element with vertexid or instanceid
Up to now on Gen8+ we only allocated a vertex element for
gl_InstanceIndex or gl_VertexIndex when a vertex shader uses
gl_BaseInstanceARB or gl_BaseVertexARB. This is because we would
configure the VF_SGVS packet to make the VF unit write the
gl_InstanceIndex & gl_VertexIndex values right behind the values
computed from the vertex buffers.

In the next commit we will also write the gl_DrawIDARB value. Our
backend expects to pull the gl_DrawIDARB value from the element
following the element containing gl_InstanceIndex, gl_VertexIndex,
gl_BaseInstanceARB and gl_BaseVertexARB (see
vec4_vs_visitor::setup_attributes). Therefore we need to allocate an
element for the SGVS elements as long as at least one of the SGVS
element is read by the shader. Otherwise our shader will use a
gl_DrawIDARB value pulled from the URB one element too far (most
likely garbage).

v2: Fix my english (Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-02-02 01:32:58 +00:00
Lionel Landwerlin
289aef771d anv: move BaseVertexID/BaseInstanceID vertex buffer index to 31
v2: use define for buffer ID (Jason)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-02-02 01:32:48 +00:00
Lionel Landwerlin
98cf60a3ce anv: limit vertex buffers to 31
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-02-02 01:32:39 +00:00
Mauro Rossi
9c45bb731c android: fix llvm, elf dependencies for M, N releases
These changes set the correct llvm version and elf include path
which differ for Marshmallow and Nougat

Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2017-02-01 23:01:35 +00:00
Jason Ekstrand
ccdd5b3738 anv: Don't use bogus alpha swizzles
For RGB formats in Vulkan, we use the corresponding RGBA format with a
swizzle of RGB1.  While this swizzle is exactly what we want for
texturing, it's not allowed for rendering according to the docs.  While
we haven't been getting hangs or anything, we should probably obey the
docs.  This commit just sanitizes all render swizzles so that the alpha
channel maps to ALPHA.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-02-01 14:41:06 -08:00
Micah Fedke
752ae38a09 Add missing copyright header to wayland-egl-priv.h
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-01 22:33:40 +00:00
Dave Airlie
cda9f3d8ec radv: handle VK_QUEUE_FAMILY_IGNORED in image transitions (v3)
The CTS tests at least are using this, and we were totally
ignoring it.

This hopefully fixes the bouncing multisample CTS tests.

v2: get family mask in ignored case from command buffer.
v3: only change things in one place, use logic from Bas.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-02 08:25:04 +10:00
Dave Airlie
fa316ed02f radv/ac: handle clip/cull distance sizing in geometry shader outputs
Otherwise we were writing these as 4 components, and things went bad.

Fixes (the remaining):
dEQP-VK.clipping.user_defined.*.vert_geom.*

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-02 08:25:04 +10:00
Dave Airlie
230e308ff9 radv/ac: add const_index to fetch index for gs inputs
This fixes clip distance fetches as they are single item loads
with a const_index like float[1].

Fixes:
dEQP-VK.clipping.user_defined.*.vert_geom.[0-6]

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-02 08:25:04 +10:00
Dave Airlie
dc68b920df radeonsi/ac: move frag interp emission code to shared llvm code.
This code should be used in radv, so move it to a shared location
in advance of doing that.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-02 08:24:53 +10:00
Timothy Arceri
b940b2fd16 st/mesa: inline get_mesa_program()
In the past I've gotten this function confused with the one in
ir_to_mesa.cpp of the same name. Now that the affected flag setting
has move into a helper it makes sense just to inline this remaining
code.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-02 08:31:28 +11:00
Timothy Arceri
a7050ea1f9 st/mesa: create set_prog_affected_state_flags() helper
This will be used when restoring tgsi from the on-disk shader
cache.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-02 08:31:28 +11:00
Timothy Arceri
8d3d8a6d4e st/mesa: st_atom_shader.c C99 tidy up
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-02 08:31:28 +11:00
Timothy Arceri
f3e2428a7a st/mesa: remove pre C99 statement block for variable declaration
Acked-by: Marek Olšák <marek.olsak@amd.com>
2017-02-02 08:31:28 +11:00
Jason Ekstrand
0c114f2cf0 isl: Add assertions for render target swizzle restrictions
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-02-01 12:07:54 -08:00
Boyuan Zhang
f90ccf48bc st/va: add h264 constrained baseline profile
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-02-01 14:32:32 -05:00
Boyuan Zhang
d596bd29ec st/vdpau: add h264 constrained baseline profile
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-02-01 14:32:32 -05:00
Boyuan Zhang
c29191eea8 radeon/uvd: add h264 constrained baseline support
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-02-01 14:32:32 -05:00
Boyuan Zhang
22841ec84a vl: add h264 constrained baseline profile
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-02-01 14:32:32 -05:00
Bas Nieuwenhuizen
f5f8eb2c7c radv: Enable VK_KHR_shader_draw_parameters.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-02-01 19:49:40 +01:00
Bas Nieuwenhuizen
cf8a11c1ba radv: Pass draw index to shader.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-02-01 19:49:40 +01:00
Bas Nieuwenhuizen
80f4331ed1 radv/ac: Add draw index support.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-02-01 19:49:40 +01:00
Robert Foss
25f2d3c1d3 i965: Prevent coverity warning
Add assert checking that num_sources is never larger than 3.

This prevents Coverity from concluding that the unhandled
cases of num_sources not being 0-3 are relevant.

Coverity-Id: 1399480-1399489
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-02-01 16:47:05 +00:00
Lionel Landwerlin
875b15eec4 spirv: add SPV_KHR_shader_draw_parameters support
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-02-01 15:08:33 +00:00
Lionel Landwerlin
bd46040162 compiler: add missing enums for debug
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-02-01 15:08:30 +00:00
Emil Velikov
1e8fd790e1 docs: add news item and link release notes for 13.0.4
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-01 11:21:59 +00:00
Emil Velikov
f2391e8134 docs: add sha256 checksums for 13.0.4
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 6bfc352f5a)
2017-02-01 11:20:28 +00:00
Emil Velikov
7b6931e7fb docs: add release notes for 13.0.4
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 3255d10da4)
2017-02-01 11:20:27 +00:00
Michel Dänzer
31136eae3a winsys/radeon: Allow visible VRAM size > 256MB with kernel driver >= 2.49
The kernel driver reports correct values now.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2017-02-01 16:38:14 +09:00
Tapani Pälli
58828fe4ae android: add vulkan build for intel
fixes to issues spotted by Emil Velikov:

   - set ANV_TIMESTAMP corretly
   - fix typo with VULKAN_GEM_FILES

v2: update to use Makefile.sources under vulkan
    instead of having own

v3: update to changes to generate from vk.xml
    (commit c7fc310)

v4: remove 'hw' relative path
    cleanups, remove unnecessary cruft

    review from Emil Velikov:

    - move to vulkan folder
    - remove timestamp gen, no longer necessary
    - more cleanups

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-01 07:58:49 +02:00
Ilia Mirkin
62b8f494fa mesa: use same is_color_attachment trick to discern error cases
All the other calls to retrieve the attachment have been covered except
this one - return the proper error for attachment points that are valid
enums but out of bound for the driver.

Fixes GL45-CTS.geometry_shader.layered_fbo.fb_texture_invalid_attachment

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-31 22:12:57 -05:00
Jason Ekstrand
92128590bc anv: Improve flushing around STATE_BASE_ADDRESS
It is not clear from the docs exactly how pipelined STATE_BASE_ADDRESS
actually is.  We know from experimentation that we need to flush the
render cache prior to emitting STATE_BASE_ADDRESS and invalidate the
texture cache afterwards.  The only thing the PRM says is that, on gen8+
we're supposed to invalidate the state cache after STATE_BASE_ADDRESS
but experimentation has indicated that doing so does nothing whatsoever.

Since we don't really know, let's do just a bit more flushing in the
hopes that this won't be a problem again.  In particular:

 1) Do a CS stall before we emit STATE_BASE_ADDRESS since we don't
    really know whether or not it's pipelined.

 2) Do a data cache flush in case what runs before STATE_BASE_ADDRESS
    is a compute shader.

 3) Invalidate the state and constant caches after STATE_BASE_ADDRESS
    because the state may be getting cached there (we don't really know).

Reported-by: Mark Janes <mark.a.janes@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-01-31 18:49:44 -08:00
Jason Ekstrand
f1f9794118 anv: Flush render cache before STATE_BASE_ADDRESS on gen7
We had no good reason for *not* doing this on gen7 before but we didn't
know it was needed.  Recently, when trying update to Vulkan CTS version
1.0.2 in our CI system, Mark discovered GPU hangs on Haswell that appear
to be STATE_BASE_ADDRESS related.  This commit fixes them.

Reported-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-01-31 18:49:44 -08:00
Jason Ekstrand
4871930451 isl/formats: Only advertise sampling for A4B4G4R4 on Broadwell
This causes hangs on Broadwell if you try to render to it.  I have no
idea how we managed to not hit this earlier.

Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-01-31 18:49:44 -08:00
Jason Ekstrand
a0348b5a0b intel/blorp: Handle clearing of A4B4G4R4 on all platforms
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-01-31 18:49:44 -08:00
Tom Stellard
226a2c6d6e radeonsi: Fix build on LLVM < 3.9 v2
This was broken by: e0cc0a614c

v2:
  - Use preprocessor macro

Tested-by: Mark Janes <mark.a.janes@intel.com>
2017-02-01 02:10:00 +00:00
Bas Nieuwenhuizen
798ae37cc9 radv: Enable Float64 support.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-02-01 01:09:34 +01:00
Bas Nieuwenhuizen
441ee1e65b radv/ac: Implement Float64 SSBO loads.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-02-01 01:09:34 +01:00
Bas Nieuwenhuizen
bb1ce63002 radv/ac: Implement Float64 UBO loads.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-02-01 01:09:29 +01:00
Bas Nieuwenhuizen
03724af262 radv/ac: Implement Float64 load/store var.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-02-01 01:09:05 +01:00
Bas Nieuwenhuizen
91074bb11b radv/ac: Implement Float64 SSBO stores.
No f16 support as I'm not quite sure about alignment yet.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-02-01 01:09:05 +01:00
Bas Nieuwenhuizen
29577b2123 radv/ac: Add core Float64 support.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-02-01 01:09:05 +01:00
Rob Herring
01e18b21d1 vc4: Enable Neon on arm android builds
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-31 14:06:21 -08:00
Rob Herring
83107acb7b vc4: fix arm64 build with Neon
The addition of Neon assembly breaks on arm64 builds because the assembly
syntax is different. For now, restrict Neon to ARMv7 builds.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-31 14:06:19 -08:00
Rob Herring
6d92f32852 vc4: Make Neon inline assembly clang compatible
clang throws an error on "%r2" and similar. I couldn't find any
documentation on what "%r?" is supposed to mean and I've never seen any
use like that as far as I remember. The parameter is supposed to be
cpu_stride and just %2/%3 should be sufficient.

There's no need for trailing ";" either, so remove those, too.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-31 14:06:09 -08:00
Tom Stellard
e0cc0a614c radeonsi: Set datalayout on the llvm module
This prevents LLVM from using sext instructions for local memory offsets
and allows the backend to fold immediate offsets into the instruction.

This also prevents some incorrect code generation for ptrtoint and
inttoptr instructions.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-31 20:39:30 +00:00
Francisco Jerez
11e9ebbf15 nir/spirv/glsl450: Implement IEEE-compliant handling of atan2(±∞, ±∞).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-01-31 10:33:33 -08:00
Francisco Jerez
013d40d1ce glsl: Implement IEEE-compliant handling of atan2(±∞, ±∞).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-01-31 10:33:33 -08:00
Francisco Jerez
7215375c44 nir/spirv/glsl450: Rewrite atan2 implementation to fix accuracy and handling of zero/infinity.
See "glsl: Rewrite atan2 implementation to fix accuracy and handling
of zero/infinity." for the rationale, but note that the instruction
count benefit discussed there is somewhat less important for the SPIRV
implementation, because the current code already emitted no control
flow instructions -- Still this saves us one hardware instruction per
scalar component on Intel SKL hardware.

Fixes the following Vulkan CTS tests on Intel hardware:

    dEQP-VK.glsl.builtin.precision.atan2.highp_compute.scalar
    dEQP-VK.glsl.builtin.precision.atan2.highp_compute.vec2
    dEQP-VK.glsl.builtin.precision.atan2.highp_compute.vec3
    dEQP-VK.glsl.builtin.precision.atan2.highp_compute.vec4
    dEQP-VK.glsl.builtin.precision.atan2.mediump_compute.vec2
    dEQP-VK.glsl.builtin.precision.atan2.mediump_compute.vec4

Note that most of the test-cases above expect IEEE-compliant handling
of atan2(±∞, ±∞), which this patch doesn't explicitly handle, so
except for the last two the test-cases above weren't expected to pass
yet.  The reason they do is that the i965 back-end implementation of
the NIR fmin and fmax instructions is not quite GLSL-compliant (it
complies with IEEE 754 recommendations though), because fmin/fmax of a
NaN and a non-NaN argument currently always return the non-NaN
argument, which causes atan() to flush NaN to one and return the
expected value.  The front-end should probably not be relying on this
behavior for correctness though because other back-ends are likely to
behave differently -- A follow-up patch will handle the atan2(±∞, ±∞)
corner cases explicitly.

v2: Fix up argument scaling to take into account the range and
    precision of exotic FP24 hardware.  Flip coordinate system for
    arguments along the vertical line as if they were on the left
    half-plane in order to avoid division by zero which may give
    unspecified results on non-GLSL 4.1-capable hardware.  Sprinkle in
    some more comments.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-01-31 10:33:27 -08:00
Francisco Jerez
e9ffd12827 glsl: Rewrite atan2 implementation to fix accuracy and handling of zero/infinity.
This addresses several issues of the current atan2 implementation:

 - Negative zero (and negative denorms which end up getting flushed to
   zero) isn't handled correctly by the current implementation.  The
   reason is that it does 'y >= 0' and 'x < 0' comparisons to decide
   on which side of the branch cut the argument is, which causes us to
   return incorrect results (off by up to 2π) for very small negative
   values.

 - There is a serious precision problem for x values of large enough
   magnitude introduced by the floating point division operation being
   implemented as a mul+rcp sequence.  This can lead to the quotient
   getting flushed to zero in some cases introducing an error of over
   8e6 ULP in the result -- Or in the most catastrophic case will
   cause us to return NaN instead of the correct value ±π/2 for y=±∞
   and x very large.  We can fix this easily by scaling down both
   arguments when the absolute value of the denominator goes above
   certain threshold.  The error of this atan2 implementation remains
   below 25 ULP in most of its domain except for a neighborhood of y=0
   where it reaches a maximum error of about 180 ULP.

 - It emits a bunch of instructions including no less than three
   if-else branches per scalar component that don't seem to get
   optimized out later on.  This implementation uses about 13% less
   instructions on Intel SKL hardware and doesn't emit any control
   flow instructions.

v2: Fix up argument scaling to take into account the range and
    precision of exotic FP24 hardware.  Flip coordinate system for
    arguments along the vertical line as if they were on the left
    half-plane in order to avoid division by zero which may give
    unspecified results on non-GLSL 4.1-capable hardware.  Sprinkle in
    some more comments.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-01-31 10:32:45 -08:00
Francisco Jerez
69042a5be4 i965/fs: Fix nir_op_fsign of absolute value.
This does point at the front-end emitting silly code that could have
been optimized out, but the current fsign implementation would emit
bogus IR if abs was set for the argument (because it would apply the
abs modifier on an unsigned integer type), and we shouldn't rely on
the upper layer's optimization passes for correctness.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-01-31 10:32:43 -08:00
Francisco Jerez
7ec3af3f8f glsl/ir_builder: Add rcp builder.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-01-31 10:32:43 -08:00
Francisco Jerez
6643a97de3 glsl: Fix constant evaluation of the rcp op.
Will avoid a regression in a future commit that introduces some
additional rcp operations.  According to the GLSL 4.10 specification:

"Dividing by 0 results in the appropriately signed IEEE Inf."

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-01-31 10:32:43 -08:00
Francisco Jerez
e81130d7a1 mesa/program: Translate csel operation from GLSL IR.
This will be used internally by the GLSL front-end in order to
implement some built-in functions. Plumb it through MESA IR for
back-ends that rely on this translation pass.

v2: Add comment.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-01-31 10:32:42 -08:00
Wladimir J. van der Laan
56314f5baf etnaviv: Set SE.CLIP registers, add margins for scissor/clip registers
This fixes rendering of full-screen quads (and other screen-filling
geometry, e.g. ioquake3 walls up-close) on gc3000. It should be a no-op
on other hardware.

- It looks like SE_CLIP registers were not set at all.
  I'm amazed that rendering worked without them. Emit them to
  avoid issues on gc3000.

- Define constants
  ETNA_SE_SCISSOR_MARGIN_RIGHT (0x1119)
  ETNA_SE_SCISSOR_MARGIN_BOTTOM (0x1111)
  ETNA_SE_CLIP_MARGIN_RIGHT (0xffff)
  ETNA_SE_CLIP_MARGIN_BOTTOM (0xffff)

  These demarcate the margin (fixp16) between the computed sizes and the
  value sent to the chip. I have set these to the numbers used by the
  Vivante driver for gc2000. I am not sure whether any old hardware was
  relying on the old numbers, or whether those were just a guess. But if
  so, these need to be moved to the _specs structure.

CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-01-31 19:29:23 +01:00
Wladimir J. van der Laan
fe3bb8cdb5 etnaviv: Generate new sin/cos instructions on GC3000
Shaders using sin/cos instructions were not working on GC3000.

The reason for this turns out to be that these chips implement sin/cos
in a different way (but using the same opcodes):

- Need their input scaled by 1/pi instead of 2/pi.

- Output an x and y component, which need to be multiplied to
  get the result.

- tex_amode needs to be set to 1.

Add a new bit to the compiler specs and generate these instructions
as necessary.

CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-01-31 19:29:16 +01:00
Nanley Chery
33e0c5d003 anv/cmd_buffer: Use the proper depth input attachment surface state
Commit 2852efcda4 moved the location of
the depth input attachment surface state from the render pass to the
image view, but failed to update the surface state location used when
emitting the binding table. Fix this by loading the surface state from
the correct location.

Fixes:
dEQP-VK.renderpass.formats.d16_unorm.input.*
dEQP-VK.renderpass.formats.d24_unorm_s8_uint.input.*
dEQP-VK.renderpass.formats.d32_sfloat.input.*
dEQP-VK.renderpass.formats.x8_d24_unorm_pack32.input.*
dEQP-VK.renderpass.attachment_allocation.input_output.93
dEQP-VK.renderpass.attachment_allocation.input_output.92
dEQP-VK.renderpass.attachment_allocation.input_output.82
dEQP-VK.renderpass.attachment_allocation.input_output.46

Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2017-01-31 09:00:50 -08:00
Bartosz Tomczyk
fc27181f9e glsl: fix heap-buffer-overflow
The `end+1` skips the ']', whereas the `strlen+1` includes the final
'\0' in the move to terminate the string.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-31 15:58:52 +01:00
Wladimir J. van der Laan
658568941d etnaviv: Cannot render to rb-swapped formats
Exposing rb swapped (or other swizzled) formats for rendering would
involve swizzing in the pixel shader. This is not the case at the
moment, so reject requests for creating such surfaces.

(GPUs that need an extra resolve step anyway due to multiple pixel
pipes, such as gc2000, might also do this swap in the resolve operation.
But this would be tricky to keep track of)

CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-01-31 09:28:28 +01:00
Christian Gmeiner
82fe240a99 etnaviv: Avoid infinite loop in find_frame()
Use of unsigned loop control variable with '>= 0' would lead
to infinite loop.

Reported by clang:

etnaviv_compiler.c:1024:39: warning: comparison of unsigned expression
>= 0 is always true [-Wtautological-compare]
   for (unsigned sp = c->frame_sp; sp >= 0; sp--)
                                   ~~ ^  ~

v2: Simply use the same datatype as c->frame_sp is using.

CC: <mesa-stable@lists.freedesktop.org>
Reported-by: Rhys Kidd <rhyskidd@gmail.com>
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Rhys Kidd <rhyskidd@gmail.com>
2017-01-31 09:19:25 +01:00
Dave Airlie
8477aa71d9 radv/ac: apply slice rounding to 1d arrays as well.
Fixes:
dEQP-VK.glsl.texture_functions.texture.*1darray*

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-31 11:13:15 +10:00
Dave Airlie
3882f3da22 radv/geom: check if esgs and gsvs ring exists before filling geom rings
There are some corner cases where you end up with an esgs ring, but no
gsvs ring, test for both before dereferencing.

Fixes:
dEQP-VK.geometry.emit.points_emit_0_end_0

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-31 11:13:15 +10:00
Dave Airlie
723941bb3d radv: enable geometryShader and multiViewport capabilities.
This enables geometry shader support on radv.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-31 09:30:53 +10:00
Dave Airlie
ca822e1b7c radv: handle layer export from vs->fs properly
Fixes:
dEQP-VK.geometry.layered.1d_array.fragment_layer

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-31 09:30:49 +10:00
Dave Airlie
c9c8ae1fd3 radv: emit esgs itemsize register.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-31 09:30:46 +10:00
Dave Airlie
77ec78669a radv: handle prim id inputs to fragment shader.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-31 09:30:41 +10:00
Dave Airlie
105ce24d46 radv: emit geometry shaders to hardware
This emits the compiled geometry shader and other state registers.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-31 09:30:37 +10:00
Dave Airlie
1fa5b755c2 radv: emit geometry ring size and pointers via preamble (v2)
This uses the scratch infrastructure to handle the esgs
and gsvs rings.

(this replaces the old code that did this with patching).

v2: fix correct ring sizes, reset sizes (Bas)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-31 09:30:19 +10:00
Dave Airlie
8f41fe4389 radv: add gs ring size calculations to pipeline.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-31 09:30:15 +10:00
Dave Airlie
99936d3606 radv: add pipeline creation support for geometry shaders (v2.1)
This adds gs copy shader support to the pipeline cache, and few
geometry related changes.

v2: rebase for spill changes.
v2.1: fix incorrect pipeline destruction.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-31 09:30:10 +10:00
Dave Airlie
fd4ea9e62d radv/ac: handle primitive id
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-31 09:30:08 +10:00
Dave Airlie
4ec294adce radv/ac: handle emitting vertex outputs to esgs ring.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-31 09:30:05 +10:00
Dave Airlie
ac642c6195 radv/ac: handle gs inputs
This handles geometry shader inputs written by the vertex (es) shader
to the esgs ring.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-31 09:30:01 +10:00
Dave Airlie
80cdf2c17e radv/ac: add geom input support to get deref offset.
This just adds the API and fixes up the callers.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-31 09:29:59 +10:00
Dave Airlie
23999a363b radv/ac: handle invocation and primitive id intrinsics
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-31 09:29:55 +10:00
Dave Airlie
63fa6c6eb4 radv/ac: handle geometry emit vertex and end prim intrinsics.
This handles emitting things to the gsvs ring, and sending the
correct GS msgs.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-31 09:29:52 +10:00
Dave Airlie
2a56186d57 radv/ac: handle emitting gs epilogue
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-31 09:29:48 +10:00
Dave Airlie
a615a01942 radv/ac: add copy shader creation
This create the gs copy shader and compiles it.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-31 09:29:40 +10:00
Dave Airlie
09cd037ca4 radv/ac: setup function parameters for vs as es and copy shader.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-31 09:29:33 +10:00
Dave Airlie
e1e9301b2a radv: pass some necessary gs info back to state handling.
We need this info to program some registers.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-31 09:29:30 +10:00
Dave Airlie
68a77411e1 radv: emit vertex shader to correct hw block.
This emits the shader to the ES block in the correct case.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-31 09:29:27 +10:00
Dave Airlie
2a57bddd4c radv/ac: propogate as_es flag into shader info from key.
This just places the flag into the shader info so we can use it from
the driver after we create the shader.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-31 09:29:23 +10:00
Dave Airlie
b941a88e01 radv: extend shader stage code to cover geometry shaders.
This enables the paths for setting up user ptrs to vs/es and gs.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-31 09:29:20 +10:00
Dave Airlie
ec7bf863d2 radv/ac: start setting up the geom shader rings (v2)
This sets up the rings and adds the variables
needed to make them work.

v2: rework for sharing ring and scratch
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-31 09:29:17 +10:00
Dave Airlie
ca91db2402 radv/ac: handle geom shader sgpr/vgpr inputs
This just sets up the gpr inputs.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-31 09:29:13 +10:00
Dave Airlie
374e978438 radv/ac: add geom shader sendmsg defines.
This just adds some defines needed for geom shaders.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-31 09:29:10 +10:00
Dave Airlie
583cf8efd4 radv/ac: add some geom shader info from nir->ac shader.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-31 09:28:50 +10:00
Dave Airlie
ecb8a34910 radv: move hw vertex shader emit to separate function
This is to later allow ES shaders to be emitted.

Review-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-31 09:28:46 +10:00
Dave Airlie
3b507855cb radv: fixup ia multi vgt param code to handle geom shaders.
This fixes up a few of the commented out blocks.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-31 09:28:28 +10:00
Dave Airlie
68c5da7e66 radv: add code to set gs_table_depth.
Review-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-31 09:28:24 +10:00
Dave Airlie
f26fa879b7 radv: add small helper to denote when a geom shader is in the pipeline.
Review-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-31 09:28:13 +10:00
Robert Foss
0b63f47030 radv: Prevent Coverity warning
Prevent Coverity seeing potential errors when src is
no initialized in the switch case.

Coverity-Id: 1396397
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-01-30 23:59:22 +01:00
Timothy Arceri
30aa22dec0 mesa: add new MESA_GLSL flag for printing shader cache debug info
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-31 09:51:31 +11:00
Carl Worth
ba1eb854bd glsl: add cache to ctx and add sha1 string fields
We also add a flag for detecting shaders written to shader cache.

V2: dont leak cache

Signed-off-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-31 09:51:30 +11:00
Carl Worth
b8cb1a05cd glsl: add new uniform fields to be used to restore state from cache
Signed-off-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-31 09:51:30 +11:00
Carl Worth
0f60c6616e glsl: Switch to disable-by-default for the GLSL shader cache
The shader cache is expected to be developed incrementally over a
fairly long series of commits. For that period of instability, we
require users to opt into the shader cache by setting:

	MESA_GLSL_CACHE_ENABLE=1

In the future, when the shader cache is complete, we can revert this
commit so that the cache will be on by default.

The user can always disable the cache with
MESA_GLSL_CACHE_DISABLE=1. That functionality is not affected by this
commit, (nor will it be affected by the future revert).

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-31 09:51:30 +11:00
Dave Airlie
0ecd426490 radv/ac: implement txs for buffer textures.
This fixes a bunch of buffer related:
dEQP-VK.memory.pipeline_barrier.*
tests, that were crashing in LLVM due to this being missing.

Reviewed-by: Andres Rodriguez<andresx7@gmail.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-31 06:26:53 +10:00
Dave Airlie
ecc3fa3ba3 radv/ac: handle nir irem opcode.
This fixes:
dEQP-VK.spirv_assembly.instruction.compute.opsrem.*

Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org"
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-31 05:38:57 +10:00
Dave Airlie
059dd17175 radv/ac: fix multisample subpass image.
We weren't adding the fragment position properly.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-31 04:44:59 +10:00
Dave Airlie
a1c1ba7d56 radv: handle transfer_write as a dst flag.
It appears we can get image barriers like:
    srcStageMask:                   VkPipelineStageFlags = 4096 (VK_PIPELINE_STAGE_TRANSFER_BIT)
    dstStageMask:                   VkPipelineStageFlags = 4096 (VK_PIPELINE_STAGE_TRANSFER_BIT)
    dependencyFlags:                VkDependencyFlags = 0
    memoryBarrierCount:             uint32_t = 0
    pMemoryBarriers:                const VkMemoryBarrier* = NULL
    bufferMemoryBarrierCount:       uint32_t = 0
    pBufferMemoryBarriers:          const VkBufferMemoryBarrier* = NULL
    imageMemoryBarrierCount:        uint32_t = 1
    pImageMemoryBarriers:           const VkImageMemoryBarrier* = 0x7ffc882367b0
        pImageMemoryBarriers[0]:        const VkImageMemoryBarrier = 0x7ffc882367b0:
            sType:                          VkStructureType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER (45)
            pNext:                          const void* = NULL
            srcAccessMask:                  VkAccessFlags = 4096 (VK_ACCESS_TRANSFER_WRITE_BIT)
            dstAccessMask:                  VkAccessFlags = 4096 (VK_ACCESS_TRANSFER_WRITE_BIT)
            oldLayout:                      VkImageLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL (7)
            newLayout:                      VkImageLayout = VK_IMAGE_LAYOUT_GENERAL (1)
            srcQueueFamilyIndex:            uint32_t = 4294967295
            dstQueueFamilyIndex:            uint32_t = 4294967295
            image:                          VkImage = 0x2df55e0
            subresourceRange:               VkImageSubresourceRange = 0x7ffc882367e0:
                aspectMask:                     VkImageAspectFlags = 1 (VK_IMAGE_ASPECT_COLOR_BIT)
                baseMipLevel:                   uint32_t = 0
                levelCount:                     uint32_t = 1
                baseArrayLayer:                 uint32_t = 0
                layerCount:                     uint32_t = 1

This fixes all the CTS dEQP-VK.memory.pipeline_barrier.transfer_dst tests here,
not sure if this is a too large hammer.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-31 04:42:21 +10:00
Samuel Pitoiset
af7fef12f7 r600: fix a compilation warning in r600_screen_create()
Should be r600_common_screen instead of r600_screen.

Fixes: 80157a2c20 ("gallium/radeon: clean up r600_query_init_backend_mask")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 18:13:18 +01:00
Marek Olšák
f8bc628b2c gallium/radeon: merge dirty_fb_counter and dirty_tex_descriptor_counter
to simplify things in draw_vbo a little

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 17:45:29 +01:00
Marek Olšák
75c425e511 winsys/radeon: clamp vram_vis_size to 256MB
the value from the kernel is wrong

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 17:45:29 +01:00
Marek Olšák
eba9e9dd1d radeonsi: handle count_from_stream_output in a few IA_MULTI_VGT_PARAM cases
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 17:45:29 +01:00
Marek Olšák
a0740d59aa radeonsi: don't invoke DCC decompression in update_all_texture_descriptors
This fixes a bug uncovered by the 17-part patch series, specifically:
  "gallium/radeon: merge dirty_fb_counter and dirty_tex_descriptor_counter"

If dirty_tex_counter has been updated and set_shader_image invokes DCC
decompression, the DCC decompression itself checks the counter and updates
descriptors, which in turn invokes the same DCC decompression. The blitter
can't handle the recursion and the driver eventually crashes.

Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 17:45:29 +01:00
Marek Olšák
f8dd2f5bac radeonsi: fold info->indirect conditionals into the last one in draw_vbo
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 17:29:36 +01:00
Marek Olšák
408f9a1584 radeonsi: atomize the scratch buffer state
The update frequency is very low.

Difference: Only account for the size when allocating a new one and when
            starting a new IB, and check for NULL. (v3)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 17:29:36 +01:00
Bartosz Tomczyk
a41f2527ae r600: Fix stack overflow
Commit 7b5878ee04 increased number of
outputs to 64, but left output array intact. This caused stack overflow
when number of outputs is bigger then 32. Found by ASAN.

Cc: "12.0 13.0 17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 15:30:03 +01:00
Samuel Pitoiset
e2c15ea092 gallium/radeon: add new HUD queries for monitoring the CP
There are even more counters in the CP_STAT register but I think
these ones are enough for now.

v2: only read (and expose) CP_STAT on VI+

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-30 14:37:00 +01:00
Samuel Pitoiset
0e04a078c5 gallium/radeon: add new GPU-sdma-busy HUD query
For simplicity, GPU-sdma-busy will return 0 on previous gens.

v2: only read SRBM_STATUS2 on Evergreen+

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-30 14:37:00 +01:00
Samuel Pitoiset
b0f7ddef4f gallium/radeon: rename grbm to mmio in the gpu load path
We also want to monitor other MMIO counters like SRBM_STATUS2 in
order to know if SDMA is busy.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-30 14:37:00 +01:00
Marek Olšák
2fc5fe0e85 winsys/amdgpu: add a fast exit path into amdgpu_cs_add_buffer
The time spent in the function dropped by 37% for torcs.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:57:09 +01:00
Samuel Pitoiset
86eb52adad winsys/amdgpu: do not iterate twice when adding fence dependencies
The perf difference is very small, 3.25->2.84% in amdgpu_cs_flush()
in the DXMD benchmark.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-30 13:44:25 +01:00
Samuel Pitoiset
5a6b1aadea winsys/amdgpu: add one likely() call in amdgpu_cs_flush()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-30 13:44:19 +01:00
Samuel Pitoiset
db2b0210b1 hud: fix compilation warnings in hud_nic_graph_install()
v2: use PRId64 instead of PRIx64

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-30 13:43:30 +01:00
Samuel Pitoiset
0b646ad05e st/mesa: make st_texture_get_sampler_view() static
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:42:50 +01:00
Marek Olšák
62732ce263 gallium/radeon: remove r600_common_context::max_db
this cleanup is based on the vulkan driver, which seems to do the same thing

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák
9327780da6 winsys/amdgpu: fix ADDR_REGISTER_VALUE::backendDisables
This would be a fix if the value was used anywhere.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák
80157a2c20 gallium/radeon: clean up r600_query_init_backend_mask
This just needs to be done for r600g in the screen.
We don't need an IB submission for every new context created for GCN.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák
5f99c49008 radeonsi: precompute IA_MULTI_VGT_PARAM values into a table
The perf difference is very small: 0.99% -> 0.40% for the time spent
in si_get_ia_multi_vgt_param when si_draw_vbo is 20%. Pretty much nothing.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák
c78177fc64 radeonsi: move VGT_VERTEX_REUSE_BLOCK_CNTL into shader states for Polaris
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák
ccecf79c2b radeonsi: state atom IDs don't have to be off by one
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák
ac059f1c23 radeonsi: use a bitmask for looping over dirty PM4 states
also move it to draw_vbo, because it should be 0 in most cases

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák
802fcdc0d2 radeonsi: atomize L2 prefetches
to move the big conditional statement out of draw_vbo

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák
c99ba3eb47 radeonsi: unbind disabled shader stages to prevent useless L2 prefetches
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák
4a4ff66dbe radeonsi: also prefetch compute shaders
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák
879c73fac8 radeonsi: update dirty_level_mask only after the first draw after FB change
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák
cecc068774 gallium/radeon: allow VRAM-only placements again on APUs & recent amdgpu
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák
0d0f357de6 radeonsi: don't set +fp64-denormals
it's the default and the name will change to +fp64-fp16-denormals.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Marek Olšák
b177162489 radeonsi: remove si_shader_context::param_tess_offchip
we don't use on-chip tess.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-30 13:27:14 +01:00
Lucas Stach
e158b74971 etnaviv: force vertex buffers through the MMU
This fixes a vertex data corruption issue if some of the vertex streams
go through the MMU and some don't.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Philipp Zabel <p.zabel@pengutronix.de>
Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-01-30 12:40:57 +01:00
Andres Rodriguez
33f418bd67 radv: Expose VK_KHR_maintenance1
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-01-30 08:44:11 +01:00
Andres Rodriguez
7b890a36df radv: Fix vkCmdCopyImage for 2d slices into 3d Images
Previously the z offset of the destination image was being ignored. It
should be taken into account when copying into a 3d target.

Also, img_extent_el.depth was being incorrectly clamped to 1 due to the
source image being VK_IMAGE_TYPE_2D. This would result in the blit
failing to iterate over all the 3d slices. Instead we clamp to the
destination image type.

Fixes failures in CTS tests:
dEQP-VK.api.copy_and_blit.image_to_image.3d_images.*

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-01-30 08:44:07 +01:00
Bas Nieuwenhuizen
4eae3597eb radv: Expose transfer format features.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-01-30 08:42:26 +01:00
Bas Nieuwenhuizen
34bfe4b1bb radv: Don't allow any operations on non-supported depth/stencil formats.
We really use the depth block for the blits.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-01-30 08:42:26 +01:00
Andres Rodriguez
f8d5e1ab2d radv: use new error codes for AllocateDescriptorSets
There is a new error code in Maintenance1 that is more specific to the
situation: VK_ERROR_OUT_OF_POOL_MEMORY_KHR

Fixes CTS test case:
dEQP-VK.api.descriptor_pool.out_of_pool_memory

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-01-30 08:42:17 +01:00
Andres Rodriguez
e199a993b2 radv: vkAllocateCommandBuffers should NULL all output handles
This is part of the spec and fixes CTS tests:
dEQP-VK.api.object_management.alloc_callback_fail_multiple.command_buffer_*

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-01-30 08:38:13 +01:00
Andres Rodriguez
ec0f5c005c radv: add trim command pool stub
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-01-30 08:37:54 +01:00
Kenneth Graunke
2f7a7ae131 i965: Support the force_glsl_version driconf option.
Gallium drivers have had this for a while.  It makes sense to support
it consistently across drivers, so expose it in i965 as well.

Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-01-29 18:20:57 -08:00
Kenneth Graunke
02216a1ddf i965: Fix check for negative pitch in can_do_fast_copy_blit().
At this point, the pitch is in bytes.  We haven't yet divided the pitch
by 4 for tiled surfaces, so abs(pitch) may be larger than 32K.  This
means the bit 15 trick won't work.

The caller now has signed integers anyway, so just pass those through
and do the obvious check.

Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-29 18:20:35 -08:00
Bas Nieuwenhuizen
c4d7b9cd29 radv: Handle command buffers that need scratch memory.
v2: Create the descriptor BO with CPU access.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-01-30 02:07:20 +01:00
Bas Nieuwenhuizen
ccff93e138 radv: Track scratch usage across pipelines & command buffers.
Based on code written by Dave Airlie.

Signed-off-by: Bas Nieuwenhuizen <basni@oogle.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-01-30 02:07:16 +01:00
Bas Nieuwenhuizen
29c1f67e9f radv/ac: Add compiler support for spilling.
Based on code written by Dave Airlie.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-01-30 02:07:12 +01:00
Bas Nieuwenhuizen
d115b67712 radv/amdgpu: Support a preamble CS.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-01-30 02:07:08 +01:00
Timothy Arceri
2842dea310 i965: add assert to while_jumps_before_offset()
jip should always be negative here as its the result of
do instruction - while instruction.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-30 10:17:54 +11:00
Timothy Arceri
77a6597bb7 i965: fix up asserts in brw_inst_set_jip()
We are casting from a signed 32bit int to an unsigned 16bit int
so shift 15 bits rather than 16.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-30 10:17:46 +11:00
Bas Nieuwenhuizen
b8ee45ebdc llvmpipe: Use LLVMDumpModule, not DumpModule.
Forgot the prefix ...

Fixes: 0fca80b3db
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
2017-01-29 17:03:25 +01:00
Bas Nieuwenhuizen
0fca80b3db various: Fix missing DumpModule with recent LLVM.
Since LLVM revision 293359 DumpModule gets only implemented when
either a debug build or LLVM_ENABLE_DUMP is set.

This patch adds a direct replacement for the function for radv and
radeonsi, However, as I don't know a good place to put common LLVM
code for all three I inlined the implementation for LLVMPipe.

v2: Use the new code for LLVM 3.4+ instead of LLVM 5+ & fixed indentation

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2017-01-29 10:25:00 +01:00
Ilia Mirkin
ce7a045fee r600g: use ieee variants of multiplication instructions
This matches the behavior of most other drivers, including nouveau,
radeonsi, and i965.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-29 00:00:07 -05:00
Ilia Mirkin
bacbb01105 r600g: add support for optionally using non-IEEE mul ops
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-28 23:59:43 -05:00
Eric Anholt
5b7e2697dc vc4: Coalesce into TLB writes as well as VPM/tex.
This generally cuts an instruction when blending is enabled and we thus
have a single instruction generating the color value.

total instructions in shared programs: 91759 -> 91634 (-0.14%)
instructions in affected programs:     5338 -> 5213 (-2.34%)
2017-01-28 19:35:20 -08:00
Eric Anholt
c1299615fb vc4: Avoid an extra temporary and mov in ffloor/ffract/fceil.
shader-db results:

total instructions in shared programs: 92611 -> 91764 (-0.91%)
instructions in affected programs:     27417 -> 26570 (-3.09%)

The star is one shader in glmark2's terrain (drops 16% of its
instructions), but there are also wins in mupen64plus and glb2.7.
2017-01-28 19:35:20 -08:00
Eric Anholt
0079df0b2d vc4: Flip the switch to run the GLSL compiler optimization loop once.
This has almost no effect on shader-db:

total instructions in shared programs: 92572 -> 92611 (0.04%)
instructions in affected programs:     4486 -> 4525 (0.87%)

Looking at 2 of the 7 different shaders that were hurt (all of which were
in mupen64), they all appear to be just differences in order of
instructions at the NIR level.

The advantage is that this should significantly reduce time in the compiler.
2017-01-28 19:35:20 -08:00
Kenneth Graunke
7c5629a269 i965: Unbind deleted shaders from brw_context, fixing malloc heisenbug.
Applications may delete a shader program, create a new one, and bind it
before the next draw.  With terrible luck, malloc may randomly return a
chunk of memory for the new gl_program that happened to be the exact
same pointer as our previously bound gl_program.  In this case, our
logic to detect new programs in brw_upload_pipeline_state() would break:

      if (brw->vertex_program != ctx->VertexProgram._Current) {
         brw->vertex_program = ctx->VertexProgram._Current;
         brw->ctx.NewDriverState |= BRW_NEW_VERTEX_PROGRAM;
      }

Because the pointer is the same, we'd think it was the same program.
But it could be wildly different - a different stage altogether,
different sets of resources, and so on.  This causes utter chaos.

As unlikely as this seems, I believe I hit this when running a subset
of the CTS in a loop, in a group of tests that churns through simple
programs, deleting and rebuilding them.  Presumably malloc uses a
bucketing cache of sorts, and so freeing up a gl_program and allocating
a new one fairly quickly causes it to reuse that memory.

The result was that brw->vertex_program->info.num_ssbos claimed the
program had SSBOs, while brw->vs.base.prog_data.binding_table claimed
that there were none.  This was crazy, because the binding table is
calculated from info.num_ssbos - the shader info appeared to change
between shader compile time and draw time.  Careful use of watchpoints
revealed that it was being clobbered by rzalloc's memset when building
an entirely different program...

Fortunately, our 0xd0d0d0d0 canary for unused binding table entries
caused us to crash out of bounds when trying to upload SSBOs, or we
may have never discovered this heisenbug.

Fixes crashes in GL45-CTS.compute_shader.sso-case2 when using a hacked
cts-runner that only runs GL45-CTS.compute_shader.s* in EGL config ID 5
at 64x64 in a loop with 100 iterations.

Cc: "17.0 13.0 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-27 21:52:37 -08:00
Bas Nieuwenhuizen
96c60b7f07 radv/ac: Use base in push constant loads.
Apparently the source is not an address but an offset, so we actually
need to use the base.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
CC: <mesa-stable@lists.freedesktop.org>
2017-01-28 03:07:39 +01:00
Andres Rodriguez
e8047980d2 radv: drop support for VK_AMD_NEGATIVE_VIEWPORT_HEIGHT
This extension was not correctly supported, and it conflicts with the
VK_KHR_MAINTENANCE1 spec.

Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-28 11:02:35 +10:00
Dave Airlie
e9b16c74fa radv: implement VK_KHR_GET_PHYSICAL_DEVICE_PROPERTIES_2
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-28 10:52:23 +10:00
Dave Airlie
989ec61703 radv: use proper maximum slice for layered view
this fixes deferred shadows with geom shaders enabled.

but I think this fix is fine by itself.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-28 10:52:20 +10:00
Chad Versace
6403e37651 i965/sync: Implement fences based on Linux sync_file
This patch implements a new type of struct brw_fence, one that is based
struct sync_file.

This completes support for EGL_ANDROID_native_fence_sync.

* Background

  Linux 4.7 added a new file type, struct sync_file. See

    commit 460bfc41fd52959311ed0328163f785e023857af
    Author:  Gustavo Padovan <gustavo.padovan@collabora.co.uk>
    Date:    Thu Apr 28 10:46:57 2016 -0300
    Subject: dma-buf/sync_file: de-stage sync_file headers

  A sync file is a cross-driver explicit synchronization primitive. In a
  sense, sync_file's relation to synchronization is similar to dma_buf's
  relation to memory: both are primitives that can be imported and
  exported across drivers (at least in theory).

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-27 13:10:07 -08:00
Chad Versace
0b6dd31d68 i965/sync: Rename brw_fence_insert()
Rename to brw_fence_insert_locked(). This is correct because the fence's
mutex is effectively locked, as all callers are also *creators* of the
fence, and have not yet returned the new fence.

This reduces noise in the next patch, which defines and uses
brw_fence_insert(), an unlocked variant.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-27 13:10:07 -08:00
Chad Versace
a5c17f5c29 i965/sync: Fail sync creation when batchbuffer flush fails
Pre-patch, brw_sync.c ignored the return value of
intel_batchbuffer_flush().

When intel_batchbuffer_flush() fails during eglCreateSync
(brw_dri_create_fence), we now give up, cleanup, and return NULL.

When it fails during glFenceSync, however, we blindly continue and hope
for the best because there does not exist yet a way to tell core GL that
sync creation failed.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-27 13:10:07 -08:00
Chad Versace
014d0e0f88 i965/sync: Add brw_fence::type
This a refactor patch; no expected changed in behavior.

Add `enum brw_fence_type` and brw_fence::type. There is only one type
currently, BRW_FENCE_TYPE_BO_WAIT. This patch reduces a lot of noise in
the next, which adds new type BRW_FENCE_TYPE_SYNC_FD.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-27 13:10:06 -08:00
Chad Versace
d1ce499dae i965: Add intel_batchbuffer_flush_fence()
A variant of intel_batchbuffer_flush() with parameters for in and out
fence fds.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-27 13:10:06 -08:00
Chad Versace
358661c794 i965: Add intel_screen::has_fence_fd
This bool maps to I915_PARAM_HAS_EXEC_FENCE_FD.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-27 13:10:06 -08:00
Chad Versace
b8acb6b179 configure: Require libdrm >= 2.4.75
Required to implement EGL_ANDROID_native_fence_sync on i965.
Specifically, i965 needs drm_intel_gem_bo_exec_fence(),
I915_PARAM_HAS_EXEC_FENCE, and libsync.h.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-27 13:10:06 -08:00
Emil Velikov
cb6be5c8c0 configure.ac: list radeon in --with-vulkan-drivers help string
Analogous to what we do for the dri and gallium drivers.

Cc: 17.0 13.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@colllabora.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-01-27 19:25:30 +00:00
Emil Velikov
6f2dec0a23 radv: automake: Don't install vk_platform.h or vulkan.h.
These files belong to the vulkan loader.

Identical to
045f38a507 vulkan: Don't install vk_platform.h or vulkan.h.

Cc: Dave Airlie <airlied@redhat.com>
Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-27 19:25:26 +00:00
Jason Ekstrand
d96ade1c4c anv: Advertise API version 1.0.39
I'm pretty sure we've kept up with the bug fixes.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-01-27 10:06:14 -08:00
Eric Engestrom
5f301fe2e6 gbm/dri: fix memory leaks in error path
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
[Emil Velikov: make sure it builds]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-27 17:56:58 +00:00
Emil Velikov
1d104f9aa7 docs/releasing: add a note about the relnotes template
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-27 17:56:58 +00:00
Emil Velikov
2e076af067 mesa: remove explicit __STDC_FORMAT_MACROS define
Analogous to previous commits.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-01-27 17:56:57 +00:00
Emil Velikov
1cfe97ff0e nouveau: remove explicit __STDC_FORMAT_MACROS define
Already handled by the build.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-01-27 17:56:57 +00:00
Emil Velikov
027e04932a scons: swr: remove explicit __STDC_.*_MACROS defines
Analogous to previous commits.

Cc: George Kyriazis <george.kyriazis@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-01-27 17:56:57 +00:00
Emil Velikov
e809fadb86 gallium: remove explicit __STDC_.*_MACROS defines
Analogous to previous commits.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-01-27 17:56:57 +00:00
Emil Velikov
01e28c6cf5 gallivm: remove explicit __STDC_.*_MACROS defines
Correctly handled by the build systems.

Cc: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-01-27 17:56:57 +00:00
Emil Velikov
74a174e12f glsl: remove explicit __STDC_FORMAT_MACROS define
Correctly handled by all the build systems.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-01-27 17:56:57 +00:00
Emil Velikov
9e9e917c26 autoconf: set all __STDC_*_MACROS
Analogous to previous commit(s), with a minor detail - here we set the
macros when building both C and C++ sources.

Resolving that is a more challenging task that we'll sort out another
day.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-01-27 17:56:57 +00:00
Emil Velikov
d68ffa9446 scons: always set __STDC_*_MACROS for C++ sources
Analogous to previous commit - just set the lot once throughout.

Cc: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-01-27 17:56:57 +00:00
Emil Velikov
13e2928d57 android: always set __STDC_*_MACROS for C++ sources
Various parts of the code depend on the macros being defined.

Just set those unconditionally, only where needed (c++ sources) so that
we can drop the workarounds through the code.

Cc: Rob Herring <robh@kernel.org>
Cc: Chih-Wei Huang <cwhuang@android-x86.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-01-27 17:56:57 +00:00
Emil Velikov
c4862fa382 st/xa: automake: remove duplicate -Wall
Already handled by configure.ac

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-27 17:56:57 +00:00
Emil Velikov
6a5850b04a mesa: move variable declaration to where its used
The variable replacement was unused when building w/o
ENABLE_SHADER_CACHE. Since we can mix variable declarations and code,
move it to where its used.

Fixes: 9f8dc3bf03 "utils: build sha1/disk cache only with
Android/Autoconf"
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-27 17:56:57 +00:00
Emil Velikov
01874d5278 st/mesa: use correct return statement for a void function
Analogous to previous commit.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-27 17:56:56 +00:00
Emil Velikov
c1960e23ff mesa: use correct return statement for a void function
Using return foo() is incorrect even if foo itself returns void.
Spotted by AppVeyor, as below:

teximage.c(3653) : warning C4098: 'copyteximage' : 'void' function returning a value

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-27 17:56:56 +00:00
Emil Velikov
be3b5e015c svga: remove const qualifier from SVGA3D_vgpu10_GenMips() prototype
Does not match the function definition or how it's used. Triggers the
following warning in AppVeyor

svga_cmd_vgpu10.c(1301) : warning C4028: formal parameter 2 different from declaration

Cc: Charmaine Lee <charmainel@vmware.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-27 17:56:56 +00:00
Emil Velikov
cf00cc72e9 nir: add extra const notation in compare_blocks()
MSVC warns about different const qualifiers. Add the extra const to
silence it.

nir_phi_builder.c(244) : warning C4090: 'initializing' : different 'const' qualifiers
nir_phi_builder.c(245) : warning C4090: 'initializing' : different 'const' qualifiers

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-27 17:56:56 +00:00
Emil Velikov
a2dea3b654 nir: silence implicit conversion to 64bit
MSVC warns about implicit conversion as below. Annotate the literal
appropriately to silence the warning.

nir_gather_info.c(249) : warning C4334: '<<' : result of 32-bit shift
implicitly converted to 64 bits (was 64-bit shift intended?)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-27 17:56:56 +00:00
Emil Velikov
01849ae0dc i915, i965: automake: remove NA include directive
The path in question (... dri/intel/server) was removed years ago.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-27 17:56:56 +00:00
Emil Velikov
091f2b8c98 mesa/tests: automake: include builddir prior to srcdir
Analogous to previous commit.

Cc: "12.0 13.0" <mesa-dev@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-27 17:56:56 +00:00
Emil Velikov
6ba96bdcab dri/osmesa: automake: include builddir prior to srcdir
Analogous to previous commit.

Cc: "12.0 13.0" <mesa-dev@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-27 17:56:56 +00:00
Emil Velikov
ede4ff9adc dri/swrast: automake: include builddir prior to srcdir
Analogous to previous commit.

Cc: "12.0 13.0" <mesa-dev@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-27 17:56:56 +00:00
Emil Velikov
5a0ba1e5de radeon, r200: automake: include builddir prior to srcdir
Analogous to previous commit.

Cc: "12.0 13.0" <mesa-dev@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-27 17:56:56 +00:00
Emil Velikov
ee5de93269 mapi: automake: include builddir prior to srcdir
Analogous to previous commit.

Cc: "12.0 13.0" <mesa-dev@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-27 17:56:56 +00:00
Emil Velikov
af860850a0 loader: automake: include builddir prior to srcdir
Analogous to previous commit.

Cc: "12.0 13.0" <mesa-dev@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-27 17:56:56 +00:00
Emil Velikov
912b4f5472 glx/windows: automake: include builddir prior to srcdir
Analogous to previous commit.

Cc: "12.0 13.0" <mesa-dev@lists.freedesktop.org>
Cc: Jon Turney <jon.turney@dronecode.org.uk>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-27 17:56:55 +00:00
Emil Velikov
5b874cee09 glx/apple: automake: include builddir prior to srcdir
Analogous to previous commit.

Cc: "12.0 13.0" <mesa-dev@lists.freedesktop.org>
Cc: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jeremy Sequoia <jeremyhu@apple.com>
2017-01-27 17:56:55 +00:00
Emil Velikov
d66f9e6d93 glx: automake: include builddir prior to srcdir
Analogous to previous commit.

Cc: "12.0 13.0" <mesa-dev@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-27 17:56:55 +00:00
Emil Velikov
d221bf9b91 d3dadapter9: automake: include builddir prior to srcdir
Analogous to previous commit.

Cc: "12.0 13.0" <mesa-dev@lists.freedesktop.org>
Cc: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-27 17:56:55 +00:00
Emil Velikov
517f34b4be st/dri: automake: include builddir prior to srcdir
Analogous to previous commit.

Cc: "12.0 13.0" <mesa-dev@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-27 17:56:55 +00:00
Emil Velikov
02f991c00d clover: automake: remove -I$(srcdir)
Already implicitly handled by the build system.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Aaron Watry <awatry@gmail.com>
2017-01-27 17:56:55 +00:00
Emil Velikov
65d5a60cac clover: automake: include builddir prior to srcdir
Analogous to previous commit.

Cc: "12.0 13.0" <mesa-dev@lists.freedesktop.org>
Cc: Aaron Watry <awatry@gmail.com>
Cc: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-27 17:56:55 +00:00
Emil Velikov
c5921ae0d2 egl: automake: include builddir prior to srcdir
Analogous to previous commit.

Cc: "12.0 13.0" <mesa-dev@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-27 17:56:55 +00:00
Emil Velikov
90ac5c339e i915: automake: include builddir prior to srcdir
Analogous to previous commit.

Cc: "12.0 13.0" <mesa-dev@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-27 17:56:55 +00:00
Emil Velikov
4622c75dfb i965: automake: include builddir prior to srcdir
The latter can contain stale generated file, which, as-is, we'll end up
using.

Fixes: bfd17c76c1 "i965: Port INTEL_PRECISE_TRIG=1 to NIR."
Cc: "12.0 13.0" <mesa-dev@lists.freedesktop.org>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2017-01-27 17:56:55 +00:00
Emil Velikov
a922c82125 freedreno: automake: correctly set MKDIR_GEN
Analogous to previous commit.

Fixes: 4610e5ef28 "freedreno/ir3: fix sin/cos"
Cc: "12.0 13.0" <mesa-dev@lists.freedesktop.org>
Cc: Rob Clark <robclark@freedesktop.org>
Cc: Nicolas Dechesne <nicolas.dechesne@linaro.org>
Reported-by: Nicolas Dechesne <nicolas.dechesne@linaro.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Nicolas Dechesne <nicolas.dechesne@linaro.org>
2017-01-27 17:56:55 +00:00
Emil Velikov
5eed48d237 i965: automake: correctly set MKDIR_GEN
Otherwise we might end up w/o the respective folder (depending on
autotools version) and fail at build time.

Fixes: bfd17c76c1 "i965: Port INTEL_PRECISE_TRIG=1 to NIR."
Cc: "12.0 13.0" <mesa-dev@lists.freedesktop.org>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-27 17:56:54 +00:00
Eric Engestrom
1ee2ae8348 anv: add missing extension errors in vk_errorf()
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-27 17:23:32 +00:00
Eric Engestrom
86879bf4ed anv: add missing core errors in vk_errorf()
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-27 17:23:32 +00:00
Lionel Landwerlin
ba26c79157 anv: don't assert on out of memory descriptor pool in debug mode
Fixes:
   dEQP-VK.api.descriptor_pool.out_of_pool_memory

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-01-27 17:23:32 +00:00
Eric Engestrom
4da0d1c59a docs/repository: fix name of main branch
This is git, not svn :P

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-27 16:52:23 +00:00
Eric Engestrom
87619a1a6a egl: EGL_PLATFORM_SURFACELESS_MESA is now upstream
EGL_PLATFORM_SURFACELESS_MESA is in eglext.h as of last commit.

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-01-27 16:52:23 +00:00
Eric Engestrom
a98b3a0872 egl: update headers from registry
Khronos introduced a new macro (suggested by Google) to avoid using
C-style casts in C++ code, as those generate warnings.

Khronos Bugzilla: https://cvs.khronos.org/bugzilla/show_bug.cgi?id=16113
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-01-27 16:52:23 +00:00
Eric Engestrom
06842585df radv: add missing extension errors in vk_errorf()
v2(Bas): Remove the extra VK_ERROR_FRAGMENTED_POOL cases.

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-01-27 17:33:05 +01:00
Eric Engestrom
43cf967512 radv: add missing core errors in vk_errorf()
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-01-27 17:33:05 +01:00
Andreas Boll
1f2a890ace configure.ac: Require LLVM for r300 only on x86 and x86_64
b3119a3 introduced a strict LLVM requirement for r300 on all
architectures and thus configure fails on architectures where LLVM is
not available or buggy.

r300 doesn't strictly require LLVM, but for performance reasons we
highly recommend LLVM usage. So require it at least on x86 and x86_64
architectures as we have done before b3119a3.

Fixes: b3119a3 ("configure.ac: Check gallium LLVM version in gallium_require_llvm")
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-27 12:31:17 +01:00
Nicolai Hähnle
c5e76a262a gallium: enable int64 on radeonsi, llvmpipe, softpipe
All of these have had support for the TGSI opcodes since before most of
the glsl compiler work landed.

Also update the docs accordingly, including the missing note about i965.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-27 10:19:48 +01:00
Dave Airlie
93dc5c1a06 st/mesa: add support for enabling ARB_gpu_shader_int64.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-27 10:19:43 +01:00
Dave Airlie
278580729a st/glsl_to_tgsi: add support for 64-bit integers
v2: add conversion opcodes.

v3 (idr): Rebase on replacemtn of TGSI_OPCODE_I2U64 with
TGSI_OPCODE_I2I64.

v4 (idr): "cut them down later" => Remove ir_unop_b2u64 and
ir_unop_u642b.  Handle these with extra i2u or u2i casts just like
uint(bool) and bool(uint) conversion is done.

v5 (nha): add clarifying comment about a subtle assumption

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-27 10:19:39 +01:00
Dave Airlie
f804506d4d gallium: Add integer 64 capability
v1.1: move to using a normal CAP. (Marek)

v2: fill in the cap everywhere

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-27 10:19:25 +01:00
Topi Pohjolainen
a283a4ee2f meta: Refactor texture format translation
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-01-27 08:57:26 +02:00
Topi Pohjolainen
542bb85049 intel/blorp/dbg: Name blit shaders for easy recognition in dumps
Blorp clears already have an equivalent.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-01-27 08:57:26 +02:00
Topi Pohjolainen
56094cfb9e i965/hiz/gen6: Stop setting false qpitch
which is not applicable for "all slices at each lod". Current
logic makes one to believe it has some purpose. When miptree
layout is calculated brw_miptree_layout_texture_array() sets
the qpitch unconditionally but later on ignores it altogether
for ALL_SLICES_AT_EACH_LOD.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-01-27 08:57:26 +02:00
Topi Pohjolainen
b13d30a72b i965/blorp/gen6: Remove dead code in hiz setup
Such as comment states for intel_miptree_hiz_buffer::mt, hiz_mt
only exists for gen6. In addition, intel_hiz_miptree_buf_create()
uses MIPTREE_LAYOUT_FORCE_ALL_SLICE_AT_LOD unconditionally.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-01-27 08:57:26 +02:00
Topi Pohjolainen
b864e3d7ee i965/gen6: Simplify hiz surface setup
In intel_hiz_miptree_buf_create() intel_miptree_aux_buffer::bo
is unconditionally initialised to point to the same buffer
object as hiz_mt does. The same goes for
intel_miptree_aux_buffer::pitch/qpitch.

This will make following patches simpler to read.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-01-27 08:57:26 +02:00
Topi Pohjolainen
40bf622ced i965/blorp/gen6: Simplify hiz surface setup
In intel_hiz_miptree_buf_create() intel_miptree_aux_buffer::bo
is unconditionally initialised to point to the same buffer
object as hiz_mt does. Also intel_miptree_aux_buffer::offset
is initialised to zero (calloc()).

This will make following patches significantly simpler to read.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-01-27 08:57:26 +02:00
Topi Pohjolainen
5201d2991b i965/gen6: Remove check for stencil format
There are is no alternative.

Reviewed-by: Samuel Iglesias Gons\341lvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-01-27 08:57:26 +02:00
Topi Pohjolainen
19412abb3f i965: Remove check for hiz on earlier gens than SNB
Only caller, brw_workaround_depthstencil_alignment(), returns
early for gen6+.

While at it, reduce scope for brw_get_depthstencil_tile_masks() as
well.

Reviewed-by: Samuel Iglesias Gons\341lvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-01-27 08:57:26 +02:00
Topi Pohjolainen
26a9e039fd i965/miptree: Remove redundant check for null texture
There exact same check earlier in brw_miptree_layout() which
intel_miptree_create_layout() in turn calls unconditionally.

Reviewed-by: Samuel Iglesias Gons\341lvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-01-27 08:57:26 +02:00
Topi Pohjolainen
bcec4113cc i965/miptree: Tell when brw_miptree_layout() fails
In addition, let intel_miptree_create_layout() release the
miptree - it is the allocator.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-01-27 08:57:25 +02:00
Topi Pohjolainen
aa9e21a316 i965/meta: Remove unused brw_get_rb_for_slice()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gons<C3><A1>lvez <siglesias@igalia.com>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-01-27 08:57:25 +02:00
Michel Dänzer
d9f8bae616 clover: Fix build against clang SVN >= r293097
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-01-27 09:53:14 +09:00
Eric Anholt
9baf1ff8fc vc4: Use NEON to speed up utile stores on Pi2+.
Improves 1024x1024 TexSubImage2D by 41.2371% +/- 3.52799% (n=10).
2017-01-26 12:50:05 -08:00
Eric Anholt
4d30024238 vc4: Use NEON to speed up utile loads on Pi2.
We had a lot of memcpy call overhead because gpu_stride wasn't being
inlined.  But if you split out the stride==8 and stride==16 cases like
this code does while still using memcpy, you'd no longer have glibc's
NEON memcpy applied at which point we'd be doing 16 uncached reads
instead of 64/(NEON memcpy granularity), for about a 30% performance
hit.  By hand writing the assembly, we can get a whole cacheline
loaded at a time.

Unfortunately, NEON intrinsics turned out to be unusable -- they
didn't have the vldm instruction available.

Note that, for now, the NEON code is only enabled when building for ARMv7
(Pi 2+).  We may want to do runtime detection for the Raspbian case, in
the future.

Improves 1024x1024 GetTexImage by 208.256% +/- 7.07029% (n=10).
2017-01-26 12:48:10 -08:00
Eric Anholt
347b69e7d7 vc4: Move LT tiling code to a separate file.
This paves the way for building it twice, with NEON assembly or not.
2017-01-26 12:23:31 -08:00
Eric Anholt
14cf5c60b8 vc4: Use unreachable() in an unreachable codepath for tiling. 2017-01-26 12:23:31 -08:00
Samuel Pitoiset
eca96ea308 gallium/radeon: add VRAM-vis-usage HUD query
This new query returns the current visible usage of VRAM accessed
by the CPU. It will return 0 on radeon because it's unimplemented.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-26 19:40:52 +01:00
Samuel Pitoiset
9f087e1c7c gallium/radeon: query the CPU accessible size of VRAM
R600_DEBUG="info" can be used to display that size, as well as
the total amount of VRAM/GTT.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-26 19:40:14 +01:00
Ian Romanick
13439031c8 mesa: Arrange validate_uniform_parameters parameters to match call sites
Saves a measly 20 bytes on IA32 and nothing on x64.  Depending on
exactly when this is applied, a lot of variation is possible due to
function alignment.

   text	   data	    bss	    dec	    hex	filename
6670131	 228340	  22552	6921023	 699b3f	lib/i965_dri.so before
6670111	 228340	  22552	6921003	 699b2b	lib/i965_dri.so after
6342932	 293872	  29880	6666684	 65b9bc	lib64/i965_dri.so before
6342932	 293872	  29880	6666684	 65b9bc	lib64/i965_dri.so after

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-01-26 09:46:18 -08:00
Ian Romanick
9be5fd3c87 mesa: Arrange _mesa_uniform parameters to match the call sites
By putting the parameters first that match the parameters to the call
site, 4 (of 14) instructions are saved at _mesa_Uniform4fv on x64.  On
IA32, the details of the instructions change, but it is the same count
and mix of instructions.

Before:

0000000000000830 <_mesa_Uniform4fv>:
     830:       48 83 ec 10             sub    $0x10,%rsp
     834:       49 89 d0                mov    %rdx,%r8
     837:       48 8b 15 00 00 00 00    mov    0x0(%rip),%rdx        # 83e <_mesa_Uniform4fv+0xe>
     83e:       89 f8                   mov    %edi,%eax
     840:       89 f1                   mov    %esi,%ecx
     842:       41 b9 02 00 00 00       mov    $0x2,%r9d
     848:       64 48 8b 3a             mov    %fs:(%rdx),%rdi
     84c:       48 8b 97 c8 01 02 00    mov    0x201c8(%rdi),%rdx
     853:       48 8b 72 70             mov    0x70(%rdx),%rsi
     857:       6a 04                   pushq  $0x4
     859:       89 c2                   mov    %eax,%edx
     85b:       e8 00 00 00 00          callq  860 <_mesa_Uniform4fv+0x30>
     860:       48 83 c4 18             add    $0x18,%rsp
     864:       c3                      retq

After:

00000000000007f0 <_mesa_Uniform4fv>:
     7f0:       48 83 ec 10             sub    $0x10,%rsp
     7f4:       48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # 7fb <_mesa_Uniform4fv+0xb>
     7fb:       41 b9 02 00 00 00       mov    $0x2,%r9d
     801:       64 48 8b 08             mov    %fs:(%rax),%rcx
     805:       48 8b 81 c8 01 02 00    mov    0x201c8(%rcx),%rax
     80c:       6a 04                   pushq  $0x4
     80e:       4c 8b 40 70             mov    0x70(%rax),%r8
     812:       e8 00 00 00 00          callq  817 <_mesa_Uniform4fv+0x27>
     817:       48 83 c4 18             add    $0x18,%rsp
     81b:       c3                      retq

Saves a measly 416 bytes of text on x64.  Depending on exactly when this
is applied, a lot of variation is possible due to function alignment.

   text	   data	    bss	    dec	    hex	filename
6670131	 228340	  22552	6921023	 699b3f	lib/i965_dri.so before
6670131	 228340	  22552	6921023	 699b3f	lib/i965_dri.so after
6343348	 293872	  29880	6667100	 65bb5c	lib64/i965_dri.so before
6342932	 293872	  29880	6666684	 65b9bc	lib64/i965_dri.so after

There is likely to be no performance change with just this patch.
_mesa_uniform immediately calls validate_uniform_parameters with
parameters in the "wrong" (different from the call site) order.

v2: Rebase on GL_ARB_gpu_shader_fp64.

v3: Rebase on GL_ARB_gpu_shader_int64.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-01-26 09:46:14 -08:00
Ian Romanick
9f7ac45ce4 mesa: Arrange _mesa_uniform_matrix parameters to match the call sites
By putting the parameters first that match the parameters to the call
site, 4 (of 16) instructions are saved at _mesa_UniformMatrix4fv on
x64.  On IA32, the details of the instructions change, but it is the
same count and mix of instructions.

Before:

0000000000001380 <_mesa_UniformMatrix4fv>:
    1380:       48 83 ec 10             sub    $0x10,%rsp
    1384:       48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # 138b <_mesa_UniformMatrix4fv+0xb>
    138b:       41 89 f8                mov    %edi,%r8d
    138e:       41 89 f1                mov    %esi,%r9d
    1391:       0f b6 d2                movzbl %dl,%edx
    1394:       64 48 8b 38             mov    %fs:(%rax),%rdi
    1398:       48 8b b7 c8 01 02 00    mov    0x201c8(%rdi),%rsi
    139f:       48 8b 76 70             mov    0x70(%rsi),%rsi
    13a3:       68 06 14 00 00          pushq  $0x1406
    13a8:       51                      push   %rcx
    13a9:       52                      push   %rdx
    13aa:       b9 04 00 00 00          mov    $0x4,%ecx
    13af:       ba 04 00 00 00          mov    $0x4,%edx
    13b4:       e8 00 00 00 00          callq  13b9 <_mesa_UniformMatrix4fv+0x39>
    13b9:       48 83 c4 28             add    $0x28,%rsp
    13bd:       c3                      retq

After:

0000000000001360 <_mesa_UniformMatrix4fv>:
    1360:       48 83 ec 10             sub    $0x10,%rsp
    1364:       48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # 136b <_mesa_UniformMatrix4fv+0xb>
    136b:       0f b6 d2                movzbl %dl,%edx
    136e:       64 4c 8b 00             mov    %fs:(%rax),%r8
    1372:       49 8b 80 c8 01 02 00    mov    0x201c8(%r8),%rax
    1379:       68 06 14 00 00          pushq  $0x1406
    137e:       6a 04                   pushq  $0x4
    1380:       6a 04                   pushq  $0x4
    1382:       4c 8b 48 70             mov    0x70(%rax),%r9
    1386:       e8 00 00 00 00          callq  138b <_mesa_UniformMatrix4fv+0x2b>
    138b:       48 83 c4 28             add    $0x28,%rsp
    138f:       c3                      retq

Saves a measly 576 bytes of text on x64.

   text	   data	    bss	    dec	    hex	filename
6670131	 228340	  22552	6921023	 699b3f	lib/i965_dri.so before
6670131	 228340	  22552	6921023	 699b3f	lib/i965_dri.so after
6343924	 293872	  29880	6667676	 65bd9c	lib64/i965_dri.so before
6343348	 293872	  29880	6667100	 65bb5c	lib64/i965_dri.so after

v2: Rebase on GL_ARB_gpu_shader_fp64.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-01-26 09:46:09 -08:00
Ian Romanick
874393186b mesa: Trivial clean-ups in uniform_query.cpp
This is C++, so we can mix code and declarations.  Doing so allows
constification.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-01-26 09:46:07 -08:00
Lionel Landwerlin
bbe8705c57 spirv: handle undefined components for OpVectorShuffle
Fixes:
   dEQP-VK.spirv_assembly.instruction.compute.opspecconstantop.vector_related
   dEQP-VK.spirv_assembly.instruction.graphics.opspecconstantop.vector_related*

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
2017-01-26 17:31:21 +00:00
Lionel Landwerlin
df7063cba3 spirv: handle OpUndef as part of the variable parsing pass
Looking at the following bit of SPIRV shader :

...
%zero        = OpConstant %i32 0
%ivec3_0     = OpConstantComposite %ivec3 %zero %zero %zero
%vec3_undef  = OpUndef %ivec3
%sc_0        = OpSpecConstant %i32 0
%sc_1        = OpSpecConstant %i32 0
%sc_2        = OpSpecConstant %i32 0
...

Our compiler currently stops parsing variables & types on the OpUndef
and switches to instructions, leaving the following sc_[0-2] variables
untreated.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
2017-01-26 17:29:29 +00:00
Lionel Landwerlin
c3421106ec anv: fix descriptor pool internal size allocation
The size of the pool is slightly smaller than the size of the
structure containing the whole pool. We need to take that into account
on when setting up the internals.

Fixes a crash due to out of bound memory access in:
   dEQP-VK.api.descriptor_pool.out_of_pool_memory

v2: Drop debug traces (Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
2017-01-26 17:24:21 +00:00
Kenneth Graunke
f8f7ea508b i965: Make intelEmitCopyBlit not truncate large strides.
When trying to blit larger tiled surfaces, the pitch can be larger than
32768 bytes, which means it won't fit in a GLshort.  Passing it in will
truncate the stride to 0, which has...surprising results.

The pitch can be up to 32,768 DWords, or 128kB.  We measure it in bytes,
but divide by 4 when programming it.  So we need to handle values up to
131,072.  Switch from GLshort to int32_t to avoid the truncation.

Fixes GL45-CTS.gtf30.GL3Tests.depth_texture.depth_texture_copyteximage
at widths greater than 8192.

v2: Use int32_t as negative values can be used (Jason).

Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-26 01:43:20 -08:00
Kenneth Graunke
fcf723b647 i965: Use a UW source type for CS_OPCODE_CS_TERMINATE.
SIMD16 compute shaders use a send(16) with mlen 1 for the EOT message,
using a source of g127 for the single register.  With a UD type, this
supposedly could read g128, which doesn't exist, causing the simulator
to get cranky.  Use a UW type to avoid this.

Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-01-26 00:52:52 -08:00
Iago Toral Quiroga
9b25769da6 anv/lower_input_attachments: honor sample index parameter to subpassLoad()
According to GL_KHR_vulkan_glsl, the signature of subpassLoad() is:

gvec4 subpassLoad(gsubpassInput   subpass);
gvec4 subpassLoad(gsubpassInputMS subpass, int sample);

So the multisampled case always receives an explicit sample index that we
should use. The current implementation was ignoring this parameter
and using gl_SampleID value instead.

Fixes:
dEQP-VK.pipeline.multisample_shader_builtin.sample_id.*

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
2017-01-26 08:11:21 +01:00
Kenneth Graunke
5106df85da i965: Fix fast depth clears for surfaces with a dimension of 16384.
I hadn't bothered to set this bit because I figured it would just
paper over us getting the rectangle wrong.  But it turns out that
there is a legitimate reason to use it, so let's do so.

The alternative would be to chop up 16k clears to multiple 8k clears,
which is pointlessly painful.

Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2017-01-25 22:24:08 -08:00
Chad Versace
022e5c7e5a anv: Implement VK_KHR_get_physical_device_properties2
Reviewed-by: Jason Ekstranad <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-25 19:18:47 -08:00
Chad Versace
cd03021c83 anv: Refactor anv_GetPhysicalDeviceQueueFamilyProperties()
Add a helper function, anv_get_queue_family_properties(), which fills the
struct.  This patch reduces churn in the following patch that implements
vkGetPhysicalDeviceQueueFamilyProperties2KHR.

Reviewed-by: Jason Ekstranad <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-25 19:18:46 -08:00
Chad Versace
5826190095 anv: Refactor anv_GetPhysicalDeviceFormatProperties()
Add a helper function, anv_get_image_format_properties(), which does all
the work and has a VkPhysicalDeviceImageFormatInfo2KHR parameter. This
patch reduces churn in the following patch that implements
vkGetPhysicalDeviceImageFormatProperties2KHR.

Reviewed-by: Jason Ekstranad <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-25 19:18:43 -08:00
Chad Versace
b2de77a07d anv: Revive struct anv_common
The struct was deleted by:
  commit efe9d1cde3
  Author: Edward O'Callaghan <funfunctor@folklore1984.net>
  Subject: anv: Clean up some unused variables

Unlike the original anv_common, the new one has a non-const pNext
pointer because we will use it for the output structs of
VK_KHR_get_physical_device_properties2.

v2:
  - Retype pNext from void* to struct anv_common*.

Reviewed-by: Jason Ekstranad <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-25 19:18:33 -08:00
Chad Versace
c5d99c9983 anv: Define macro anv_debug()
This is a printf-like macro that prints a debug message to stderr when
built with DEBUG.  If no DEBUG, then do nothing.

Reviewed-by: Jason Ekstranad <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-25 19:17:45 -08:00
Ian Romanick
fd43bee0ea mesa: Fix copy-and-paste bug in _mesa_(Program|)Uniform[1234](i|ui)64vARB functions
All of the functions were passing 1 to _mesa_uniform instead of passing
count.

Fixes 16 unsed parameter warnings like:

main/uniforms.c: In function ‘_mesa_Uniform1i64vARB’:
main/uniforms.c:1692:47: warning: unused parameter ‘count’ [-Wunused-parameter]
 _mesa_Uniform1i64vARB(GLint location, GLsizei count, const GLint64 *value)
                                               ^~~~~

This is why I build with extra warnings enabled.  Unfortunately, there
are so many unused parameter warnings in Mesa that I didn't notice these
added warnings for over 6 months. :(

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-25 09:28:40 -08:00
Lionel Landwerlin
173dd60ced spirv: bump headers to SPIRV 1.1
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-25 17:22:23 +00:00
Lionel Landwerlin
05e2d99bf2 spirv: add default handler for new enums
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-25 17:22:23 +00:00
Lionel Landwerlin
4fd54d611f spirv: fix typos
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-25 17:21:15 +00:00
Lionel Landwerlin
25e21cb8d0 anv: set command buffer to NULL when allocations fail
The spec section 5.2 says:

   "vkAllocateCommandBuffers can be used to create multiple command
   buffers. If the creation of any of those command buffers fails, the
   implementation must destroy all successfully created command buffer
   objects from this command, set all entries of the pCommandBuffers
   array to VK_NULL_HANDLE and return the error."

Fixes:
   dEQP-VK.api.object_management.alloc_callback_fail_multiple.command_buffer_primary
   dEQP-VK.api.object_management.alloc_callback_fail_multiple.command_buffer_secondary

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-01-25 17:15:30 +00:00
Jason Ekstrand
d6397dd625 vulkan/wsi: Lower the maximum image sizes
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "17.0" <mesa-dev@lists.freedesktop.org>
2017-01-25 09:05:30 -08:00
Jason Ekstrand
659edd9f5c vulkan/wsi/wayland: Handle VK_INCOMPLETE for GetPresentModes
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "17.0" <mesa-dev@lists.freedesktop.org>
2017-01-25 09:05:25 -08:00
Jason Ekstrand
dc578ef060 vulkan/wsi/wayland: Handle VK_INCOMPLETE for GetFormats
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "17.0" <mesa-dev@lists.freedesktop.org>
2017-01-25 09:04:56 -08:00
George Kyriazis
e259efd805 swr: Update fs texture & sampler state logic
In swr_update_derived() update texture and sampler state on a new fragment
shader.  GALLIUM_HUD can update fs using a previously bound texture and
sampler.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-01-25 10:02:50 -06:00
Samuel Pitoiset
cff199ceb7 gallium/radeon: add a new HUD query for the number of mapped buffers
Useful when debugging applications which map a ton of buffers
and also because we used to run into Linux's limit on the number
of simultaneous mmap() calls.

v2: - update the commit message

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-25 15:19:21 +01:00
Iago Toral Quiroga
56495080ed spirv: handle gl_SampleMask
SPIR-V maps both gl_SampleMask and gl_SampleMaskIn to the same
builtin (SampleMask). The only way to tell which one we are dealing with
is to check if it is an input or an output.

Fixes:
dEQP-VK.pipeline.multisample_shader_builtin.sample_mask.write.*

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-25 08:08:16 +01:00
Iago Toral Quiroga
9467d78d38 spirv: acknowledge multisampled input attachments
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-25 08:07:09 +01:00
Dave Airlie
2ab2be092d radv: program a default point size.
Along the lines of what
3b804819 anv: Default PointSize to 1.0 if not written by the shader
does for anv, program a default point size in the hw of 1.0.

This preempt fixes a bunch of geom shader tests.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-25 09:58:38 +10:00
Marek Olšák
eac7df43ca radeonsi: handle first_non_void correctly in si_create_vertex_elements
This fixes R11G11B10_FLOAT, because it's in the category of "OTHER",
meaning that it doesn't have any channel description.

Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-24 23:52:01 +01:00
Marek Olšák
d9ef549238 st/mesa: destroy pipe_context before destroying st_context (v2)
If radeonsi starts compiling an optimized shader variant asynchronously
with a GL debug callback set and the application destroys the GL context,
radeonsi crashes when trying to write shader stats into the debug output
of a non-existent context after compilation, because st/mesa was destroyed
before pipe_context.

Firefox with WebGL2 enabled hits this bug.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99456

v2: protect against a double destroy in st_create_context_priv and callers.

Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-24 23:52:01 +01:00
Timothy Arceri
dd65f0efc9 nir: bump loop max unroll limit
The original number was chosen in an attempt to match the limits applied to
GLSL IR.

A look at the git history of the why these limits were chosen for GLSL IR
shows it was more to do with the slow speed of unrolling large loops in
GLSL IR than anything else. The speed of loop unrolling in NIR is not a
problem so we may wish to bump this even higher in future.

No shader-db change, however a furture change will disbale the GLSL IR
optimisation loop in the i965 backend results in 4 loops from The Talos
Principle failing to unroll. Bumping the limit allows them to unroll which
results in the instruction count matching the previous output from when the
GLSL IR opts were still enabled.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-25 09:43:29 +11:00
Timothy Arceri
34ab9b0947 glsl: lower constant arrays to uniform arrays before optimisation loop
Previously the constant array would not get copy propagated until the backend
did its GLSL IR opt loop. I plan on removing that from i965 shortly which
caused huge regressions in Deus-ex and Tomb Raider which have large
constant arrays. Moving lowering before the opt loop in the GLSL linker
fixes this and unexpectedly improves some compute shaders also.

shader-db results BDW:

instructions helped:   shaders/closed/steam/deus-ex-mankind-divided/374.shader_test CS SIMD16: 204 -> 194 (-4.90%)
instructions helped:   shaders/closed/steam/deus-ex-mankind-divided/318.shader_test CS SIMD8: 1010 -> 741 (-26.63%)
instructions helped:   shaders/closed/steam/deus-ex-mankind-divided/144.shader_test CS SIMD8: 542 -> 385 (-28.97%)

cycles helped:   shaders/closed/steam/deus-ex-mankind-divided/318.shader_test CS SIMD8: 1831382 -> 1818492 (-0.70%)
cycles helped:   shaders/closed/steam/deus-ex-mankind-divided/144.shader_test CS SIMD8: 216238 -> 206180 (-4.65%)
cycles helped:   shaders/closed/steam/deus-ex-mankind-divided/374.shader_test CS SIMD16: 18484 -> 16644 (-9.95%)

total instructions in shared programs: 13060313 -> 13059877 (-0.00%)
instructions in affected programs: 1756 -> 1320 (-24.83%)
helped: 3
HURT: 0

total cycles in shared programs: 256586698 -> 256561910 (-0.01%)
cycles in affected programs: 2066104 -> 2041316 (-1.20%)
helped: 3
HURT: 0

V3: only call the opt loop if lowering progressed (Suggested by Eric)

V2: call opts before and after lowering (Suggested by Ken)

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-25 09:07:30 +11:00
Ian Romanick
c4a0c1efff mesa: Don't advertise GL_OES_read_format in core profile
OpenGL ES implementations are not allowed to ship ARB extensions, and
OpenGL implementations are not allowed to ship OES extensions.

The functionality is also included in GL_ARB_ES2_compatibility.  Ever
OpenGL core-profile driver currently exposes both extensions.  I don't
know of any applications that explicitly check for GL_OES_read_format,
so removing it seems very unlikely to cause problems.  No functionality
is removed.

I have left this extension in place for compatibility profile.  There
are still OpenGL 1.x drivers in Mesa, and adding code to check for
compatibility profile and not GL_ARB_ES2_compatibility for
GL_IMPLEMENTATION_COLOR_READ_TYPE and GL_IMPLEMENTATION_COLOR_READ_FORMAT
just feels dumb.

Three other other alternatives considered:

 - Remove the string from compatibility profile drivers but leave the
   functionality in place.

 - Add a flag to expose the extension string, and set it in every OpenGL
   driver that does not expose GL_ARB_ES2_compatibility (and those
   drivers only).  I tried this.  You can't have two instances of an
   extension in the extension table (one dummy_true for ES1 and one with
   a flag for compatibility profile), so the implementation requires a
   bit of effort.

 - Only expose the extension in compatibility if the version is less
   than 2.0.  I didn't see an easy way to do this.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
2017-01-24 13:39:26 -08:00
Brian Paul
b87eedd405 docs: fix incorrect link to 12.0.6 release notes
Trivial.
2017-01-24 14:30:44 -07:00
Jason Ekstrand
a435991d3c anv: Expose VK_KHR_maintenance1
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-24 12:27:48 -08:00
Jason Ekstrand
756533520e anv: Return better errors from AllocateDescriptorSets
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-24 12:27:48 -08:00
Jason Ekstrand
99bb4c22a5 anv: Allow selecting the slice of a 3D image
As per VK_KHR_maintenance1, clients can render to a slice of a 3D image
by creating a VK_IMAGE_VIEW_TYPE_2D view of it.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-24 12:27:48 -08:00
Jason Ekstrand
6d79111834 anv: Report FORMAT_FEATURE_TRANSFER_SRC/DST_BIT_KHR
As of VK_KHR_maintenance1, these are supposed to be reported for any
formats on which we support transfer operations.  For us, this is
anything that we can texture from.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-24 12:27:48 -08:00
Jason Ekstrand
8a8630486b anv: Add trivial support for TrimCommandPoolKHR
Our command buffers already efficiently use a global pool so trimming
doesn't really need to do anything.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-24 12:27:48 -08:00
Jason Ekstrand
5edcc96bf6 anv: Set viewport extents correctly when height is negative
As per VK_KHR_maintenance1, setting a negative height in the viewport
can be used to get flipped coordinates.  This is, aparently, very useful
when porting D3D apps to Vulkan.  All we need to do to support this is
to make sure we actually set the min and max correctly.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-24 12:27:48 -08:00
Matt Turner
045f38a507 vulkan: Don't install vk_platform.h or vulkan.h.
These files belong to the vulkan loader.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-24 11:27:20 -08:00
Roland Scheidegger
aceae09ef0 glsl: fix compile errors with mingw due to missing PRIx64 definitions
define __STDC_FORMAT_MACROS and include <inttypes.h> (same as
ir_builder_print_visitor.cpp already does).

Otherwise, some mingw build errors out (since
8e7e1ae036 and
bbce1c538d presumably) with:
src/compiler/glsl/ir_print_visitor.cpp:479:40: error: expected ‘)’ before ‘PRIu64’
   case GLSL_TYPE_UINT64:fprintf(f, "%" PRIu64, ir->value.u64[i]); break;

(Note even with that fix I get other format specifier warnings:
src/compiler/glsl/ir_print_visitor.cpp:473:47:
warning: unknown conversion type character ‘a’ in format [-Wformat=]
                fprintf(f, "%a", ir->value.f[i]);
                                               ^
src/compiler/glsl/ir_print_visitor.cpp:473:47:
warning: too many arguments for format [-Wformat-extra-args]
but it still compiles at least)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-01-24 19:12:46 +01:00
Roland Scheidegger
f4df21ed95 gallivm: don't try to use fast rcp for fdiv
The use of fast rcp instruction is disabled, and will always fall back
to use a division instead (1 / x). Hence, if we get a division opcode,
it doesn't make much sense trying to split that into rcp/mul.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-01-24 19:12:46 +01:00
Roland Scheidegger
25208949d7 gallivm: (trivial) fix ddiv cpu implementation
we can't use the cpu implementation of fdiv, as this one uses different
lp_build_context, which causes assertion failure.
Just use default fdiv action (there is no fast rcp for doubles which we
could potentially use anyway).

Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-01-24 19:12:46 +01:00
Roland Scheidegger
3b575a955c tgsi: implement ddiv opcode
softpipe (along with llvmpipe) claims to support arb_gpu_shader_fp64,
so we really need to support that opcode.

Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-01-24 19:12:46 +01:00
Jason Ekstrand
4c180f9633 i965/blorp: Use the correct ISL format for combined depth/stencil
In brw_blorp_copyteximage, we use the format from the render buffer.
This could be a combined depth/stencil format.  In this case, we handle
stencil properly but we give blorp the wrong ISL format.  Specifically,
we would give blorp ISL_FORMAT_R32G32B32A32_FLOAT which is the wrong
size was causing GPU hangs.

Fixes: GL45-CTS.gtf30.GL3Tests.packed_depth_stencil.packed_depth_stencil_copyteximage

Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-01-24 10:06:07 -08:00
Samuel Pitoiset
0054dded03 st/glsl_to_tgsi: fix compilation warnings since int64 types
state_tracker/st_glsl_to_tgsi.cpp:302:28: warning: ‘glsl_to_tgsi_instruction::tex_type’
	is too small to hold all values of ‘enum glsl_base_type’
    glsl_base_type tex_type:4;

Fixes: 8ce53d4a2f ("glsl: Add basic ARB_gpu_shader_int64 types")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-24 12:45:39 +01:00
Samuel Pitoiset
d90d37db73 gallium/radeon: undef the very specific UPDATE_COUNTER macro
Also, wrap this into a do { ... } while (0). Suggested by Nicolai.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-24 11:17:25 +01:00
Topi Pohjolainen
ba6399df94 i965/blorp: Add also depth and stencil buffers to render cache
v2 (Jason, Curro): Add stencil also even though it is not
                   enabled yet.

Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-01-24 10:41:58 +02:00
Ben Widawsky
e63ab36d0e gbm: Fix width height getters return type (trivial)
v2: Other way round... to make consistent, make both return type have
the fixed width - uint32_t.

Cc: Daniel Stone <daniel@fooishbar.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Daniel Stone <daniels@collabora.com>
2017-01-23 21:43:38 -08:00
Ben Widawsky
bb9ff98b4c gbm: Move getters to match order in header file (trivial)
Other things are out of order, but I need to add a getter so I'm just
fixing those.

This helps people adding to GBM know where the right place to put things
is.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Daniel Stone <daniels@collabora.com>
2017-01-23 21:43:34 -08:00
Emil Velikov
530cd248f5 docs: add news item and link release notes for 12.0.6
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-24 02:15:30 +00:00
Emil Velikov
9b16bd8b6c docs: use correct year for the 12.0.6 release notes
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 13953f012d)
2017-01-24 02:15:30 +00:00
Emil Velikov
c16e7e0a60 docs: add sha256 checksums for 12.0.6
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 36e3f2542d)
2017-01-24 02:15:30 +00:00
Emil Velikov
b1137cb9de docs: add release notes for 12.0.6
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 555885a0bf)
2017-01-24 02:15:30 +00:00
Emil Velikov
9924cdecd9 docs/releasing: remove stray "cd"
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-24 02:15:29 +00:00
Ilia Mirkin
b755f2f233 nv50: add support for MUL_ZERO_WINS property
This is simply keyed off the vertex shader, as that's guaranteed to be
present in any pipeline.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-01-23 20:37:14 -05:00
Ilia Mirkin
8c764a2321 nvc0: add support for MUL_ZERO_WINS property
This sets the dnz flag on all the relevant multiplication operations. At
emission time, this will only be supported by nvc0+, so nv50 will need a
different solution.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-01-23 20:37:14 -05:00
Ilia Mirkin
e1346f25bf st/nine: set the MUL_ZERO_WINS flag when supported
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2017-01-23 20:37:10 -05:00
Ilia Mirkin
6e40938fbc gallium: add PIPE_CAP_TGSI_MUL_ZERO_WINS
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2017-01-23 20:36:47 -05:00
Ilia Mirkin
a2b2cd81d1 gallium: add TGSI_PROPERTY_MUL_ZERO_WINS
This will be useful for proper D3D9 emulation, where this behavior is
expected by some shaders.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2017-01-23 20:35:55 -05:00
Marek Olšák
573bf0940a radeonsi: always set the TCL1_ACTION_ENA when invalidating L2
Some CIK-VI docs say this is the default behavior on SI. That doesn't
answer whether it's also the default behavior on CIK-VI.

Cc: 17.0 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-23 23:43:38 +01:00
Marek Olšák
5d3dd70cab radeonsi: don't declare LDS in TES
not used since we started using the offchip tess ring

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-23 23:43:38 +01:00
Marek Olšák
59c5da40ed radeonsi: preload PS inputs only if KILL is used
so that most shaders can get lower VGPR usage thanks to lazy input loading.
I think this is a more accurate constraint that prevents the black transitions
in Witcher 2.

Affected shaders (7758):
Max Waves: 57437 -> 58231 (1.38 %)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-23 23:43:38 +01:00
Marek Olšák
7b32ae4df5 gallium/radeon: adjust the rule for using the LINEAR_ALIGNED layout
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-23 23:43:38 +01:00
Marek Olšák
e248390e93 winsys/amdgpu: drop all IBs if at least one was rejected within the context
The corruption is inevitable and hangs are possible too.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-23 23:43:38 +01:00
Marek Olšák
1840800860 winsys/amdgpu: report a rejected IB as a lost context
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-23 23:43:38 +01:00
Dave Airlie
dcfcb3047c vulkan: import latest registry for 1.0.39 extensions.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-24 08:13:37 +10:00
Dave Airlie
e38bee34bf vulkan: bump vulkan.h to 1.0.39 version
This introduces a bunch of new extension defines.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-24 08:13:23 +10:00
Grazvydas Ignotas
f65b3641c3 radv: don't resubmit the same cs over and over while tracing
Fixes: 97dfff54 ("radv: Dump command buffer on hang.")
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
CC: <mesa-stable@lists.freedesktop.org>
2017-01-23 22:27:05 +01:00
Samuel Pitoiset
aa2ace8e49 gallium/radeon: add HUD queries for monitoring some hw blocks
It's also possible to monitor them via performance counters but
the hardware can only use two counters simultaneously. It seems
easier to re-use the existing code which reads from MMIO instead
of writing a multi-pass approach.

v2: - add new lines after ':'

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-23 21:19:49 +01:00
Samuel Pitoiset
a704f19247 gallium/radeon: refactor the GRBM counters path
This will allow to expose more queries in order to know which
blocks are busy/idle.

v2: - add new lines after ':'

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-23 21:19:49 +01:00
George Kyriazis
00847e4f14 swr: Align query results allocation
Some query results struct contents are declared as cache line aligned.
Use aligned malloc, and align the whole struct, to be safe.

Fixes crash when compiling with clang.

CC: <mesa-stable@lists.freedesktop.org>

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-01-23 14:15:54 -06:00
Bruce Cherniak
b829206b07 swr: Prune empty nodes in CalculateProcessorTopology.
CalculateProcessorTopology tries to figure out system topology by
parsing /proc/cpuinfo to determine the number of threads, cores, and
NUMA nodes.  There are some architectures where the "physical id" begins
with 1 rather than 0, which was creating and empty "0" node and causing a
crash in CreateThreadPool.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97102
Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
CC: <mesa-stable@lists.freedesktop.org>
2017-01-23 13:52:26 -06:00
Matt Turner
d349449a16 i965: Use UNUSED to silence unused variable (used in assert). 2017-01-23 10:50:20 -08:00
Rainer Hochecker
09b140abb5 dri: allow 16bit R/GR images to be exported via drm buffers
This allows eglCreateImageKHR to access P010 surfaces created by vaapi

Signed-off-by: Rainer Hochecker <fernetmenta@online.de>
Acked-by: Ben Widawky <ben@bwidawsk.net>
2017-01-23 08:47:15 -08:00
Christian König
1338d912f5 st/va: make sure that we call begin_frame() only once v2
This fixes "st/va: delay calling begin_frame until we have all parameters".

v2: call begin frame after decoder (re)creation as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Tested-by: Andy Furniss <adf.lists@gmail.com>
2017-01-23 17:00:04 +01:00
Eric Engestrom
50141e131a drirc: remove spurious tabs
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-23 16:34:58 +01:00
Nicolai Hähnle
cfabbbcfd7 st/glsl_to_tgsi: use DDIV instead of DRCP + DMUL
Fixes GL45-CTS.gpu_shader_fp64.built_in_functions.

v2: use DDIV unconditionally (Roland)

Reviewed-by: Roland Scheidegger <sroland@vmware.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Tested-by: Glenn Kennard <glenn.kennard@gmail.com>
Tested-by: James Harvey <lothmordor@gmail.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
2017-01-23 16:17:26 +01:00
Nicolai Hähnle
b71c415c3d glsl: split DIV_TO_MUL_RCP into single- and double-precision flags
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Tested-by: Glenn Kennard <glenn.kennard@gmail.com>
Tested-by: James Harvey <lothmordor@gmail.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
2017-01-23 16:17:19 +01:00
Nicolai Hähnle
e4f8f9a638 r600: implement DDIV
Tested-by: Glenn Kennard <glenn.kennard@gmail.com>
Tested-by: James Harvey <lothmordor@gmail.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
2017-01-23 16:17:15 +01:00
Nicolai Hähnle
488560cfe6 r600: factor out cayman_emit_unary_double_raw
We will use it for DDIV.

Tested-by: Glenn Kennard <glenn.kennard@gmail.com>
Tested-by: James Harvey <lothmordor@gmail.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
2017-01-23 16:17:12 +01:00
Nicolai Hähnle
76b02d2fe1 r600: double multiply can handle only one multiply at a time
It seems clear that trying to multiply two pairs of doubles would result
in the temporary register getting overwritten by the second pair. So
make the code more explicit.

Tested-by: Glenn Kennard <glenn.kennard@gmail.com>
Tested-by: James Harvey <lothmordor@gmail.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
2017-01-23 16:15:45 +01:00
Timothy Arceri
f3f9207786 glsl: fix tes linking regression
Fixes regression caused by cbeba6bd48. I accidentally pushed the
wrong version of the patch.
2017-01-23 19:07:22 +11:00
Timothy Arceri
38a67f020d mesa: remove unused gl_shader_info field from gl_linked_shader
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-23 14:48:04 +11:00
Timothy Arceri
79f07e87c9 mesa/glsl: set and get cs layouts to and from shader_info
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-23 14:48:04 +11:00
Timothy Arceri
b96bddae67 mesa/glsl: set and get gs layouts directly to and from shader_info
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-23 14:48:04 +11:00
Timothy Arceri
cbeba6bd48 mesa/glsl/i965: set and get tes layouts directly to and from shader_info
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-23 14:48:04 +11:00
Timothy Arceri
64e201ab8f glsl: use last_vert_prog to get last {clip,cull}_distance_array_size
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-23 14:48:04 +11:00
Timothy Arceri
fc707f570f mesa/glsl: set {clip,cull}_distance_array_size directly in gl_program
There are some line wrapping violations here but those lines will get
deleted in the following patch.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-23 14:48:04 +11:00
Timothy Arceri
f86d15ed94 st/mesa/glsl: change xfb_program field to last_vert_prog
Now that the i965 backend doesn't depend on this field we can
make it more generic and short circuit a bunch of code paths.

The new field will be used in a following patch for another
clean-up.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-23 14:48:04 +11:00
Timothy Arceri
c505d6d852 mesa: use gl_program for CurrentProgram rather than gl_shader_program
This makes much more sense and should be more performant in some
critical paths such as SSO validation which is called at draw time.

Previously the CurrentProgram array could have contained multiple
pointers to the same struct which was confusing and we would often
need to fish out the information we were really after from the
gl_program anyway.

Also it was error prone to depend on the _LinkedShader array for
programs in current use because a failed linking attempt will lose
the infomation about the current program in use which is still
valid.

V2: fix validate_io() to compare linked_stages rather than the
consumer and producer to decide if we are looking at inward
facing shader interfaces which don't need validation.

Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>

To avoid build regressions the following 2 patches were squashed in to
this commit:

mesa/meta: rewrite _mesa_shader_program_use() and _mesa_program_use()

These are rewritten to do what the function name suggests, that is
_mesa_shader_program_use() sets the use of all stage and
_mesa_program_use() sets the use of a single stage.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>

mesa: update active relinked program

This likely fixes a subroutine bug were
_mesa_shader_program_init_subroutine_defaults() would never have been
called for the relinked program as we previously just set
_NEW_PROGRAM as dirty and never called the _mesa_use* functions when
linking.

Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-01-23 14:48:04 +11:00
Rob Clark
31daeb5bf1 freedreno/a5xx: set frag shader threadsize
Signed-off-by: Rob Clark <robdclark@gmail.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
2017-01-22 14:12:05 -05:00
Rob Clark
8d6af93e76 freedreno/a5xx: set fragcoordxy properly
What a3xx docs call IJPERSPCENTERREGID.. the xy coord passed into
bary.f.  We were incorrectly setting both this and gl_FragCoord.xy to
the same register resulting in all sorts of hilarity.

Fixes stk, vdrift, 0ad, probably a bunch others.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
2017-01-22 14:11:43 -05:00
Rob Clark
278b97946f freedreno/ir3: setup var locations in standalone compiler
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-01-22 14:11:26 -05:00
Rob Clark
6cc93bedc1 freedreno/a5xx: fix psize
Note spritelist (POINTLIST_PSIZE) seems not to be a thing anymore on
a5xx.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
2017-01-22 14:11:15 -05:00
Rob Clark
141a4f86d6 freedreno/a5xx: srgb fix
Signed-off-by: Rob Clark <robdclark@gmail.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
2017-01-22 14:11:04 -05:00
Rob Clark
69fbb458cf freedreno/a5xx: fix int vbos
Signed-off-by: Rob Clark <robdclark@gmail.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
2017-01-22 14:10:54 -05:00
Rob Clark
16671e9704 freedreno/a5xx: fix clear for uint/sint formats
Signed-off-by: Rob Clark <robdclark@gmail.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
2017-01-22 14:10:42 -05:00
Rob Clark
4d9aa4f67d freedreno/a5xx: fix cull state
Signed-off-by: Rob Clark <robdclark@gmail.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
2017-01-22 14:10:28 -05:00
Rob Clark
4c39458460 freedreno: update generated headers
Signed-off-by: Rob Clark <robdclark@gmail.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
2017-01-22 14:09:45 -05:00
Lionel Landwerlin
494b63f525 anv: descriptors: don't update immutables samplers with anything but their immutable value
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-21 19:22:27 +00:00
Jason Ekstrand
bb96b03461 nir/search: Use the correct bit size for integer comparisons
The previous code always compared integers as 64-bit.  Due to variations
in sign-extension in the code generated by nir_opt_algebraic.py, this
meant that nir_search doesn't always do what you want.  Instead, 32-bit
values should be matched as 32-bit and 64-bit values should be matched
as 64-bit.  While we're here we unify the unsigned and signed paths.
Now that we're using the right bit size, they should be the same since
the only difference we had before was sign extension.

This gets the UE4 bitfield_extract optimization working again.  It had
stopped working due to the constant 0xff00ff00 getting sign-extended
when it shouldn't have.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
2017-01-21 10:34:21 -08:00
Jason Ekstrand
817f9e3b17 intel/blorp/copy: Properly handle clear colors for CCS_E images
In order to handle CCS_E, we stomp the image format to a UINT format and
then do some bitcasting logic in the shader.  This works fine since SKL
render compression only considers the channel layout of the format and
not the format itself.  In order for this to work on images that have
been fast-cleared, we need to also convert the clear color so that, when
interpreted as UINT, it provides the same bit value as it would have in
the original format.  This fixes a bunch of OpenGL ES CTS tests for
copy_image when we start using CCS more aggressively.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
2017-01-21 10:34:09 -08:00
Kenneth Graunke
bb5db5564f glsl: Rename [u]int64_t tokens.
basetsd.h on Windows defines INT64 and UINT64 typedefs which conflict
with these.  Append "_TOK" to avoid conflicts.

Should fix the Windows build.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 19:39:20 -08:00
Matt Turner
892781d6c7 Revert "i965: Really don't emit Q or UQ moves on Gen < 8"
This reverts commit c95380c404.

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-20 19:12:31 -08:00
Matt Turner
d871f8e820 i965: Select DF type for 64-bit integers on Gen < 8.
Gen8 adds Q/UQ types. We attempted to change the types back to DF in the
generator (commit c95380c40), but an assertion added in the FP64 series
(commit e481dcc3) triggers before that code has a chance to execute.

In fact, using Q/UQ in the IR and then changing to DF in the generator
would not work in the presence of source modifiers, etc.

Fixes: d6fcede6 ("i965: Return Q and UQ types for int64 and uint64")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-20 19:12:24 -08:00
Ian Romanick
db6d23cfd2 i965: Enable ARB_gpu_shader_int64 on Gen8+
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Ian Romanick
fc16bf125f i965: Split SIMD16 CMP of Q and UQ instructions
This is basically the same as happens for doubles.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Ian Romanick
51807c6493 i965: Enable 64-bit integer support for almost all unary and binary operations
Integer comparison functions (e.g., nir_op_ilt) are handled in the next
commit.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Ian Romanick
821d7cece8 i965: Enable uploading 64-bit integer uniforms
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Ian Romanick
e0579c5017 i965: Add 64-bit integer support for conversions and bitcasts
v2 (idr): Make the "from" type in a cast unsized.  This reduces the
number of required cast operations at the expensive slightly more
complex code.  However, this will be a dramatic improvement when other
sized integer types are added.  Suggested by Connor.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Ian Romanick
f2fa510594 i965: Enable emitting Q and UQ instructions in the fs backend
v2: Fixup assertion in brw_reg_type_to_hw_type to allow
BRW_REGISTER_TYPE_{UQ,Q} on Gen8+.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Ian Romanick
409e0b2d48 i965: Add support for constant evaluation on Q and UQ types
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Ian Romanick
d6fcede60f i965: Return Q and UQ types for int64 and uint64
It seems like maybe this should return a different type based on Gen.  Q
and UQ only exist on Gen8+, but, based on the old comment, I believe
previous Gens can generate 64-bit moves.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Ian Romanick
c95380c404 i965: Really don't emit Q or UQ moves on Gen < 8
It's much easier to do this in the generator rather than while coming
out of NIR.  brw_type_for_nir_type doesn't know the Gen, so we'd have to
add a bunch of plumbing.  The alternate fix is to not emit int64 moves
for doubles in the first place... but that seems even more difficult.

This change won't catch non-MOV instructions that try to use 64-bit
integer types on Gen < 8.  This may convert certain kinds of bugs in to
different kinds of bugs that are more difficult to detect (since the
assertions in the function won't catch them).

NOTE: I don't think anything can emit mixed-type 64-bit moves until the
same platform supports both ARB_gpu_shader_fp64 and
ARB_gpu_shader_int64.  When we enable int64 on Gen < 8, we can solve
this problem other ways.

This prevents regressions on HSW in the next patch.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Ian Romanick
30164d501d nir: Add support for 64-bit integer types to split_var_copies_block
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2017-01-20 15:41:23 -08:00
Ian Romanick
3c9b35372b nir: Enable 64-bit integer support for almost all unary and binary operations
v2: Don't up-convert the shift count parameter if shift instructions.
Suggested by Connor.  Add type_is_singed() function.  This will make
adding 8- and 16-bit types easier.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
2017-01-20 15:41:23 -08:00
Ian Romanick
fda33e09d8 nir: Shift count for shift opcodes is always 32-bits
Previously both sources were unsized.  This caused problems when the
thing being shifted was 64-bit but the shift count was 32-bit.  The
expectation in NIR is that all unsized sources (and destination) will
ultimately have the same size.

The changes in nir_opt_algebraic.py are to prevent errors like:

 Failed to parse transformation:
03:12:25   (('extract_i8', 'a', 'b'), ('ishr', ('ishl', 'a', ('imul', ('isub', 3, 'b'), 8)), 24), 'options->lower_extract_byte')
03:12:25 Traceback (most recent call last):
03:12:25   File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 610, in __init__
03:12:25     xform = SearchAndReplace(xform)
03:12:25   File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 495, in __init__
03:12:25     BitSizeValidator(varset).validate(self.search, self.replace)
03:12:25   File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 311, in validate
03:12:25     validate_dst_class = self._validate_bit_class_up(replace)
03:12:25   File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 414, in _validate_bit_class_up
03:12:25     src_class = self._validate_bit_class_up(val.sources[i])
03:12:25   File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 420, in _validate_bit_class_up
03:12:25     assert src_class == src_type_bits
03:12:25 AssertionError

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
2017-01-20 15:41:23 -08:00
Ian Romanick
8ad74a2745 nir: Lower packing and unpacking of 64-bit integer types
This change makes me wonder whether double packing should be
reimplemented as int64BitsToDouble(packInt2x32(v)).  I'm a little on the
fence since not all platforms that support fp64 natively support int64.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2017-01-20 15:41:23 -08:00
Ian Romanick
3460d05a71 nir: Add 64-bit integer support for conversions and bitcasts
v2 (idr): "cut them down later" => Remove ir_unop_b2u64 and
ir_unop_u642b.  Handle these with extra i2u or u2i casts just like
uint(bool) and bool(uint) conversion is done.

v3 (idr): Make the "from" type in a cast unsized.  This reduces the
number of required cast operations at the expensive slightly more
complex code.  However, this will be a dramatic improvement when other
sized integer types are added.  Suggested by Connor.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2017-01-20 15:41:23 -08:00
Ian Romanick
3ca0029a0d nir: Add 64-bit integer constant support
v2: Rebase on 19a541f (nir: Get rid of nir_constant_data)

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v1]
2017-01-20 15:41:23 -08:00
Ian Romanick
48e122244b nir: Add GLSL_TYPE_INT64 and GLSL_TYPE_UINT64 to glsl_get_bit_size
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2017-01-20 15:41:23 -08:00
Ian Romanick
81952814a3 glsl: Optimize redundant pack(unpack()) and unpack(pack()) combinations
The lowering passes 64-bit integer operations will generate a lot of
these.

v2: Modify the HANDLE_PACK_UNPACK_INVERSE so that the breaks apply to
the switch instead of the 'do { } while(true)' loop.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Ian Romanick
7122d851aa glsl: Add a lowering pass for 64-bit integer modulus
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Ian Romanick
695b04f7eb glsl: Add "built-in" functions to do 64%64 => 64 modulus
These functions are directly available in shaders.  A #define is added
to detect the presence.  This allows these functions to be tested using
piglit regardless of whether the driver uses them for lowering.  The
GLSL spec says that functions and macros beginning with __ are reserved
for use by the implementation... hey, that's us!

v2: Use function inlining.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Ian Romanick
82c31f3eb9 glsl: Add a lowering pass for 64-bit integer division
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Ian Romanick
012f2995c3 glsl: Add "built-in" functions to do 64/64 => 64 division
These functions are directly available in shaders.  A #define is added
to detect the presence.  This allows these functions to be tested using
piglit regardless of whether the driver uses them for lowering.  The
GLSL spec says that functions and macros beginning with __ are reserved
for use by the implementation... hey, that's us!

v2: Use function inlining.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Ian Romanick
50d52df278 glsl: Add a lowering pass for 64-bit integer sign()
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Ian Romanick
6b03b345eb glsl: Add "built-in" function for 64-bit integer sign()
These functions are directly available in shaders.  A #define is added
to detect the presence.  This allows these functions to be tested using
piglit regardless of whether the driver uses them for lowering.  The
GLSL spec says that functions and macros beginning with __ are reserved
for use by the implementation... hey, that's us!

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Ian Romanick
6c3af04363 glsl: Add a lowering pass for 64-bit integer multiplication
v2: Rename lower_64bit.cpp and lower_64bit_test.cpp to lower_int64.
Suggested by Matt.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Ian Romanick
330fc2413c glsl: Add "built-in" functions to do 64x64 => 64 multiplication
These functions are directly available in shaders.  A #define is added
to detect the presence.  This allows these functions to be tested using
piglit regardless of whether the driver uses them for lowering.  The
GLSL spec says that functions and macros beginning with __ are reserved
for use by the implementation... hey, that's us!

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Ian Romanick
aa38bf1e59 glsl: Move builtin_function related prototypes to a separate file
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Ian Romanick
8358e58f25 glsl/standalone: Enable ARB_gpu_shader_int64
v2: Add missing break in GLSL_TYPE_INT64 case.  Notice by Matt.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Dave Airlie
8dfea5348c i965: Avoid int64 warnings.
Just add operations to the switch statement here.

v2 (idr): "cut them down later" => Remove ir_unop_b2u64 and
ir_unop_u642b.  Handle these with extra i2u or u2i casts just like
uint(bool) and bool(uint) conversion is done.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Dave Airlie
c101cee2ba i965: Avoid int64 induced warnings
Just add types into unsupported or double equivalent spots.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Dave Airlie
a53f315ad8 mesa/program: Add unused ir operations.
v2 (idr): "cut them down later" => Remove ir_unop_b2u64 and
ir_unop_u642b.  Handle these with extra i2u or u2i casts just like
uint(bool) and bool(uint) conversion is done.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Ian Romanick
f82ced5af3 glsl: Allow GLSL_TYPE_INT64 for ir_unop_abs and ir_unop_sign
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Ian Romanick
8e7e1ae036 glsl: Print GLSL_TYPE_UINT64 and GLSL_TYPE_INT64 values
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Ian Romanick
0d14fec345 glsl: Add interaction between ARB_gpu_shader_int64 and ARB_shader_clock
If ARB_gpu_shader_int64 is supported, ARB_shader_clock also adds
clockARB() that returns a uint64_t.  Rather than add new opcodes and
intrinsics for this, just wrap the existing intrinsic with a
packUint2x32.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Dave Airlie
bfc4080d38 glsl: Add 64-bit integer functions
These are all the allowed 64-bit functions from ARB_gpu_shader_int64
spec.

v2: restrict int64/double functions better.

v3 (idr): Delete spurious blank lines.  Suggested by Matt.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Dave Airlie
050f38ef0b glsl/varying_packing: Add 64-bit integer support
As for the double code, but using the 64-bit integer conversions.

v2 (idr): Remove some spurious u2i() and i2u() operations when packing
and unpacking, respectively, int64_t varyings.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Dave Airlie
923aebdd46 glsl/ast: Add 64-bit integer support in some places.
Just add support in two more places in ast parsing.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Dave Airlie
9ba9a7f854 glsl: Add 64-bit integer support to some operations.
This adds 64-bit integer support to some AST and IR operations where
it is needed.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-01-20 15:41:23 -08:00
Dave Airlie
25c7a61b28 glsl/ir_builder: Add support for some 64-bit bitcasts.
We need builder support to implement some of the builtins.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Dave Airlie
78cc44280e glsl/ast: Add 64-bit integer support to conversion functions
This adds support to call the new operations on conversions.

v2 (idr): Delete an unnecessary break-statement.  Noticed by Matt.  Add
a missing blank line.  Noticed by Ian.

v3 (idr): "cut them down later" => Remove ir_unop_b2u64 and
ir_unop_u642b.  Handle these with extra i2u or u2i casts just like
uint(bool) and bool(uint) conversion is done.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Matt Turner <mattst88@gmail.com> [v2]
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-20 15:41:23 -08:00
Dave Airlie
85faf5082f glsl: Add 64-bit integer support for constant expressions
This just adds the new operations and add 64-bit integer support to all
the existing cases where it is needed.

v2: fix some issues found in testing.
v2.1: add unreachable (Ian), add missing int/uint pack/unpack (Dave).

v3 (idr): Rebase on top of idr's series to generate
ir_expression_operation_constant.h. In addition, this version:

    Adds missing support for ir_unop_bit_not, ir_binop_all_equal,
    ir_binop_any_nequal, ir_binop_vector_extract,
    ir_triop_vector_insert, and ir_quadop_vector.

    Removes support for uint64_t from ir_unop_abs and ir_unop_sign.

v4 (idr): "cut them down later" => Remove ir_unop_b2u64 and
ir_unop_u642b.  Handle these with extra i2u or u2i casts just like
uint(bool) and bool(uint) conversion is done.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v2]
Reviewed-by: Matt Turner <mattst88@gmail.com> [v3]
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-20 15:41:23 -08:00
Dave Airlie
a68b6ee063 glsl/ir: Add support for 64-bit integer conversions.
This adds all the conversions in the world, I'm not 100% sure of all of
these are needed, but add all of them and we can cut them down later.

v2: fix issue with packing output types.

v3 (idr): Rebase on top of idr's series to generate
ir_expression_operation_constant.h.  Fix transposed ir_validate
assertions for ir_unop_u642i64 and ir_unop_i642u64.  Add missing
automatic type setup for ir_unop_u642i64 and ir_unop_i642u64.

v4 (idr): "cut them down later" => Remove ir_unop_b2u64 and
ir_unop_u642b.  Handle these with extra i2u or u2i casts just like
uint(bool) and bool(uint) conversion is done.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v2]
Reviewed-by: Matt Turner <mattst88@gmail.com> [v3]
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-20 15:41:23 -08:00
Dave Airlie
7dd63c10c3 glsl: Add 64-bit integer support to uniform initialiser code
Just add support to the double case, same code should work.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Dave Airlie
8df5287c23 glsl/varyings: Add 64-bit integer support.
This adds 64-bit ints to the link_varyings 64-bit support.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Dave Airlie
bbce1c538d glsl/ast/ir: Add 64-bit integer constant support
This adds support for 64-bit integer constants to the parser,
ast and ir.

v2: fix a few issues found in testing.

v3: Add missing ir_constant copy contructor support.

v4: Use PRIu64 and PRId64 in printfs in glsl_parser_extras.cpp.
Suggested by Nicolai.  Rebase on Marek's linalloc changes.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v2]
Reviewed-by: Matt Turner <mattst88@gmail.com> [v3]
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-20 15:41:23 -08:00
Dave Airlie
249007d13c mesa: Add support for 64-bit integer uniforms
This hooks up the API to the internals for 64-bit integer uniforms.

v2: update to use non-strict aliased alternatives

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Dave Airlie
8ce53d4a2f glsl: Add basic ARB_gpu_shader_int64 types
This adds the builtins and the lexer support.

To avoid too many warnings, it adds basic support to the type in a few
other places in mesa, mostly in the trivial places.

It also adds a query to be used later for if a type is an integer 32 or 64.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Dave Airlie
e90830bb8e glsl: Add ARB_gpu_shader_int64 boilerplate.
This just adds the basic boilerplate support.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Dave Airlie
839ce21143 mesa: Add ARB_gpu_shader_int64 extension bits
This just adds the usual boilerplate in mesa core.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Dave Airlie
150f2fa789 mapi: Add support for ARB_gpu_shader_int64.
Just add the boilerplate xml code.

v2 (idr): Update dispatch_sanity.  Only add extension functions in core
profile.

v3 (idr): Remove comment line from gl_API.xml.  Suggested by Matt.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-20 15:41:23 -08:00
Lionel Landwerlin
74c23bde5b anv: don't require render target isl bit for depth/stencil surfaces
Blorp can deal with depth/stencil surfaces blits/copies without the
render target requirement. Also having both render target and
depth/stencil requirement is incompatible from isl's point of view.

This fixes an image creation issue in the high level quality settings
of the Unity3D player, which requires a depth texture with src/dst
transfer & 4x multisampling.

v2: Simply aspect checking condition (Jason)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
2017-01-20 21:39:51 +00:00
Lionel Landwerlin
8a28e764d0 spirv: don't assert with location decorations on non i/o variables
Some applications might add location decoration to samplers. Rather
than raising an error it seems it would make more sense to just
discard these decorations.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
2017-01-20 21:39:46 +00:00
Matt Turner
f57bdd4849 i965: Validate "Special Cases for Byte Operations"
Do this in general_restrictions_based_on_operand_types() because the two
rules that "Special Cases for Byte Operations" relax are checked there.
2017-01-20 11:40:52 -08:00
Matt Turner
75b7f5a269 i965: Validate "Region Alignment Rules" 2017-01-20 11:40:52 -08:00
Matt Turner
f817d132c1 i965: Validate "General Restrictions Based on Operand Types" 2017-01-20 11:40:52 -08:00
Matt Turner
83696b2234 i965: Validate "General Restrictions on Regioning Parameters" 2017-01-20 11:40:52 -08:00
Matt Turner
df0b7bcdfd i965: Replace reg_type_size[] with a function.
A function is necessary to handle immediate types.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-20 11:40:52 -08:00
Matt Turner
ada891d472 i965: Validate math instruction sources.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-20 11:40:52 -08:00
Matt Turner
fce0612fc2 i965: Claim that SEND/math has two sources.
src1 must be a descriptor (including the information to determine that
the SEND is doing an extended math operation), but src0 can actually be
null since it serves as the source of the implicit GRF -> MRF move.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-20 11:40:52 -08:00
Matt Turner
c9724682b5 i965: Simplify num_sources_from_inst().
desc will always be non-NULL, because brw_validate_instructions() does
not attempt to validate any instructions that fail the
is_unsupported_inst() check.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-20 11:40:52 -08:00
Matt Turner
9fd12666d0 i965: Factor out send_restrictions() function.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-20 11:40:52 -08:00
Matt Turner
7abc65dd7c i965: Factor out sources_not_null() validation function.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-20 11:40:52 -08:00
Matt Turner
a693305b61 i965: Structure code so unsupported inst will not generate more errors.
We want to rely on brw_opcode_desc() always returning non-NULL in other
validation functions. Other validation functions will be in the else
case of the block added in this patch.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-20 11:40:52 -08:00
Matt Turner
f0429359cc i965: Add a test for the EU assembly validator.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-20 11:40:52 -08:00
Matt Turner
ae9c69e1cf i965: Add a CHECK macro to call more complicated validation funcs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-20 11:40:52 -08:00
Matt Turner
25448e4b7e i965: Make ERROR_IF usable from other functions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-20 11:40:52 -08:00
Matt Turner
f9a4fc9b15 i965: Mark error annotation on correct SIMD16 inst.
inst, whose assignment can be seen in the last line of context pointed
to the correct instruction in the SIMD16 program, but src_offset was the
offset from the beginning of the SIMD16 program.

So if an instruction at offset 0x100 in the SIMD16 program was illegal,
we would mark an error on the instruction at offset 0x100 (which is
likely in the SIMD8 program).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-20 11:40:52 -08:00
Matt Turner
59003f3447 i965/vec4: Use UW-typed operands when dest is UW.
Using a UD-typed operand makes the execution size D, and if the size of
the execution type is greater than the size of the destination type, the
destination must be appropriately strided.

We actually just want UW-types all around.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-20 11:40:52 -08:00
Matt Turner
68bcbfa9e4 i965: Use W-typed immediate in brw_F32TO16().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-20 11:40:52 -08:00
Matt Turner
3eada948a0 gtest: Update to 1.8.0.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-20 11:40:52 -08:00
Matt Turner
cbc39e541f i965: Don't change F->VF if dest type is DF.
We change the immediate source type to VF to allow instruction
compaction, but there are no entires in the compaction table for DF, so
there's no point in doing this.

Additionally, I mixing floating-point types is now allowed except for
F and VF.
2017-01-20 11:40:52 -08:00
Lionel Landwerlin
a72dea9483 anv: fix comment typo
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-20 16:46:32 +00:00
Lionel Landwerlin
0c3d058723 spirv: fix warn string typo
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-20 16:46:29 +00:00
Lionel Landwerlin
bac6fe5c77 blorp: remove unnecessary struct declaration
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-20 16:46:21 +00:00
Marek Olšák
74f40d1570 Revert "radeonsi: reject invalid vertex element formats"
This reverts commit 9e4d1d8a7c.

It broke arb_vertex_type_10f_11f_11f_rev-draw-vertices, which has
first_non_void == -1.
2017-01-20 16:02:45 +01:00
Philipp Zabel
a37cf630b4 gallium: add pipe_screen::resource_changed callback wrappers
Add resource_changed to the ddebug, rbug, and trace wrappers. Since it
is optional, there is no need to add it to noop.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Suggested-by: Nicolai Hähnle <nhaehnle@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-01-20 15:30:30 +01:00
Philipp Zabel
97de7e6586 st/mesa: ask pipe driver to recreate derived internal resources when (re-)binding external textures
Use the resource_changed callback to invalidate internal resources
derived from external textures when they are (re-)bound. This is needed
to comply with the requirement from the GL_OES_EGL_image_external
extension that a call to glBindTexture guarantees that all further
sampling will return values that correspond to the values in the
external texture at or after the time that glBindTexture was called.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-01-20 15:30:30 +01:00
Philipp Zabel
9bab714c61 mesa: update external textures when (re-)binding
To comply with the requirement from the GL_OES_EGL_image_external
extension that a call to glBindTexture guarantees that all further
sampling will return values that correspond to the values in the
external texture at or after the time that glBindTexture was called,
do not bail out early from mesa_BindTextures if the target is
external.
This will later allow the state tracker to instruct the pipe driver
to invalidate internal resources derived from the external texture.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-01-20 15:30:30 +01:00
Philipp Zabel
c70ed79e79 etnaviv: implement resource_changed to invalidate internal resources derived from imported buffers
Implement the resource_changed pipe callback to invalidate internal
resources derived from imported buffers. This is needed to update the
texture for re-imported renderables.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-01-20 15:30:30 +01:00
Philipp Zabel
362edc868c etnaviv: initialize seqno of imported resources
Imported resources already have contents that we want to be copied to
texture resources derived from them. Set initial seqno of imported
resources to 1, just as if it had already been rendered to.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-01-20 15:30:29 +01:00
Philipp Zabel
2c95d6dac3 st/dri: ask the driver to update its internal copies on reimport
For imported buffers that can't be used directly as a source to the
texture samplers, the pipe driver might need to create an internal
copy, for example in a different tiling layout. When buffers are
reimported they may contain new image data, so the driver internal
copies need to be recreated.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-01-20 15:30:29 +01:00
Philipp Zabel
30853f55a3 gallium: add pipe_screen::resource_changed
Add a hook to tell drivers that an imported resource may have changed
and they need to update their internal derived resources.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-01-20 15:30:29 +01:00
Emil Velikov
5872850b88 configure.ac: move require_dri_shared_libs_and_glapi() before its users
Otherwise we'll get a lovely message as below:
"require_dri_shared_libs_and_glapi: command not found"

Cc: Steven Newbury <steve@snewbury.org.uk>
Reported-by: Steven Newbury <steve@snewbury.org.uk>
Fixes: da410e6afa "configure: explicitly require shared glapi for
enable-dri"
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Steven Newbury <steve@snewbury.org.uk>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-20 14:27:08 +00:00
Samuel Pitoiset
383fc8e9f3 gallium/hud: add missing break in hud_cpufreq_graph_install()
Fixes: e99b9395be "gallium/hud: Add support for CPU frequency monitoring"
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2017-01-20 10:33:47 +01:00
Tapani Pälli
4148881513 android: correct typo in build
Fixes: 63c58dfc65
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-20 07:49:10 +02:00
Elie TOURNIER
9fdaeb7776 nir: add min/max optimisation
Add the following optimisations:

min(x, -x) = -abs(x)
min(x, -abs(x)) = -abs(x)
min(x, abs(x)) = x
max(x, -abs(x)) = x
max(x, abs(x)) = abs(x)
max(x, -x) = abs(x)

shader-db:

total instructions in shared programs: 13067779 -> 13067775 (-0.00%)
instructions in affected programs: 249 -> 245 (-1.61%)
helped: 4
HURT: 0

total cycles in shared programs: 252054838 -> 252054806 (-0.00%)
cycles in affected programs: 504 -> 472 (-6.35%)
helped: 2
HURT: 0

Signed-off-by: Elie Tournier <tournier.elie@gmail.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-19 21:44:28 -08:00
Jason Ekstrand
f22ee14644 nir/algebraic: Only include nir_search_helpers once
We were including it once per value, so probably around 10k times.
Let's not cause the compiler any more work than we have to.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-01-19 21:40:30 -08:00
Anuj Phogat
6de293284b i965: Remove unnecessary mt->compressed checks
It's harmless to use ALIGN_NPOT() for uncompressed formats
because they have block width/height = 1.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2017-01-19 14:28:18 -08:00
Anuj Phogat
c7e37a0cb8 i965: Fix indentation in brw_miptree_layout_2d()
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2017-01-19 14:28:18 -08:00
Anuj Phogat
47d9b3a9dd i965: Fix comment to include 3d textures
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2017-01-19 14:28:18 -08:00
Chad Versace
de0b0a3a9c i965: Delete pending CCS and HiZ ops in intel_miptree_make_shareable()
Fixes crash in piglit
`egl_khr_gl_renderbuffer_image-clear-shared-image GL_DEPTH_COMPONENT24`
on Skylake.

The crash happened because blorp attempted to execute a pending hiz
clear after the hiz buffer was deleted. Deleting the pending hiz ops
when the hiz buffer gets deleted fixes the crash.

For good measure, this patch also deletes all pending CCS/MCS ops when
the CCS/MCS buffer gets deleted. I'm now aware of any bugs
caused by the dangling ops, but deleting them is clearly the right thing
to do.

Cc: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99265
2017-01-19 13:47:57 -08:00
Andres Rodriguez
e0674e740b vulkan/wsi: clarify the severity of lack of DRI3 v2
The current message sounds like a small warning, clarify that it can
result in lack of presentation support and application crashes.

v2: add "if they do" (Bas)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98263
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Acked-by: Jason ekstrand <jason@jlekstrand.net>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-19 15:41:42 +00:00
Andres Rodriguez
a3ad6a34c6 radv: fix include order for installed headers v2
In situations where libdrm_amdgpu and mesa are installed to the same
location, the mesa installed headers will take precedence over the git
source headers.

This is due to the AMDGPU_CFLAGS containing the install directory.

This situation can cause build errors if the git version of a header is
newer than the currently installed version of a header (e.g. git pull
updates vulkan.h)

Note: using the same install prefix for mesa and libdrm is probably a
common occurrence since it is described in the radeonBuildHowTo wiki:
https://www.x.org/wiki/radeonBuildHowTo/

v2: added sign-off

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-19 15:41:38 +00:00
Emil Velikov
0f8afde7ba docs/releasing: document post branch version bump
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-19 15:38:30 +00:00
Emil Velikov
49e4204b12 mesa: Bump version to 17.1.0-devel
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-19 15:38:30 +00:00
Marek Olšák
9e4d1d8a7c radeonsi: reject invalid vertex element formats
This should fix a coverity defect.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-01-19 16:38:37 +01:00
Marek Olšák
e490b7812c radeonsi: don't forget to add HTILE to the buffer list for texturing
This fixes VM faults. Discovered by Samuel Pitoiset.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98975
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99450

Cc: 17.0 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-01-19 16:38:37 +01:00
Nayan Deshmukh
31908d6a4a st/vdpau: only send buffers with B8G8R8A8 format to X
PresentPixmap only works if the pixmap depth matches with the
window depth, otherwise it returns a BadMatch protocol error.
Even if the depths match, the result won't look correctly
if the VDPAU RGB component order doesn't match the X11 one so
we only allow the X11 format.
For other buffers we copy them to a buffer which is send to X.

v2: only send buffers with format VDP_RGBA_FORMAT_B8G8R8A8
v3: reword commit message
v4: add comment explaining the code

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-01-19 15:34:02 +01:00
Nicolai Hähnle
3cd092c415 radeonsi: fix texture gather on stencil textures
At least on VI, texture gather doesn't work with a 24_8 data format, so
use 8_8_8_8 and a modified swizzle instead.

A bit of background: When creating a GL_STENCIL_INDEX8 texture, we select
the X24S8 pipe format because we don't support stencil-only render targets
properly. With mip-mapping this can lead to a setup where the tiling is
incompatible with stencil texturing, and a flushed stencil texture is
used. For the flushed stencil, a literal X24S8 is used because there were
issues with an 8bpp DB->CB copy.

Longer term, it would be good if we could get away from these workarounds,
i.e. properly support an S8 format for stencil-only rendering and flushed
stencil. Since stencil texturing is somewhat rare, it's not a high
priority.

Fixes GL45-CTS.texture_cube_map_array.sampling.

Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-01-19 15:02:57 +01:00
Alejandro Piñeiro
905961452a mesa/main: Fix FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE for NONE attachment type
When the attachment type is NONE (att->Type),
FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE should be NONE always.

Note that technically, the current behaviour follows the spec. From
OpenGL 4.5 spec, Section 9.2.3 "Framebuffer Object Queries":

   "If the value of FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE is NONE, then
    either no framebuffer is bound to target; or the default
    framebuffer is bound, attachment is DEPTH or STENCIL, and the
    number of depth or stencil bits, respectively, is zero."

Reading literally this paragraph, for the default framebuffer, NONE
should be only returned if attachment is DEPTH and STENCIL without
being allocated.

But it doesn't makes too much sense to return DEFAULT_FRAMEBUFFER if
the attachment type is NONE. For example, this can happens if the
attachment is FRONT_RIGHT run on monoscopic mode, as that attachment
is only available on stereo mode.

With the current behaviour, defensive querying of the object type
would not work properly. So you could query the object type checking
for NONE, get DEFAULT_FRAMEBUFFER, and then get and INVALID_OPERATION
when requesting other pnames (like RED_SIZE), as the real attachment
type is NONE.

This fixes:
GL45-CTS.direct_state_access.framebuffers_get_attachment_parameters

v2: don't change the behaviour for att->Type != GL_NONE, as caused
    some ES CTS regressions
v3: simplify condition (Iago)

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-01-19 11:55:41 -02:00
Zachary Michaels
d7d32b3bfe radeonsi: Always leave poly_offset in a valid state
This commit makes si_update_poly_offset set poly_offset to NULL if
uses_poly_offset is false. This way poly_offset either points into the
currently queued rasterizer, or it is NULL.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99451
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-19 10:50:16 +01:00
Nicolai Hähnle
a7c635ec65 mesa/main: fix meta caller of _mesa_ClampColor
Since _mesa_ClampColor properly checks for support of the API function
now, it's meta callers need to check support as well.

Fixes: 963311b71f ("mesa/main: fix version/extension checks in _mesa_ClampColor")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99401
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
2017-01-19 09:13:25 +01:00
Timothy Arceri
4d65f68a9b mesa/glsl: move TransformFeedbackBufferStride to gl_shader
Here we remove the single use of this field in gl_linked_shader
which allows us to move the field out of gl_shader_info

While we are at it we rewrite link_xfb_stride_layout_qualifiers()
to be more clear.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-19 17:05:26 +11:00
Timothy Arceri
e603cf1841 glsl: exit loop early if we find xfb layout qualifers
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-19 17:05:26 +11:00
Timothy Arceri
7983ed5f65 glsl: set InnerCoverage directly in gl_program
Also move out of the shared gl_shader_info.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-19 17:05:26 +11:00
Timothy Arceri
1f141eaef6 glsl: tidy up PostDepthCoverage shader field
There is no reason for this to be in the shared gl_shader_info or
to copy it to gl_program at the end of linking (its already there).

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-19 17:05:26 +11:00
Timothy Arceri
3d41f4b990 mesa/glsl: move pixel_center_integer to gl_shader
This is only used by gl_linked_shader as a temp during linking
so use a temp there instead.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-19 17:05:26 +11:00
Timothy Arceri
0a9d102ddc mesa/glsl: move origin_upper_left to gl_shader
This is only used by gl_linked_shader as a temp during linking
so use a temp there instead.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-19 17:05:26 +11:00
Timothy Arceri
ceeedb9bb0 mesa/glsl: move uses_gl_fragcoord to gl_shader
This is only used by gl_linked_shader as a temp during linking
so use a temp there instead.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-19 17:05:26 +11:00
Timothy Arceri
66a6050ad8 mesa/glsl: move redeclares_gl_fragcoord to gl_shader
This is never used in gl_linked_shader other than as a temp
during linking so just use a temp instead.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-19 17:05:26 +11:00
Timothy Arceri
cc7ecce253 mesa/glsl: move ARB_fragment_coord_conventions_enable field
This is only used by gl_shader not gl_linked_shader so move it
there.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-19 17:05:26 +11:00
Timothy Arceri
ae28c5a60c st/mesa/glsl: set early_fragment_tests directly in shader_info
We also move EarlyFragmentTests out of the gl_shader_info struct
as it is now only used by gl_shader.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-19 17:05:26 +11:00
Timothy Arceri
5c93d27423 mesa/glsl/i965: set and use tcs vertices_out directly
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-19 17:05:26 +11:00
Timothy Arceri
4cd709e2bc i965: get outputs_written from gl_program
There is no need to go via the pointer in nir_shader. This change
is required for the shader cache as we don't create a nir_shader.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-19 17:05:26 +11:00
Dave Airlie
ef71b867ee gallivm: use #ifdef not #if for PIPE_ARCH_BIG_ENDIAN
This fixes the build on ppc/s390.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-19 16:00:53 +10:00
Timothy Arceri
3fe8d04a6d mesa: don't always set _NEW_PROGRAM when linking
We only need to set it when linking was successful and the program
being linked is currently active.

The programs_in_use mask is just used as a flag for now but in
a future change we will use it to update the CurrentProgram array.

V2: make sure to flush vertices before linking (suggested by Marek)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-19 15:55:02 +11:00
Timothy Arceri
aad93402c0 mesa: change init subroutine defaults helper to work per gl_program
A later patch will result in SSO programs calling this helper
per gl_program rather than per gl_shader_program.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-19 15:55:02 +11:00
Timothy Arceri
90d950038f mesa/glsl: move ProgramResourceList to gl_shader_program_data
We also move NumProgramResourceList at the same time.

GLES does interface validation on SSO at runtime so we need to move
this to be able to switch to storing gl_program pointers in
CurrentProgram.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-19 15:55:02 +11:00
Timothy Arceri
62f718bfcb glsl: store number of explicit uniform loactions in gl_shader_program
This allows us to cleanup the functions that pass this count around,
but more importantly we will be able to call the uniform linking
functions from that backends linker without having to pass this
information to the backend directly via Driver.LinkShader().

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-19 15:55:02 +11:00
Timothy Arceri
c054bbf0d4 glsl: create a new link_and_validate_uniforms() helper
Currently this just breaks up the linking code a bit but in the
future i965 will call this from the backend via Driver.LinkShader()
so that we can do NIR optimisations before assigning uniform
locations.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-19 15:55:02 +11:00
Timothy Arceri
ce4fb3c8a1 glsl: make a bunch of varying linking functions static
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-19 15:55:02 +11:00
Timothy Arceri
90fffd1770 glsl: move more varying linking code to link_varyings.cpp
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-19 15:55:02 +11:00
Topi Pohjolainen
180653c357 i965/blorp: Make post draw flush more explicit
Blits do not need any special treatment as the target buffer
object is added to render cache just as one does for normal draw.
Color clears and resolves in turn require explicit "end of pipe
synchronization". It is not clear what this means exactly but the
assumption is that render cache flush with command stream stall
should be sufficient.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-18 22:42:47 +02:00
Topi Pohjolainen
46b346899d i965/gen6: Issue direct depth stall and flush after depth clear
instead of calling unconditionally brw_emit_mi_flush() which
does:

   brw_emit_pipe_control_flush(brw,
                                PIPE_CONTROL_DEPTH_CACHE_FLUSH |
                                PIPE_CONTROL_RENDER_TARGET_FLUSH |
                                PIPE_CONTROL_CS_STALL);

   brw_emit_pipe_control_flush(brw,
                                PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
                                PIPE_CONTROL_CONST_CACHE_INVALIDATE);

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-18 22:42:47 +02:00
Topi Pohjolainen
e6da6943fe i965: Make depth clear flushing more explicit
Current blorp logic issues unconditional "flush everything"
(see brw_emit_mi_flush()) after each render. For example, all
blits issue this unconditionally which shouldn't be needed if
they set render cache properly so that subsequent renders do
necessary flushing before drawing.

In case of piglit:

ext_framebuffer_multisample-accuracy all_samples depth_draw small

intel_hiz_exec() is always preceded by blorb blit and the
unconditional flush looks to hide the lack of stall and flushes
in depth clears. By removing the brw_emit_mi_flush() I get gpu
hangs.

This patch adds the stalls and flushes mandated by the spec
and gets rid of those hangs.

v2 (Jason, Ken): Document the rational for separating
                 depth cache flush and stall on Gen7.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-18 22:42:47 +02:00
Topi Pohjolainen
4840a53e90 i965/blorp: Use the render cache mechanism instead of explicit flushing
by replacing brw_emit_mi_flush() with brw_render_cache_set_check_flush().
The latter splits the flush in two:

   brw_emit_pipe_control_flush(brw,
                               PIPE_CONTROL_DEPTH_CACHE_FLUSH |
                               PIPE_CONTROL_RENDER_TARGET_FLUSH |
                               PIPE_CONTROL_CS_STALL);

   brw_emit_pipe_control_flush(brw,
                               PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
                               PIPE_CONTROL_CONST_CACHE_INVALIDATE);

instead of

   int flags = PIPE_CONTROL_NO_WRITE | PIPE_CONTROL_RENDER_TARGET_FLUSH;
   if (brw->gen >= 6) {
      flags |= PIPE_CONTROL_INSTRUCTION_INVALIDATE |
               PIPE_CONTROL_CONST_CACHE_INVALIDATE |
               PIPE_CONTROL_DEPTH_CACHE_FLUSH |
               PIPE_CONTROL_VF_CACHE_INVALIDATE |
               PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
               PIPE_CONTROL_CS_STALL;
   }
   brw_emit_pipe_control_flush(brw, flags);

v2 (Jason): Check that destination exists before trying to add to
            render cache. Depth clears and resolves don't have it.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-18 22:42:47 +02:00
Emil Velikov
ea8b2624c8 utils: really remove the __END_DECLS macro
Fixes: d1efa09d34 "util: import sha1 implementation from OpenBSD"
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-18 20:09:57 +00:00
Emil Velikov
9f8dc3bf03 utils: build sha1/disk cache only with Android/Autoconf
Earlier commit imported a SHA1 implementation and relaxed the SHA1 and
disk cache handling, broking the Windows builds.

Restrict things for now until we get to a proper fix.

Fixes: d1efa09d34 "util: import sha1 implementation from OpenBSD"
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-18 20:09:01 +00:00
Emil Velikov
d1efa09d34 util: import sha1 implementation from OpenBSD
At the moment we support 5+ different implementations each with varying
amount of bugs - from thread safely problems [1], to outright broken
implementation(s) [2]

In order to accommodate these we have 150+ lines of configure script and
extra two configure toggles. Whist an actual implementation being
~200loc and our current compat wrapping ~250.

Let's not forget that different people use different code paths, thus
effectively makes it harder to test and debug since the default
implementation is automatically detected.

To minimise all these lovely experiences, import the "100% Public
Domain" OpenBSD sha1 implementation. Clearly document any changes needed
to get building correctly, since many/most of those can be upstreamed
making future syncs easier.

As an added bonus this will avoid all the 'fun' experiences trying to
integrate it with the Android and SCons builds.

v2: Manually expand __BEGIN_DECLS/__END_DECLS and document (Tapani).

Furthermore it seems that some games (or surrounding runtime) static
link against OpenSSL resulting in conflicts. For more information see
the discussion thread [3]

Bugzilla [1]: https://bugs.freedesktop.org/show_bug.cgi?id=94904
Bugzilla [2]: https://bugs.freedesktop.org/show_bug.cgi?id=97967
[3] https://lists.freedesktop.org/archives/mesa-dev/2017-January/140748.html
Cc: Mark Janes <mark.a.janes@intel.com>
Cc: Vinson Lee <vlee@freedesktop.org>
Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Jonathan Gray <jsg@jsg.id.au>
Tested-by: Jonathan Gray <jsg@jsg.id.au>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Tapani Pälli <tapani.palli@intel.com> (v1)
Acked-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
2017-01-18 19:07:23 +00:00
Kenneth Graunke
5b4a531207 i965: Make brw_cache_item structure private to brw_program_cache.c.
struct brw_cache_item is an implementation detail of the program cache.
We don't need to make those internals available to the entire driver.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2017-01-18 10:53:14 -08:00
Marek Olšák
c67a2793b3 radeonsi: determine in advance which VBOs should be added to the buffer list
v2: now it should be correct

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-18 19:51:31 +01:00
Marek Olšák
1db2bf8d2b radeonsi: use fewer pointer dereferences in upload_vertex_buffer_descriptors
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-18 19:51:31 +01:00
Marek Olšák
b9b9540a60 radeonsi: reject invalid vertex buffer indices at state creation
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-18 19:51:31 +01:00
Marek Olšák
cf248929bf radeonsi: use a global dirty mask for shader pointers
Only vertex buffers use a separate bool flag.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-18 19:51:31 +01:00
Marek Olšák
861d7af1cb radeonsi: use a bitmask-based loop in si_decompress_textures
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-18 19:51:31 +01:00
Marek Olšák
4bde7d3d3c radeonsi: skip an unnecessary mutex lock for L2 prefetches
the mutex lock is inside util_range_add.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-18 19:51:31 +01:00
Marek Olšák
d93b0eacb7 radeonsi: si_cp_dma_prepare is a no-op for L2 prefetches
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-18 19:51:31 +01:00
Marek Olšák
395c49849d radeonsi: add SI_CPDMA_SKIP_BO_LIST_UPDATE
the next commit will use it in a clever way, because the CP DMA prefetch
doesn't need this.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-18 19:51:31 +01:00
Marek Olšák
35cd7551a4 radeonsi: use the correct target machine when building shader variants
If the shader selector is created with a different context than
the shader variant, we should use the calling context's target machine
for the shader variant.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99419

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-18 19:51:31 +01:00
Marek Olšák
3ae3be6dd4 radeonsi: move shader pipe context state into a separate structure
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-18 19:51:31 +01:00
Ben Widawsky
b0cc55f298 i965: Fix SURFACE_STATE to handle non-zero aux offsets
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Daniel Stone <daniels@collabora.com>
2017-01-18 09:38:18 -08:00
Christian Gmeiner
65a44a76fc Revert "etnaviv: Fake occlusion query capability"
This reverts commit b7ac0f5671.

This is a half baked solution needs some rework to fixes issues
with reported counter bits (GL_QUERY_COUNTER_BITS_ARB).
Also it enables PIPE_CAP_QUERY_TIME_ELAPSED accidently.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-01-18 17:39:28 +01:00
Mauro Rossi
730574c58e android: ac/debug: move sid_tables.h generation and IB decode to amd/common
This patch is the porting to android of the following commits:

b838f64 "ac/debug: Move sid_tables.h generation to common code."
0ef1b4d "ac/debug: Move IB decode to common code."

Fixes android building errors due to sid_tables.h
and ac_debug.c, ac_debug.h moved to amd/common

Tested by building nougat-x86

Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-18 16:28:59 +00:00
Mauro Rossi
02185a1c9b android: gallium/auxiliary: fix building error in Android 7.0
Conditional libLLVMCore static library dependency is added,
for the case when MESA_ENABLE_LLVM is true

Fixes the following building error with Android 7.0:

In file included from
external/mesa/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp:62:
...
external/llvm/include/llvm/IR/Attributes.h:68:14: fatal error:
'llvm/IR/Attributes.inc' file not found
    #include "llvm/IR/Attributes.inc"
             ^
1 error generated.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-18 16:24:19 +00:00
Mauro Rossi
f93f7cae14 android: amd/common: fix LLVMInitializeAMDGPU* functions declaration
LLVMInitializeAMDGPU* functions need to be explicitly declared
and mesa expects them via <llvm-c/Target.h> header,
but LLVM needs to be instructed to invoke its own LLVM_TARGET(AMDGPU) macro,
or the functions will not be available.

A new llvm cflag (-DFORCE_BUILD_AMDGPU) serves this purpose,
the same mechanism is used also by other llvm targets e.g. FORCE_BUILD_ARM

A necessary prerequisite is to have AMDGPU target handled accordingly
in llvm config files i.e. {Target,AsmParser,AsmPrinter}.def
for llvm device build includes.

This avoids the following building errors:

external/mesa/src/amd/common/ac_llvm_util.c:43:2: error: implicit declaration of function 'LLVMInitializeAMDGPUTargetInfo' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
        LLVMInitializeAMDGPUTargetInfo();
        ^
external/mesa/src/amd/common/ac_llvm_util.c:44:2: error: implicit declaration of function 'LLVMInitializeAMDGPUTarget' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
        LLVMInitializeAMDGPUTarget();
        ^
external/mesa/src/amd/common/ac_llvm_util.c:45:2: error: implicit declaration of function 'LLVMInitializeAMDGPUTargetMC' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
        LLVMInitializeAMDGPUTargetMC();
        ^
external/mesa/src/amd/common/ac_llvm_util.c:46:2: error: implicit declaration of function 'LLVMInitializeAMDGPUAsmPrinter' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
        LLVMInitializeAMDGPUAsmPrinter();
        ^
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-18 16:21:40 +00:00
Mauro Rossi
db3aaa3137 android: radeonsi: fix LLVMInitializeAMDGPU* functions declaration
LLVMInitializeAMDGPU* functions need to be explicitly declared
and mesa expects them via <llvm-c/Target.h> header,
but LLVM needs to be instructed to invoke its own LLVM_TARGET(AMDGPU) macro,
or the functions will not be available.

A new llvm cflag (-DFORCE_BUILD_AMDGPU) serves this purpose,
the same mechanism is used also by other llvm targets e.g. FORCE_BUILD_ARM

A necessary prerequisite is to have AMDGPU target handled accordingly
in llvm config files i.e. {Target,AsmParser,AsmPrinter}.def
for llvm device build includes.

This avoids the following building errors:

external/mesa/src/gallium/drivers/radeonsi/si_shader_tgsi_setup.c:129:2: error: implicit declaration of function 'LLVMInitializeAMDGPUTargetInfo' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
        LLVMInitializeAMDGPUTargetInfo();
        ^
external/mesa/src/gallium/drivers/radeonsi/si_shader_tgsi_setup.c:130:2: error: implicit declaration of function 'LLVMInitializeAMDGPUTarget' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
        LLVMInitializeAMDGPUTarget();
        ^
external/mesa/src/gallium/drivers/radeonsi/si_shader_tgsi_setup.c:131:2: error: implicit declaration of function 'LLVMInitializeAMDGPUTargetMC' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
        LLVMInitializeAMDGPUTargetMC();
        ^
external/mesa/src/gallium/drivers/radeonsi/si_shader_tgsi_setup.c:132:2: error: implicit declaration of function 'LLVMInitializeAMDGPUAsmPrinter' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
        LLVMInitializeAMDGPUAsmPrinter();
        ^

Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-18 16:21:35 +00:00
Mauro Rossi
a2a63ad262 android: radeon: fix LLVMInitializeAMDGPU* functions declaration
LLVMInitializeAMDGPU* functions need to be explicitly declared
and mesa expects them via <llvm-c/Target.h> header,
but LLVM needs to be instructed to invoke its own LLVM_TARGET(AMDGPU) macro,
or the functions will not be available.

A new llvm cflag (-DFORCE_BUILD_AMDGPU) serves this purpose,
the same mechanism is used also by other llvm targets e.g. FORCE_BUILD_ARM

A necessary prerequisite is to have AMDGPU target handled accordingly
in llvm config files i.e. {Target,AsmParser,AsmPrinter}.def
for llvm device build includes.

This avoids the following building errors:

external/mesa/src/gallium/drivers/radeon/radeon_llvm_emit.c:121:2: error: implicit declaration of function 'LLVMInitializeAMDGPUTargetInfo' [-Werror=implicit-function-declaration]
  LLVMInitializeAMDGPUTargetInfo();
  ^
external/mesa/src/gallium/drivers/radeon/radeon_llvm_emit.c:122:2: error: implicit declaration of function 'LLVMInitializeAMDGPUTarget' [-Werror=implicit-function-declaration]
  LLVMInitializeAMDGPUTarget();
  ^
external/mesa/src/gallium/drivers/radeon/radeon_llvm_emit.c:123:2: error: implicit declaration of function 'LLVMInitializeAMDGPUTargetMC' [-Werror=implicit-function-declaration]
  LLVMInitializeAMDGPUTargetMC();
  ^
external/mesa/src/gallium/drivers/radeon/radeon_llvm_emit.c:124:2: error: implicit declaration of function 'LLVMInitializeAMDGPUAsmPrinter' [-Werror=implicit-function-declaration]
  LLVMInitializeAMDGPUAsmPrinter();
  ^

Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-18 16:21:28 +00:00
Emil Velikov
9c5003996c nouveau: remove always false argument in nouveau_fence_new()
No point in having the extra argument considering that it's effectively
unused since the function was introduced.

Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-01-18 16:01:15 +00:00
Emil Velikov
af4a298719 egl/wayland: resolve quirky try_damage_buffer() implementation
The implementation was added with commit d085a5dff5 and effectively
provided a hidden dependency.

Namely: the codepath used was determined solely during build time. Thus
if we built again new wayland and then run against older (yet still
within the requirements, as per the configure) one will get undefined
symbols.

As of earlier commit 36b9976e1f "egl/wayland: Avoid race conditions
when on non-main thread" the required version was bumped to one which
provides the API, thus we can drop the quirky solution.

Cc: Derek Foreman <derekf@osg.samsung.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Derek Foreman <derekf@osg.samsung.com>
2017-01-18 16:01:15 +00:00
Emil Velikov
687cf37bbe configure: error out when building static XOR shared
Current code warns out in such cases and falls-back to either static or
shared. That can be easily missed amongst the volume produced by our
configure script.

Replace the warning with an error such that one gets direct feedback
when they're doing something wrong.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-18 16:01:15 +00:00
Emil Velikov
da410e6afa configure: explicitly require shared glapi for enable-dri
We've been using and depending on it for at least a couple of years.
Make it obvious and error out, should one opt for it.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-18 16:01:15 +00:00
Emil Velikov
b628fdd6e7 configure: factor out commom egl/gbm checks
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-18 16:01:15 +00:00
Emil Velikov
e8044dd434 configure: remove HAVE_EGL_DRIVER_DRI[23]
We have them for local purposes in configure, where we can use their
direct dependency.

With the only remaining instance in the makefile(s) being always true,
as it can be seen in the configure snippet.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-18 16:01:15 +00:00
Emil Velikov
3b887f122f configure: forbid static EGL/GBM
Both libraries implicitly require shared GLAPI which in itself mandates
shared libraries.

Stop pretending that one can use it and error out at configure stage.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-18 16:01:15 +00:00
Emil Velikov
d4066216c6 configure: remove unused AC_SUBST variables
v2: Rebase.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
2017-01-18 16:01:15 +00:00
Emil Velikov
4380a2098b gallium: correctly manage libsensors link flags
We should be using LIBS rather than the LDFLAGS variable. Furthermore
try to keep the linking to the final stage, rather than intermetent
static library.

Cc: Steven Toth <stoth@kernellabs.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-18 16:01:14 +00:00
Emil Velikov
cb5e799448 egl/wayland: unify dri2_wl_create_surface implementations
Rather than having two almost identical codepaths (one for HW/wl_drm and
another for SW/wl_shm), just factorise and reuse in both places.

v2: Rebase
v3: Rebase

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com> (v2)
2017-01-18 16:01:14 +00:00
Emil Velikov
bfd6314350 egl/wayland: use the destroy_window_callback for swrast
As described in commit 690ead4a13 ("egl/wayland-egl: Fix for segfault
in dri2_wl_destroy_surface.") if we attempt to destroy a EGL surface
attached to already destroyed Wayland window we'll get a segfault.

v2: set the correct callback alongside the window->private. (Dan)

Cc: Daniel Stone <daniels@collabora.com>
Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2017-01-18 16:01:14 +00:00
Emil Velikov
3ecd6c6abd glx: unify GLX_SGIX_pbuffer aliased declarations
No point in having an identical code in two places.

Not to mention that the Apple one incorrectly uses GLXDrawable as pbuf
type. This change is both API and ABI safe since the header uses the
correct GLXPbufferSGIX and both types are a typedef of the same
primitive XID.

Cc: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jeremy Sequoia <jeremyhu@apple.com>
2017-01-18 16:01:14 +00:00
Emil Velikov
9898bcf3f4 glx: use GLX_ALIAS for glXGetProcAddress
Use the macro, rather than open-coding it.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-18 16:01:14 +00:00
Emil Velikov
dfc84c2296 mesa: make use of HAVE_FUNC_ATTRIBUTE_ALIAS macro
We must make sure that xserver has an equivalent one-line
change to its configure.ac as the glx/glapi headers get copied over.

Then again, xserver does _not_ seem to set HAVE_ALIAS to begin with so
one might want to look into that first.

Cc: Adam Jackson <ajax@redhat.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-18 16:01:14 +00:00
Emil Velikov
63c58dfc65 android: set HAVE_FUNC_ATTRIBUTE_ALIAS
Analogous to previous two commits.

Strictly speaking it's not be applicable for Android since we don't
build GLX and related code.

Regardless keep things consistent with the other build systems.

Cc: Rob Herring <robh@kernel.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-18 16:01:14 +00:00
Emil Velikov
52bf10cc4f scons: set HAVE_FUNC_ATTRIBUTE_ALIAS
Analogoust to the previous commit were we did so for autotools

Cc: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-01-18 16:01:14 +00:00
Emil Velikov
95d9eae427 configure: use standard check for attribure alias
Currently we have two macros - HAVE_ALIAS and GLX_ALIAS_UNSUPPORTED.
To make it even better former of which is explicitly cleared in some
cases while not in others.

Clear all that up by using a single macro properly set during configure.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-18 16:01:14 +00:00
Emil Velikov
f121ac68b0 glx: remove always false ifdef GLX_NO_STATIC_EXTENSION_FUNCTIONS
Quick search through git history (of both mesa and xserver) hows no
instances where this was ever set.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-18 16:01:14 +00:00
Wladimir J. van der Laan
b7ac0f5671 etnaviv: Fake occlusion query capability
This enables the PIPE_CAP_OCCLUSION_QUERY capability without adding an
occlusion query type.

This is necessary to get Mesa to report desktop GL 2.0 support (to run
exciting things such as ioq3's OpenGL 2 renderer), and should be valid
because exposing the capability does not guarantee that any
counters are actually implemented.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-01-18 16:58:18 +01:00
Christian Gmeiner
103c363e0a etnaviv: add flags parameter to texture barrier
Fixes compile warning introduced by commit a1c848.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-01-18 16:58:11 +01:00
Christian Gmeiner
3ef916c128 etnaviv: handle PIPE_CAP_TGSI_FS_FBFETCH
Fixes compile warning introduced by commit ee3ebe.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-01-18 16:58:05 +01:00
Roland Scheidegger
56441708cf gallivm: (trivial) fix copy/paste bug with big endian code
8bd67a35c5 introduced using undefined variable
on big endian archs due to copy/paste bug.
(compile hack tested only)
2017-01-18 16:30:50 +01:00
Jose Fonseca
34041968f8 configure.ac: Revert recent HAVE_LLVM changes.
This reverts changes 903eb09b5fb78d47d0f8a4bdf826a113ca2aff40..1a0aa468f354f0ee94dd383cd40ae915584624aa:

  Tobias Droste (5):
    configure.ac: Rename MESA_LLVM to FOUND_LLVM
    configure.ac: Only set LLVM_LIBS if LLVM is used
    configure.ac: Only define HAVE_LLVM if LLVM is used
    configure.ac: Set and use HAVE_GALLIUM_LLVM define
    configure.ac: Don't check LLVM version in gallium_require_llvm

They break scons build, and I'm not convinced this is the right fix.  In
particular changing HAVE_LLVM in the C code is something I'd rather
avoid no matter what.  So it's better to discuss without the pressure of
broken builds.
2017-01-18 14:46:54 +00:00
Elie TOURNIER
5034cf4e35 docs: Fix GLSL compiler link
The doc wasn't update since we moved the glsl compiler to src/compiler/glsl.
I also updated the description of the standalone compiler.

v2:
 - Mention that just-log argument removes headers/separators.
 - Mention that version argument is mandatory.
Since version argument is mandatory, add --version to the command line
example.

Signed-off-by: Elie Tournier <tournier.elie@gmail.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2017-01-18 14:15:31 +00:00
Emil Velikov
8d1712a065 vulkan: automake: do not use EXTRA_DIST in a conditional
Otherwise the file might not end up in the tarball.

Fixes: dbd677efb4 "vulkan: add API registry"
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-18 13:41:32 +00:00
Tomasz Figa
2d14ae6bea configure.ac: Respect LLVM_CFLAGS in LLVM version detection
When compiling LLVM headers, including llvm-config.h, we need to respect
LLVM_CFLAGS. This is especially crucial if LLVM is located in a
non-standard location and it happens that llvm-config.h includes another
header. In such case the detection would fail due to missing header,
because the path is provided in LLVM_CFLAGS.

Let's add LLVM_CFLAGS to global CFLAGS for the time of detection and then
restore the original flags, as done in other places of the script.

Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-18 13:25:17 +00:00
Tobias Droste
1a0aa468f3 configure.ac: Don't check LLVM version in gallium_require_llvm
This is actually not needed because the version is checked later.

Line 2609:
if test "x$enable_gallium_llvm" == "xyes"; then
    llvm_require_version $LLVM_REQUIRED_GALLIUM "gallium"
    llvm_add_default_components "gallium"

    HAVE_GALLIUM_LLVM=xyes
    DEFINES="${DEFINES} -DHAVE_GALLIUM_LLVM"
fi

Signed-off-by: Tobias Droste <tdroste@gmx.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-18 13:23:01 +00:00
Tobias Droste
4d0efb9683 configure.ac: Set and use HAVE_GALLIUM_LLVM define
Gallium code used HAVE_LLVM to check if it needs to compile code for
LLVM in header and source files.

With the new logic HAVE_LLVM is always set. Use extra define to figure
out if LLVM is used.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99010

Signed-off-by: Tobias Droste <tdroste@gmx.de>
2017-01-18 13:23:01 +00:00
Tobias Droste
b045d23c0b configure.ac: Only define HAVE_LLVM if LLVM is used
Make sure that HAVE_LLVM compiler define is only set if LLVM is
actually used.

Signed-off-by: Tobias Droste <tdroste@gmx.de>
v2 [Emil] fold within the existing conditional
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-18 13:23:01 +00:00
Tobias Droste
38e81293b0 configure.ac: Only set LLVM_LIBS if LLVM is used
This renames llvm_check_version_for to llvm_require_version and let it
set a variable to mark that LLVM will be used.

Use this to make a usefull configure output and to only check if the
libs are found in LLVM if it is actually used.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99010

Signed-off-by: Tobias Droste <tdroste@gmx.de>
2017-01-18 13:23:01 +00:00
Tobias Droste
add9066eb0 configure.ac: Rename MESA_LLVM to FOUND_LLVM
This renames MESA_LLVM to FOUND_LLVM and updates the config.log report
to say if LLVM is found or not, to make clear that this does not mean
that it is used.

There are no MESA_LLVM users so drop the AC_SUBST.

v2 [Emil]
 - Polish test: -a over && test, = over ==, unquiote xyes
 - other ?

Signed-off-by: Tobias Droste <tdroste@gmx.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-18 13:23:00 +00:00
Jose Fonseca
903eb09b5f gallivm: Cleanup USE_MCJIT.
Split USE_MCJIT macro dual nature into a separate constant time define
and a run-time variable.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-18 12:35:01 +00:00
Kenneth Graunke
aa291c3ba9 i965: Don't map/unmap in brw_print_program_cache on LLC platforms.
We have a persistent mapping.  Don't map it a second time or try to
unmap it.  Just use the pointer.

This most likely would wreak havoc except that this code is unused
(it's only called from an if (0) debug block).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2017-01-17 21:47:38 -08:00
Kenneth Graunke
ce89239294 i965: Move program cache printing to brw_program_cache.c.
It makes sense to put a function which prints out the entire contents
of the program cache in the file that implements the program cache.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2017-01-17 21:47:36 -08:00
Kenneth Graunke
f9edc550b2 i965: Make a helper for finding an existing shader variant.
We had five copies of the same "walk the cache and look for an
existing shader variant for this program" code.  Now we have one
helper function that returns the key.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2017-01-17 21:47:10 -08:00
Kenneth Graunke
e7d4008ebf glsl: Make copy propagation not panic when it sees an intrinsic.
A number of games have large arrays of constants, which we promote to
uniforms.  This introduces copies from the uniform array to the original
temporary array.  Normally, copy propagation eliminates those copies,
making everything refer to the uniform array directly.

A number of shaders in "Deus Ex: Mankind Divided" recently exposed a
limitation of copy propagation - if we had any intrinsics (i.e. image
access in a compute shader), we weren't able to get rid of these copies.

That meant that any variable indexing remained on the temporary array
rather being moved to the uniform array.  i965's scalar backend
currently doesn't support indirect addressing of temporary arrays,
which meant lowering it to if-ladders.  This was horrible.

According to Marek, on radeonsi/GCN, "F1 2015" uses 64% less
spilled-temp-array memory.

On i965/Skylake:

total instructions in shared programs: 13362954 -> 13329878 (-0.25%)
instructions in affected programs: 43745 -> 10669 (-75.61%)
helped: 12
HURT: 0

total cycles in shared programs: 248081010 -> 245949178 (-0.86%)
cycles in affected programs: 4597930 -> 2466098 (-46.37%)
helped: 12
HURT: 0

total spills in shared programs: 9493 -> 9507 (0.15%)
spills in affected programs: 25 -> 39 (56.00%)
helped: 0
HURT: 1

total fills in shared programs: 12127 -> 12197 (0.58%)
fills in affected programs: 110 -> 180 (63.64%)
helped: 0
HURT: 1

Helps Deus Ex: Mankind Divided.   The one shader with hurt spills/fills
is from Tomb Raider at Ultra settings, but that same shader has a
-39.55% reduction in instructions and -14.09% reduction in cycle counts,
so it seems like a win there as well.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-17 21:45:22 -08:00
Kenneth Graunke
9919542f1c i965: Make DCE set null destinations on messages with side effects.
(Co-authored by Matt Turner.)

Image atomics, for example, return a value - but the shader may not
want to use it.  We assigned a useless VGRF destination.  This seemed
harmless, but it can actually be quite harmful.  The register allocator
has to assign that VGRF to a real register.  It may assign the same
actual GRF to the destination of an instruction that follows soon after.

This results in a write-after-write (WAW) dependency, and stall.

A number of "Deus Ex: Mankind Divided" shaders use image atomics, but
don't use the return value.  Several of these were hitting WAW stalls
for nearly 14,000 (poorly estimated) cycles a pop.  Making dead code
elimination null out the destination avoids this issue.

This patch cuts one shader's estimated cycles by -98.39%!  Removing the
message response should also help with data cluster bandwidth.

On Skylake:

(instruction counts remain identical)

total cycles in shared programs: 255413890 -> 248081010 (-2.87%)
cycles in affected programs: 12019948 -> 4687068 (-61.01%)
helped: 24
HURT: 10

v2: Make can_omit_write independent of can_eliminate (Curro).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-17 21:45:04 -08:00
Kenneth Graunke
90bf39cd2b i965: Combine some dead code elimination NOP'ing code.
In theory we might have incorrectly NOP'd instructions that write the
flag, but where that flag value isn't used, and yet the instruction
either writes the accumulator or has side effects.

I don't believe any such instructions exist, so this is mostly a
code cleanup.

Curro pointed out that FS_OPCODE_FB_WRITE has a null destination and
actually writes the flag on Gen4-5 to dynamically decide whether to
write some payload data.  The hunk removed in this patch might have
NOP'd it, except that we don't actually mark flags_written() in the
IR, so it doesn't think the flag is touched at all.  That's sketchy,
but it means it wouldn't hit this today (though there are likely other
problems!).

v2: Properly replace the inst->regs_written() check in the second
    hunk with the flag being live (mistake caught by Curro).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-17 21:45:00 -08:00
Kenneth Graunke
be5f53e769 i965: Make DCE explicitly not eliminate any control flow instructions.
According to Matt, the dead code pass explicitly avoided IF and WHILE
because on Sandybridge, these could have conditional modifiers and
null destination registers.  Normally, those instructions use BAD_FILE
for the destination register.  Nowadays, we don't do that anymore, so
we could technically drop these checks.

However, it's clearer to explicitly leave control flow instructions
alone, so change it to the more generic !inst->is_control_flow().

This should have no actual change.

[This patch implements review feedback from Curro and Matt.]

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-17 21:44:29 -08:00
Dave Airlie
aac562f112 radv: disable vertex reuse when writing viewport index
This fixes some issues we'd hit later if using viewport
indexes.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-18 08:04:11 +10:00
Dave Airlie
7e0382fb35 radv: add support for layered clears (v2)
Just always use the layer clear pipelines,
the overhead of emitting the layer shouldn't be
too large.

v2: Bas suggested we always use it.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-18 06:21:22 +10:00
Dave Airlie
7886100811 radv/ac: split part of llvm compile into a separate function
This is needed to have common code for gs copy shader emission.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-18 06:21:05 +10:00
Dave Airlie
5dadd7ca27 radv/ac: switch an if to switch
makes it easier to add other shader stages.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-18 06:20:48 +10:00
Dave Airlie
6b635bbe16 radv: add support for writing layer/viewport index (v2)
This just adds the infrastructure to allow writing layer
and viewport index. It's just a first patch out of the geom
shader tree, and doesn't do much on its own.

v2: add missing if statement change (Bas)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-18 06:20:44 +10:00
Bas Nieuwenhuizen
3b4bf8aa63 ac/debug: Decrease num_dw for type 2 NOP's.
Otherwise we read past the end of the buffer.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-17 20:54:57 +01:00
Marek Olšák
57f18623fb radeonsi: for the tess barrier, only use emit_waitcnt on SI and LLVM 3.9+
Cc: 17.0 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-17 16:13:25 +01:00
Nayan Deshmukh
3a8f316e7b st/vdpau: remove the delayed rendering hack(v1.1)
the hack was introduced to avoid an extra copying
but now with dri3 we don't need it anymore

v1.1: rebasing

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Acked-by: Christian König <christian.koenig@amd.com>
2017-01-17 11:52:03 +01:00
Nayan Deshmukh
15bfdea99c st/vdpau: use dri3 to directly send the buffer to X(v2)
this avoids an extra copy which occurs in case of dri2

v1.1: fallback to dri2 if dri3 fails to initialize
v2: add PIPE_BIND_SCANOUT to output buffers as they will
    be send to X server directly (Michel)

Suggested-by: Christian König <christian.koenig@amd.com>
Tested-by: Andy Furniss <adf.lists@gmail.com>
Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
2017-01-17 11:51:56 +01:00
Nayan Deshmukh
0ef17d76bb vl/dri3: use external texture as back buffers(v4)
dri3 allows us to send handle of a texture directly to X
so this patch allows a state tracker to directly send its
texture to X to be used as back buffer and avoids extra
copying

v2: use clip width/height to display a portion of the surface
v3: remove redundant variables, fix wrapping, rename variables
    handle vaapi path
v3.1: we need clip_width/height for every frame so we don't need
      to maintain it for each buffer instead use a global variable
v4: In case of single gpu we can cache the buffers as applications
    use constant number of buffer and we can avoid calls to present
    extension for every frame

Reviewed and Suggested-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Tested-by: Andy Furniss <adf.lists@gmail.com>
Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
2017-01-17 11:51:50 +01:00
Iago Toral Quiroga
9fe9db8031 anv: set UAV coherence required bit when needed
The same we do in the OpenGL driver (comment copied from there).

This is required to ensure that we execute the fragment shader stage when
side-effects (such as image or ssbo stores) are present but there are no
color writes.

I found this while writing a test to check rendering to a framebuffer
without attachments where the fragment shader does not produce any
color outputs but writes to an image via imageStore(). Without this patch
the fragment shader does not execute and the image is not written,
which is not correct.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-17 07:57:04 +01:00
Samuel Iglesias Gonsálvez
ff0dd67d2f anv: increase ANV_MAX_STATE_SIZE_LOG2 limit to 1 MB
Fixes crash in dEQP-VK.ubo.random.all_shared_buffer.48 due to a
fragment shader code bigger than 128 kB.

This patch increases the allocation size limit to 1 MB.

v2:
- Increase it to 1 MB (Jason)
- Increase device->instruction_block_pool allocation size in
  anv_device.c (Jason)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-17 06:42:42 +01:00
Ilia Mirkin
19963231a3 nv50/ir: optimize shl + and
Address loading can often end up as shl + shr + shl combinations. The
latter two are equal shifts, which get converted into an and mask.
However if the previous shl is more than the mask is trying to remove
(in terms of low bits), we can just remove the and entirely. This
reduces some large shaders by as many as 3% of instructions (out of 2K).

total instructions in shared programs : 6495509 -> 6491076 (-0.07%)
total gprs used in shared programs    : 954621 -> 954623 (0.00%)

                local        gpr       inst      bytes
    helped           0           0        1014        1014
      hurt           0           2           0           0

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-01-16 21:13:09 -05:00
Ilia Mirkin
5ba380c226 nvc0: enable FBFETCH with a special slot for color buffer 0
We don't need to support all the color buffers for advanced blend, just
cb0. For Fermi, we use the special binding slots so that we don't
overlap with user textures, while Kepler+ gets a dedicated position for
the fb handle in the driver constbuf.

This logic is only triggered when a FBFETCH is actually present so it
should be a no-op most of the time.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-01-16 21:13:09 -05:00
Ilia Mirkin
6b7511c2f1 st/mesa: add support for advanced blend when fb can be fetched from
This implements support for emitting FBFETCH ops, using the existing
lowering pass for advanced blend logic, and disabling hw blend when
advanced blending is enabled.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-16 21:13:09 -05:00
Ilia Mirkin
a1c8484271 gallium: add flags parameter to texture barrier
This is so that we can differentiate between flushing any framebuffer
reading caches from regular sampler caches.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-16 21:13:09 -05:00
Ilia Mirkin
ee3ebe68f9 gallium: add PIPE_CAP_TGSI_FS_FBFETCH
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-16 21:13:09 -05:00
Ilia Mirkin
1393999541 gallium: add FBFETCH opcode to retrieve the current sample value
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-16 21:13:08 -05:00
Ilia Mirkin
376316e963 mesa: allow BlendBarrier to be used without support for full fb fetch
The extension spec is not currently published, so it's a bit premature
to require it for BlendBarrier usage.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-16 21:13:08 -05:00
Ilia Mirkin
2dd4cdeb4e glsl: avoid treating fb fetches as output reads to be lowered
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-16 21:13:08 -05:00
Dave Airlie
75f858cc33 radv/meta: split color renderpass creation out.
This is just prep work for layered clears, it doesn't change
anything.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-17 08:22:48 +10:00
Bas Nieuwenhuizen
5ae4de18d9 radv: Support multiple devices.
Pretty straightforward. Also deleted the big comment block as it
is a pretty standard pattern for filling in arrays.

Also removed the error message on non-existent devices, as getting
7 errors printed to the console each time you enumerate the
devices is pretty confusing.

v2: Add constant for number of DRM devices.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-01-16 22:15:22 +01:00
Bas Nieuwenhuizen
8406f79d6a radv: Get physical device from radv_device instead of the instance.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-01-16 22:15:22 +01:00
Ilia Mirkin
0baa639f76 nvc0: true up exposing of the HW_METRIC_QUERY_GROUP for maxwell
This had been updated in one place but not the other.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-01-16 16:04:55 -05:00
Dave Airlie
d4392a877c radv/ac: use ctx->voidt in more places. (v2)
Just noticed this while in the area.

v2: one replacement was incorrect.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-17 06:55:51 +10:00
Dave Airlie
3634dfd9e7 radv/meta: consolidate the depth stencil clear renderpasses
We only need one per samples (maybe not even that), reduce
all the unneeded ones.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-17 06:51:25 +10:00
Ilia Mirkin
5eeebca12f nv50/ir: handle new DDIV op which will be used for double divisions
The existing lowering is in place to lower that to RCP + MUL, or fancier
things down the line if necessary.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-01-16 14:45:46 -05:00
Nicolai Hähnle
6be4a40430 tgsi: add DDIV instruction
Double-precision division, to allow more precision than a DRCP + DMUL
sequence.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-16 20:17:22 +01:00
Nicolai Hähnle
5e94e5bb9b radeonsi: fix R600_DEBUG=nooptvariant
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Vedran Miletić <vedran@miletic.net>
2017-01-16 20:16:18 +01:00
Kenneth Graunke
7a2b65a1d7 i965: Make BLORP disable the NP Z PMA stall fix.
This may fix GPU hangs on Gen8.  I don't know if it does though.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-16 10:15:06 -08:00
Kenneth Graunke
d2590eb65f i965: Enable OpenGL 4.5 on Haswell.
Everything is in place and the test results look solid.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-01-16 10:13:23 -08:00
Marek Olšák
d523415609 radeonsi: implement GL_FIXED vertex format
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-16 18:07:08 +01:00
Marek Olšák
018fb2ecb3 radeonsi: implement 32-bit SNORM/UNORM/SSCALED/USCALED vertex formats
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-16 18:07:08 +01:00
Marek Olšák
44e9b67229 radeonsi: make fix_fetch 64-bit
v2: add u_bit_consecutive64

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-16 18:07:08 +01:00
Thomas Hindoe Paaboel Andersen
8daf6de3de gallium/hud: avoid buffer overrun
Renaming data sources was added in
e8bb97ce30
It was possible to use a new name longer than
the name array in hud_graph of 128. This
patch truncates the name to fit the array.

CC: Marek Olšák <marek.olsak@amd.com>

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-01-16 18:07:08 +01:00
Marek Olšák
0d9a4efce9 gallium/radeon: add GPU-shaders-busy HUD query
It should be close to the GPU load, but it can be much lower if something
is stalling shader execution (e.g. CP DMA).

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-16 15:35:30 +01:00
Marek Olšák
aa0de724c7 gallium/radeon: make the GPU load / GRBM_STATUS monitoring extensible
The next patch will add SPI_BUSY monitoring.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-16 15:35:30 +01:00
Marek Olšák
935d58ac73 radeonsi: show average results per frame for perf counters in HUD
so that the graphs are independent from FPS.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-16 15:35:30 +01:00
Marek Olšák
1fe7c8d3c9 gallium/hud: disable queries during HUD draw calls
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-16 15:35:30 +01:00
Marek Olšák
5b2eddc40f gallium/hud: increase the vertex buffer size for background quads
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-16 15:35:30 +01:00
Nayan Deshmukh
4b0e9babc6 st/va: delay calling begin_frame until we have all parameters
If begin_frame is called before setting intra_matrix and
non_intra_matrix it leads to segmentation faults when
vl_mpeg12_decoder.c is used.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92634
Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-01-16 15:09:01 +01:00
Kenneth Graunke
5597b2b243 i965: Use align1 mode for barrier messages.
In commit 7428e6f86a we switched the barrier SEND message's
destination type to UW to avoid problems in SIMD16 compute shaders.

Tessellation control shaders also use barriers, and in vec4 mode, we
were emitting them in align16 mode.  The simulator warns that only UD,
D, F, and DF are valid destination types - UW is technically illegal.

So, switch to align1 mode.  Either mode should work fine.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-01-15 16:49:58 -08:00
Ilia Mirkin
dd39e48726 nvc0/ir: emit FMZ flag when requested on FFMA
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-01-15 13:13:58 -05:00
Connor Abbott
c9b74f3f03 nir/gcm: fix a bug with metadata handling
We were using impl->num_blocks, but that isn't guaranteed to be
up-to-date until after the block_index metadata is required. If we were
unlucky, this could lead to overwriting memory.

Noticed by inspection.

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-14 18:18:17 -05:00
Lionel Landwerlin
bf8e1f9e7b radv: generate entrypoints from vk.xml
v2: rework entry point iteration (Jason)
    cleanup unused imports

v3: don't drop header installation (Emil)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-14 19:29:44 +00:00
Lionel Landwerlin
c7fc310cd1 anv: generate entry points from vk.xml
v2: rework entry point iteration (Jason)
    cleanup unused imports

v3: don't drop header installation (Emil)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-14 19:29:44 +00:00
Lionel Landwerlin
dbd677efb4 vulkan: add API registry
Signed-off: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-14 19:29:44 +00:00
Lionel Landwerlin
60bc90cea8 include: update Vulkan headers
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-14 19:29:44 +00:00
Andres Rodriguez
1e1bddf15a radv: make device extension setup dynamic
Each physical device may have different extensions than one another.
Furthermore, depending on the software stack, some extensions may not be
accessible.

If an extension is conditional, it can be registered only when
necessary.

v2: removed unused function and fixed indentation

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-01-14 14:20:17 +01:00
Andres Rodriguez
5323efb685 radv: rename global extension properties structs
All extension arrays are global, but only one of them refers to instance
extensions.

The device extension array refers to extensions that are common across
all physical devices. This disctinction will be more imporant once we
have dynamic extension support for devices.

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-01-14 14:20:12 +01:00
Andres Rodriguez
0eb8b6a3e1 radv: use a winsys context per-queue, instead of per device v2
Queues are independent execution streams. The vulkan spec provides no
ordering guarantees for different queues.

By using a single context for all queues, we are forcing all commands
into an unecessary FIFO ordering.

This change is a preparation step to allow our-of-ordering scheduling of
certain work tasks.

v2: Fix a rebase error with radv_QueueSubmit() and trace_bo
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-01-14 14:19:41 +01:00
Timothy Arceri
772cd31048 nir: optimise min/max fadd combos
shader-db results BDW:

total instructions in shared programs: 13060410 -> 13060313 (-0.00%)
instructions in affected programs: 24533 -> 24436 (-0.40%)
helped: 88
HURT: 0

total cycles in shared programs: 256585692 -> 256586698 (0.00%)
cycles in affected programs: 647290 -> 648296 (0.16%)
helped: 35
HURT: 30

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-14 23:26:22 +11:00
Kenneth Graunke
0d5071db5e i965: Move Gen4-5 interpolation stuff to brw_wm_prog_data.
This fixes glxgears rendering, which had surprisingly been broken since
late October!  Specifically, commit 91d61fbf7c.

glxgears uses glShadeModel(GL_FLAT) when drawing the main portion of the
gears, then uses glShadeModel(GL_SMOOTH) for drawing the Gouraud-shaded
inner portion of the gears.  This results in the same fragment program
having two different state-dependent interpolation maps: one where
gl_Color is flat, and another where it's smooth.

The problem is that there's only one gen4_fragment_program, so it can't
store both.  Each FS compile would trash the last one.  But, the FS
compiles are cached, so the first one would store FLAT, and the second
would see a matching program in the cache and never bother to compile
one with SMOOTH.  (Clearing the program cache on every draw made it
render correctly.)

Instead, move it to brw_wm_prog_data, where we can keep a copy for
every specialization of the program.  The only downside is bloating
the structure a bit, but we can tighten that up a bit if we need to.
This also lets us kill gen4_fragment_program entirely!

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-13 17:25:48 -08:00
Grazvydas Ignotas
40a8f9e6f2 anv: remove some unused macros and functions
VK_ICD_WSI_PLATFORM_MAX is used, but a duplicate from wsi_common.h .

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-13 16:52:27 -08:00
Jason Ekstrand
3b80481965 anv: Default PointSize to 1.0 if not written by the shader
The Vulkan rules for point size are a bit whacky.  If you only have a
vertex shader and you use points, then you must write PointSize in your
vertex shader.  If you have a geometry or tessellation shader, then it's
dependent on the shaderTessellationAndGeometryPointSize device feature.
From the Vulkan 1.0.38 specification:

   "shaderTessellationAndGeometryPointSize indicates whether the
   PointSize built-in decoration is available in the tessellation
   control, tessellation evaluation, and geometry shader stages. If this
   feature is not enabled, members decorated with the PointSize built-in
   decoration must not be read from or written to and all points written
   from a tessellation or geometry shader will have a size of 1.0. This
   also indicates whether shader modules can declare the
   TessellationPointSize capability for tessellation control and
   evaluation shaders, or if the shader modules can declare the
   GeometryPointSize capability for geometry shaders. An implementation
   supporting this feature must also support one or both of the
   tessellationShader or geometryShader features."

In other words, if the feature is disbled (the client can disable
features!) then they don't write PointSize and we provide a 1.0 default
but if the feature is enabled, they do write PointSize and we use the
one they wrote in the shader.  There are at least two valid ways we can
implement this:

 1) Track whether or not shaderTessellationAndGeometryPointSize is
    enabled and set the 3DSTATE_SF bits based on that and what stages
    are enabled, ignoring the shader source.

 2) Just look at the last geometry stage VUE map and see if they wrote
    PointSize and set the 3DSTATE_SF accordingly.

The second solution is the easiest and the most robust against invalid
usage of the Vulkan API, so we choose to go with that one.

This fixes all of the dEQP-VK.tessellation.primitive_discard.*point_mode
tests.  The tests are also broken because they unconditionally enable
shaderTessellationAndGeometryPointSize if it's supported by the
implementation and then don't write PointSize in the evaluation shader.
However, since this is the "robust against invalid API usage" solution,
the tests happily pass. :-)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-13 16:31:17 -08:00
Jason Ekstrand
99d497c5b6 anv/pipeline: Replace get_fs_input_map with get_last_vue_prog_data
This lets us delete a helper from genX_pipeline.c

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-13 16:31:17 -08:00
Juan A. Suarez Romero
56ee2df4bf i965/vec4: Fix mapping attributes
This patch reverts 57bab6708f, which was
causing issues with ILK and earlier VS programs.

1. brw_nir.c: Revert "i965/vec4/nir: vec4 also needs to remap vs attributes"

   Do not perform a remap in vec4 backend. Rather, do it later when
   setup attributes

2. brw_vec4.cpp: This fixes mapping ATTRx to proper GRFn.

Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99391
[jordan.l.justen@intel.com: merge Juan's two patches from bugzilla]
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-13 16:23:32 -08:00
Kenneth Graunke
fed4afc5bb anv: Move nir_lower_wpos_center after dead variable elimination.
When multiple shader stages exist in the same SPIR-V module, we compile
all entry points and their inputs/outputs, then dead code eliminate the
ones not related to the specific entry point later.

nir_lower_wpos_center was being run prior to eliminating those random
other variables, which made it trip up, thinking it found gl_FragCoord
when it actually found something else like gl_PerVertex[3].

Fixes dEQP-VK.spirv_assembly.instruction.graphics.module.same_module.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-13 15:00:38 -08:00
Kenneth Graunke
99c019e1d4 i965: Fix textureGather with RG32I/UI on Gen7.
According to the "Gather4 R32G32_FLOAT Bug" internal documentation
page, the R32G32_UINT and R32G32_SINT formats are affected by the
same bug as R32G32_FLOAT.  Applying the same workarounds should be
viable - apparently the R32G32_FLOAT_LD format shouldn't corrupt
integer data which is NaN or other sketchy floating point values.

One irritating caveat is that, because it's a FLOAT format, the
alpha channel or any set to SCS_ONE return 0x3f8 (1.0) rather than
integer 1.  So we need shader code to whack those channels to 1.

Fixes GL45-CTS.texture_gather.plain-gather-int-cube-rg on Haswell.

v2: Fix swizzle component zeroing (caught by Jordan Justen).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-01-13 11:57:06 -08:00
Bas Nieuwenhuizen
6d2fb04f09 radv: Support loader interface version 3.
Port of 1e41d7f7b0:
"anv: Support loader interface version 3 (patch v2)"

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-13 19:11:17 +01:00
Boyan Ding
ddd27ef462 mesa/get: Remove unused extra_ARB_viewport_array
Unused since 0a7691ee (mesa: Enable enums for OES_viewport_array).
Silence a warning of unused variable.

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-13 17:09:22 +00:00
Boyan Ding
dc18ec8b24 xlib: Unify the style of function pointer calls in structs
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
[Emil Velikov: handle the final case in glXCreateContextAttribsARB]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-13 16:24:40 +00:00
Boyan Ding
2d05425d3e radeon: Unify the style of function pointer calls in structs
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
[Emil Velikov: handle the all cases]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-13 16:24:32 +00:00
Boyan Ding
056cfa558c nouveau: Unify the style of function pointer calls in structs
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
2017-01-13 16:24:32 +00:00
Boyan Ding
0ee4c4a732 glX_proto_send.py: Unify the style of function pointer calls in structs
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
2017-01-13 16:24:32 +00:00
Boyan Ding
1411fbd50d loader/dri3: Unify the style of function pointer calls in structs
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
2017-01-13 16:24:32 +00:00
Boyan Ding
868ae3e31b egl/dri2: Unify the style of function pointer calls in structs
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
[Emil Velikov: address platform_surfaceless]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-13 16:24:22 +00:00
Derek Foreman
3698d71124 i915: Add XRGB8888 format to intel_screen_make_configs
This is a copy of commit 536003c11e
except for i915.

Original log for the i965 commit follows:

 Some application, such as drm backend of weston, uses XRGB8888 config as
 default. i965 doesn't provide this format, but before commit 65c8965d,
 the drm platform of EGL takes ARGB8888 as XRGB8888. Now that commit
 65c8965d makes EGL recognize format correctly so weston won't start
 because it can't find XRGB8888. Add XRGB8888 format to i965 just as
 other drivers do.

Signed-off-by: Derek Foreman <derekf@osg.samsung.com>
Acked-by: Boyan Ding <boyan.j.ding@gmail.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
2017-01-13 15:53:42 +00:00
Derek Foreman
4f1d27a406 gbm/drm: Pick the oldest available buffer in get_back_bo
Applications may query the back buffer age to efficiently perform
partial updates. Generally the application will keep a fixed length
damage history, and use this to calculate what needs to be redrawn
based on the age of the back buffer it's about to render to.

If presented with a buffer that has an age greater than the
length of the damage history, the application will likely have
to completely repaint the buffer.

Our current buffer selection strategy is to pick the first available
buffer without considering its age.  If an application frequently
manages to fit within two buffers but occasionally requires a third,
this extra buffer will almost always be old enough to fall outside
of a reasonably long damage history, and require a full repaint.

This patch changes the buffer selection behaviour to prefer the oldest
available buffer.

By selecting the oldest available buffer, the application will likely
always be able to use its damage history, at a cost of having to
perform slightly more work every frame.  This is an improvement if
the cost of a full repaint is heavy, and the surface damage between
frames is relatively small.

It should be noted that since we don't currently trim our queue in
any way, an application that briefly needs a large number of buffers
will continue to receive older buffers than it would if it only ever
needed two buffers.

Reviewed-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: Derek Foreman <derekf@osg.samsung.com>
Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
2017-01-13 15:52:11 +00:00
Jonas Ådahl
36b9976e1f egl/wayland: Avoid race conditions when on non-main thread
When EGL is used on some other thread than the thread that drives the
main wl_display queue, the Wayland EGL dri2 implementation is
vulnerable to a race condition related to display round trips and global
object advertisements.

The race that may happen is that after after a proxy is created, but
before the queue is set, events meant to be emitted via the yet to be
set queue may already have been queued on the wrong queue.

In order to make it possible to avoid this race, wayland 1.11
introduced new API that allows creating a proxy wrapper that may be used
as the factory proxy when creating new proxies via Wayland requests. The
queue of a proxy wrapper can be changed without effecting what queue
events emitted by the actual proxy will be queued on, while still
effecting what default queue proxies created from it will have.

By introducing a wl_display proxy wrapper and using this when performing
round trips (via wl_display_sync()) and retrieving the global objects (via
wl_display_get_registry()), the mentioned race condition is avoided.

Signed-off-by: Jonas Ådahl <jadahl@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Daniel Stone <daniels@collabora.com>
2017-01-13 15:50:37 +00:00
Jonas Ådahl
361796651c egl/wayland: Cleanup private display connection when init fails
When failing to initializing the Wayland EGL driver, don't leak the
display server connection if it was us who created it.

Signed-off-by: Jonas Ådahl <jadahl@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2017-01-13 15:50:04 +00:00
Rhys Kidd
cba8086951 travis: Add the new drivers etnaviv and imx
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-13 15:46:37 +00:00
sguttula
9b14a828db st/va: flush pipeline after post processing
This will flush the pipeline,which will allow to share dma-buf based
buffers.

Signed-off-by: Suresh Guttula <Suresh.Guttula@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-01-13 14:21:29 +01:00
Alejandro Piñeiro
84e3e12b25 main/fbobject: throw invalid operation when get_attachment fails if needed
In most cases, if a call to get_attachment fails is because attachment
is a INVALID_ENUM. But for some specific cases, if COLOR_ATTACHMENTm
(where m >= MAX_COLOR_ATTACHMENTS) is used, it should raise an
INVALID_OPERATION exception instead.

Fixes:
GL45-CTS.direct_state_access.framebuffers_get_attachment_parameter_errors
GL45-CTS.direct_state_access.framebuffers_renderbuffer_attachment_errors

v2: extra new line before quote block. Include "color attachment" on both
    new message errors (Nicolai).

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-13 08:52:14 -02:00
Alejandro Piñeiro
c6eb3aeba5 main/fboject: return if it is color_attachment on get_attachment
Some callers would need that info to know if they should raise
INVALID_ENUM or INVALID_OPERATION. An alternative would be the caller
to check if the attachment is a GL_COLOR_ATTACHMENTm, but that seems
redundant as get_attachment is already doing that.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-13 08:52:07 -02:00
Nicolai Hähnle
963311b71f mesa/main: fix version/extension checks in _mesa_ClampColor
Add a proper check for feature support, and raise an invalid enum for
GL_CLAMP_VERTEX/FRAGMENT_COLOR unconditionally in core profiles, since
those enums were explicitly removed after the extension was promoted
to core functionality (not in the profile sense) with OpenGL 3.0.

This matches the behavior of the AMD closed source driver and fixes
GL45-CTS.gtf30.GL3Tests.half_float.half_float_textures.

Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-01-13 11:03:11 +01:00
Samuel Pitoiset
e1ea70d9f3 radeonsi: replace si_shader_context::soa by bld_base
We no longer need to use lp_build_tgsi_soa_context.

No regressions founds with full piglit run.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-13 10:41:08 +01:00
Samuel Pitoiset
ecf04b84e5 radeonsi: replace ctx->soa.outputs by ctx->outputs
The plan is to replace si_shader_context::soa with its parent
structure (ie. bld_base).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-13 10:41:06 +01:00
Samuel Pitoiset
f04088a7ba radeonsi: move si_shader_context::soa::addr to si_shader_context
The plan is to replace si_shader_context::soa with its parent
structure (ie. bld_base).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-13 10:41:02 +01:00
Samuel Pitoiset
6f0d955b6d radeonsi: allocate the array of immediates dynamically
Currently, we can store up to 256 immediates in a static array,
but this is not always enough. Instead, allocate a dynamic array
like what we currently do for temps.

This fixes a segfault with
dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.23

No regressions found with full piglit run.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-13 10:40:57 +01:00
Grazvydas Ignotas
cb89d19dbb radv: remove some unused macros and functions
These seem unlikely to be used.
Also remove irrelevant comment about SKL.

v2: forgot to rebase on master

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
2017-01-13 08:42:33 +01:00
Nanley Chery
64272d4f1b anv: Avoid some resolves for samplable HiZ buffers
v2: Simplify nested ifs (Jason Ekstrand)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-12 20:52:21 -08:00
Nanley Chery
71334f494a anv: Enable sampling from HiZ
v2: Restrict ISL_AUX_USAGE_HIZ to depth aspects

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-12 20:52:21 -08:00
Nanley Chery
5e0902cd2a anv/blorp: Don't fast depth clear samplable HiZ buffers on BDW
Avoid the resolves that would be required if fast depth clears were
allowed for such buffers.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-12 20:52:21 -08:00
Nanley Chery
3ac01ad2ac anv: Add a helper to determine sampling with HiZ
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-12 20:52:20 -08:00
Nanley Chery
bcf880a9c8 isl/surface_state: Handle ISL_AUX_USAGE_HIZ
v2: Remove redundant x/y offset asserts (Jason Ekstrand)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-12 20:52:20 -08:00
Nanley Chery
58af615636 anv: Perform HiZ resolves only on layout transitions
This is a better mapping to the Vulkan API and improves performance in
all tested workloads.

v2: Remove unnecessary image view aspect checks (Jason Ekstrand)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-12 20:52:20 -08:00
Nanley Chery
2852efcda4 anv: Disable HiZ for input attachments
v2 (Jason Ekstrand):
- Add spec citation
- Drop conditional

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-12 20:52:20 -08:00
Nanley Chery
b62d8ad2ae anv: Avoid resolves incurred by fast depth clears
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-12 20:52:20 -08:00
Nanley Chery
968ffd6c86 anv: Prepare for transitioning to the requested final layout
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-12 20:52:20 -08:00
Nanley Chery
104ce1dbab anv: Store depth stencil layouts
Store the current and requested depth stencil layouts so that we can
perform the appropriate HiZ resolves for a given transition while
recording a render pass.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-12 20:52:20 -08:00
Nanley Chery
2e2cf78a51 anv: Add helpers to handle depth buffer layout transitions
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-12 20:52:20 -08:00
Nanley Chery
0ce8b37a8e anv: Delete anv's HiZ op emit function
This is no longer used.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-12 20:52:20 -08:00
Nanley Chery
462a4c9648 anv: Use the gen8 BLORP HiZ resolving function
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-12 20:52:20 -08:00
Nanley Chery
d16871d958 anv/blorp: Add a gen8 HiZ op resolve function
Add an entry point for resolving using BLORP's gen8 HiZ op function.

v2: Manually add the aux info

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-12 20:52:20 -08:00
Nanley Chery
3b7106c181 anv: Use gen8 BLORP HiZ clearing functions
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-12 20:52:20 -08:00
Nanley Chery
f357af0c90 intel/blorp_clear: Add gen8 HiZ clearing functions
Add an entry point for the optimized gen8 BLORP HiZ sequence. commit
c9eaf12de2 fixed a bug that was
unknowingly worked around by forcing additional clear rectangle
alignment restrictions not specified in the PRMs. Now that the bug is no
longer present, omit the additional alignment restrictions.

v2: Adjust code comment about padding

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-12 20:52:19 -08:00
Nanley Chery
64fb5b0d51 anv: Enable HiZ support for multiple subpasses
We'll be using layout transitions later on in the series which can occur
within and between subpasses. Turn this on now to simplify the change
later.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-12 20:52:19 -08:00
Nanley Chery
168985fca1 anv: Use ::anv_attachment_state for toggling HiZ per subpass
We're about to enable HiZ support for multiple subpasses. Use this field
to keep track of whether or not subpass operations should treat the
depth buffer as having an auxiliary HiZ buffer.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-12 20:52:19 -08:00
Nanley Chery
055ff2ec52 anv: Replace anv_image_has_hiz() with ISL_AUX_USAGE_HIZ
The helper doesn't provide additional functionality over the current
infrastructure.

v2: Add comment to anv_image::aux_usage (Jason Ekstrand)
v3: Clarify comment for aux_usage (Jason Ekstrand)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-12 20:52:19 -08:00
Nanley Chery
160a54810e anv/blorp: Handle ISL_AUX_USAGE_HIZ
Prevent assert failures that would occur in the next patch.

v2: Don't remove asserts from blorp/blit (Jason Ekstrand)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-12 20:52:19 -08:00
Nanley Chery
09948151ab intel/blorp: Add the BDW+ optimized HZ_OP sequence to BLORP
We'll be switching to layout-transition based resolves which can occur
outside of a render pass. Add this sequence to BLORP, as using BLORP
will enable emitting depth stencil state outside of a render pass (among
other benefits). The depth buffer extent is ignored to enable eventual
usage in VkCmdClearAttachments().

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-12 20:52:19 -08:00
Emil Velikov
f0bdd13fdb get-typod-pick-list.sh: add new script
Typos do happen as people nominate patches for stable. This script aims
to catch most of those.

Due to the subtle nature of things, one has to pay special attention to
the output, similar to get-extra-pick-list.sh.

At the moment only the following is handled:
 grep -i "CC:.*mesa-dev"

Cc: 12.0 13.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-13 03:07:48 +00:00
Emil Velikov
5abd0a7583 ac: automake: ensure that ./common is generated
Depending on the autoconf (or friends) version one may or may not have
the ./common folder created. Thus in the latter case we'll fail to
generate the file.

Reviewed-by: Thierry Reding <treding@nvidia.com>
Tested-by: Darren Salt <devspam@moreofthesa.me.uk>
Reported-by: Darren Salt <devspam@moreofthesa.me.uk>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-13 03:07:48 +00:00
Ilia Mirkin
f897036978 nvc0/ir: only try to check for zero LOD if we aren't already forcing it
There's a levelZero flag which forces texturing to pick level zero (and
not consume an explicit LOD argument). This is set for MS targets, but
could also be set for any other incoming instruction. As that is what
determines whether a LOD argument is present, check that rather than the
more indirect isMS logic.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-01-12 21:08:42 -05:00
Ilia Mirkin
eb60a89bc3 nouveau: take extra push space into account for pushbuf_space calls
Ever since a long time ago when I messed around with fences, I ensure
that after a PUSH_SPACE call there is enough space to write a fence out
into the pushbuf.

However the PUSH_SPACE macro is not all-knowing, and so sometimes we
have to invoke nouveau_pushbuf_space manually with the relocs/pushes
args set. If we don't take the extra allocation from PUSH_SPACE into
account, then we will end up accidentally flushing when the code was not
expecting a flush. This can lead to various runtime and rendering
failures.

The amount of extra allocation isn't that important - it has to be at
least 8 based on the current nouveau_winsys.h setting, but even more
won't hurt. I just rounded up to powers of 2.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99354
Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Ben Skeggs <bskeggs@redhat.com>
2017-01-12 20:39:19 -05:00
Grazvydas Ignotas
8945836658 mapi: update the asm code to support x32
Fixes crashes when both glx-tls and asm are enabled on x32.

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94512
Bugzilla: https://bugs.gentoo.org/show_bug.cgi?id=575458
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-01-13 00:59:32 +01:00
Nicolai Hähnle
1007047ca1 ac/nir: use ac_emit_fdiv throughout
... and eliminate emit_fdiv and nir_to_llvm_context::fpmath_md_*, which
are now unused.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-13 00:39:22 +01:00
Nicolai Hähnle
38c67f77ed ac/nir: use ac_build_gather_values[_extended] throughout
... and eliminate the non-ac copies. Mostly straight-forward
search & replace.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-13 00:39:20 +01:00
Nicolai Hähnle
2c9d26a356 ac/nir: use ac_emit_llvm_intrinsic throughout
... by straight-forward search & replace, and eliminate
emit_llvm_intrinsic.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-13 00:39:17 +01:00
Nicolai Hähnle
fccf29373d radeonsi: remove unused si_prepare_cube_coords
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-13 00:39:13 +01:00
Nicolai Hähnle
a0ce09b4b2 amd/common: unify cube map coordinate handling between radeonsi and radv
Code is taken from a combination of radv (for the more basic functions,
to avoid gallivm dependencies) and radeonsi (for the new and improved
derivative calculations).

v2: add 0.5 offset to tex coords only after derivative calculation

v3:
- really only touch the first three coordinates
- rebase on the removal of the 1.5 --> 0.5 offset change

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v2)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-13 00:39:10 +01:00
Nicolai Hähnle
0ee1ee5fbb radeonsi: only touch first three coordinates in si_prepare_cube_coords
Sourcing coords_arg[4] is actually never correct, since bias is handled
differently in tex_fetch_args anyway.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-13 00:39:07 +01:00
Nicolai Hähnle
9f590ee9d9 radeonsi: remove unused si_llvm_cube_to_2d_coords
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-13 00:39:03 +01:00
Nicolai Hähnle
205ad5234a radeonsi: restrict cube map derivative computations to the correct plane
As remarked by the comment in the original code, the old algorithm fails when
(tc + deriv) points at a different cube face. Instead, simply project the
derivative directly to the plane of the selected cube face.

The new code is based on exactly differentiating (using the chain rule)
the projection onto a plane corresponding to a fixed cube map face (which
is still selected in the usual way based on the texture coordinate itself).
The computations end up fairly involved, but we do save two reciprocal
computations.

Fixes GL45-CTS.texture_cube_map_array.sampling.

v2: add 0.5 offset to tex coords only after derivative calculation
v3: go back to 1.5 offset

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v2)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-13 00:38:59 +01:00
Nicolai Hähnle
e01deee42f radeonsi: communicate cube map coordinates more explicitly
v2: fix compile error that snuck in during rebase

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-13 00:38:34 +01:00
Grazvydas Ignotas
c728051131 ac/debug: move .gitignore for sid_tables.h too
b838f642 "ac/debug: Move sid_tables.h generation to common code." moved
sid_tables.h but forgot the corresponding .gitignore.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-13 00:37:52 +01:00
Jason Ekstrand
08eced3cfd nir/gcm: Fix a typo in a comment
Reported-by: Matt Turner <mattst88@gmail.com>
2017-01-12 14:56:55 -08:00
Jason Ekstrand
087e172179 nir/gcm: Rework the schedule late loop
This fixes a bug in code motion that occurred when the best block is the
same as the schedule early block.  In this case, because we're checking
(lca != def->parent_instr->block) at the top of the loop, we never get to
the check for loop depth so we wouldn't move it out of the loop.  This
commit reworks the loop to be a simple for loop up the dominator chain and
we place the (lca != def->parent_instr->block) check at the end of the
loop.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-12 14:56:55 -08:00
Chuck Atkins
e9a4ec4bd8 glx: Add missing glproto dependency for gallium-xlib glx
Cc: mesa-stable@lists.freedesktop.org
Cc: Bruce Cherniak <bruce.cherniak@intel.com>
Signed-of-by: Chuck Atkins <chuck.atkins@kitware.com>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-12 22:01:55 +00:00
Emil Velikov
c90f921273 ac, radeonsi: automake: add missing builddir include
The generated file is correctly stored in the builddir as of earlier
commit. Yet the commit forgot to add the respective include flag thus
the compiler would error out failing to find sid_tables.h

Bugzila: https://bugs.freedesktop.org/show_bug.cgi?id=99389
Fixes: d1dc22eb46 "ac: automake: rework sid_tables.h generation"
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-12 22:01:55 +00:00
Bas Nieuwenhuizen
8aaca3820c radv: Call NIR passes using NIR_PASS_V.
Port of faa1edeeb7
"anv/pipeline: Call NIR passes using NIR_PASS_V"

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-12 21:39:52 +01:00
Bas Nieuwenhuizen
65cbb993d3 radv: Call nir_lower_constant_initializers.
Port of c5d664f9dc
"anv/pipeline: Call nir_lower_constant_initializers"

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-12 21:39:46 +01:00
Bas Nieuwenhuizen
18e70edd8c radv: Only call remove_dead_variables once.
Port of 43e0b0d4b2
"anv/pipeline: Only call remove_dead_variables once"

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-12 21:39:41 +01:00
Axel Davy
970556292b st/nine: Protect dtors with mutex
When the flag D3DCREATE_MULTITHREAD is set, a global mutex is used
to protect nine calls.
However for performance reasons, AddRef and Release didn't hold the mutex,
and instead used atomics.

Unfortunately at item release, the item can be destroyed, and that
destruction path should be protected by a mutex (at least for
some objects).

Without this patch, it is possible an app thread is in a dtor
while another thread is making gallium nine calls. It is possible
that two threads are using the same gallium pipe, which is forbiden.
The problem has been made worse with csmt, because it can cause hang,
since nine_csmt_process is not threadsafe.

Fixes Hitman hang, and possibly others.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2017-01-12 20:33:11 +01:00
Axel Davy
5f4359ea0e st/nine: Flush the queue at device dtor
Flush the queue to get refcounts right, and properly
release the items, instead of throwing away all pending
commands.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2017-01-12 20:33:11 +01:00
Axel Davy
4e922c81f6 st/nine: Process pending commands on Reset
Some nine_state_* and nine_context_* functions
used for Reset() require all pending commands are
flushed.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2017-01-12 20:33:11 +01:00
Axel Davy
6b87a2a77a st/nine: Flush pending commands if needed for surface9 changes
nine_context uses NineSurface9 fields, thus we need to flush
pending commands using the surface before changing the fields.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2017-01-12 20:33:11 +01:00
Axel Davy
f895ab8e22 st/nine: Rework CreatePipeSurface
Create both surfaces in one call.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2017-01-12 20:33:11 +01:00
Axel Davy
d43bc05e8b st/nine: Remove duplicated checks
There is no need to check on csmt_active before
calling nine_csmt_process, because the function
checks already.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2017-01-12 20:33:11 +01:00
Masanori Kakura
9b5f5de9e9 st/nine: Don't call u_box_union_* when dirty region is empty
When dirty region is empty, u_box_union_* incorrectly expands
the new region.

This fixes broken font rendering issue in WOLF RPG Editor v2.10 games.

Signed-off-by: Masanori Kakura <kakurasan@gmail.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2017-01-12 20:33:11 +01:00
Emil Velikov
a5f0cdb36f winsys/etnaviv: automake: introduce Makefile.sources
... and list the public header within it.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-12 19:30:15 +00:00
Emil Velikov
0467700536 etnaviv: automake: include all files in the sources lists
Note: the currently mentioned etnaviv_utils.h is typo.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-12 19:30:09 +00:00
Emil Velikov
d1dc22eb46 ac: automake: rework sid_tables.h generation
Drop $(srcdir)/ prefix analogous to before the file (and rule) movement
and move it outside of the NEED_RADEON_LLVM conditional.

Otherwise the build may fail as below.

make[3]: *** No rule to make target 'common/sid_tables.h', needed by 'distdir'.  Stop.

Fixes: b838f64237 "ac/debug: Move sid_tables.h generation to common
code."
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-12 19:29:28 +00:00
Emil Velikov
23dcce0c03 automake: use shared llvm libs for make distcheck
Cc: "12.0 13.0" <mesa-dev@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-12 19:29:22 +00:00
Emil Velikov
024b4c35bc automake: add the new drivers etnaviv and imx to make distcheck
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-12 19:29:20 +00:00
Christian Gmeiner
e8626e3b31 imx: gallium driver for imx-drm scanout driver
Changes from V1 -> V2:
 - updated Copyright
 - added $(top_srcdir)/src/gallium/winsys to include path (suggested by Emil)
 - adapted driver to new renderonly API

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-12 19:27:11 +00:00
The etnaviv authors
c9e8b49b88 etnaviv: gallium driver for Vivante GPUs
This driver supports a wide range of Vivante IP cores like GC880,
GC1000, GC2000 and GC3000.

Changes from V1 -> V2:
 - added missing files to actually integrate the driver into build system.
 - adapted driver to new renderonly API

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-12 19:27:11 +00:00
Christian Gmeiner
848b49b288 gallium: add renderonly library
This a very lightweight library to add basic support for renderonly
GPUs. A kms gallium driver must specify how a renderonly_scanout
objects gets created. Also it must provide file handles to the used
kms device and the used gpu device.

This could look like:
struct renderonly ro = {
   .create_for_resource = renderonly_create_gpu_import_for_resource,
   .kms_fd = fd,
   .gpu_fd = open("/dev/dri/renderD128", O_RDWR | O_CLOEXEC)
};

The renderonly_scanout object exits for two reasons:
 - Do any special treatment for a scanout resource like importing the
   GPU resource into the scanout hw.
 - Make it easier for a gallium driver to detect if anything special
   needs to be done in flush_resource(..) like a resolve to linear.

A GPU gallium driver which gets used as renderonly GPU needs to be
aware of the renderonly library.

This library will likely break android support and hopefully will get
replaced with a better solution based on gbm2.

Changes from V1 -> V2:
 - reworked the lifecycle of renderonly object (suggested by Nicolai Hähnle)
 - killed the midlayer (suggested by Thierry Reding)
 - made the API more explicit regarding gpu and kms fd's
 - added some docs

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Alexandre Courbot <acourbot@nvidia.com>
2017-01-12 19:27:11 +00:00
Jason Ekstrand
27a1c7ffbd spirv: Handle patch decorations up-front
Once again, SPIR-V is insane... It allows you to place "patch"
decorations on structure members.  Presumably, this is so that you can
do something such as

out struct S {
   layout(location = 0) patch vec4 thing1;
   layout(location = 0) vec4 thing2;
} str;

And have your I/O "nicely" organized.  While this is a bit silly, it's
allowed and well-defined so whatever.  Where it really gets interesting
is when you have an array of struct.  SPIR-V says nothing about not
allowing you to have those qualifiers on the members of a struct that's
inside an array and GLSLang does this.  Specifically, if you have

layout(location = 0) out patch struct S {
   vec4 thing1;
   vec4 thing2;
} str[2];

then GLSLang will place the "patch" decorations on the struct members.
This is ridiculous there is no way that having some of them be patch and
some not would be well-defined given that patch and non-patch outputs
are in effectively different storage classes.  This commit moves around
the way we handle the "patch" decoration so that we can detect even the
crazy cases and handle them.

Fixes: dEQP-VK.tessellation.user_defined_io.per_patch_block_array.*

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-12 10:41:34 -08:00
Chad Versace
1e41d7f7b0 anv: Support loader interface version 3 (patch v2)
This patch implements vk_icdNegotiateLoaderICDInterfaceVersion(), which
brings us to loader interface v3.

v2:
  - Drop the pragmas. [emil]
  - Advertise v3 instead of v2. Anvil supported more than I
    thought.  [jason]
  - s/Surface/SurfaceKHR/ in comments. [emil]

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
Cc: Jason Ekstrand <jason@jlekstrand.net>
2017-01-12 09:42:32 -08:00
Chad Versace
98cf089849 vulkan: Update vk_icd.h to interface version 3
Import from commit f2aeefec on branch 'master'
of https://github.com/KhronosGroup/Vulkan-LoaderAndValidationLayers.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
2017-01-12 09:42:32 -08:00
Chad Versace
c085bfcec9 vulkan: Add new cast macros for VkIcd types
We can't import the latest vk_icd.h because the new header breaks the
Mesa build. This patch defines new casting macros,
ICD_DEFINE_NONDISP_HANDLE_CASTS() and ICD_FROM_HANDLE(), which can
handle both the old and new vk_icd.h, and will prevent the build from
breaking when we update the header.

In the old vk_icd.h, types were defined as:

  typedef struct _VkIcdFoo {
    ...
  } VkIcdFoo;

Commit 6ebba1f6 in the Vulkan loader changed the above to

  typedef {
    ...
  } VkIcdFoo;

because the old definitions violated the C and C++ specs. According to
the specs, identifiers that begins with an underscore followed by an
uppercase letter are reserved. (It's pedantic, I know), See the Github
issue referenced below.

References: https://github.com/KhronosGroup/Vulkan-LoaderAndValidationLayers/issues/7
References: 6ebba1f630
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
2017-01-12 09:42:32 -08:00
George Kyriazis
a61528fa33 Always defer memory free in swr_resource_destroy
Defer delete on regular resources.  This ensures that any work being done
on the resource is completed before freeing up the resource's memory.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-01-12 09:10:15 -06:00
Juan A. Suarez Romero
ce44501ea8 nir/i965: assert first is always less than 64
This fixes a defect detected by Coverity Scan.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-01-12 15:08:05 +00:00
Samuel Pitoiset
f0997e2aa8 nvc0: enable GL 4.3 on gm107+
Although, arb_shader_image_load_store-atomicity will most likely
hang your box, I think it's now quite reasonable to enable GL 4.3
on Maxwell/Pascal GPUs. I suspect that test to be wrong because
it doesn't even work on the NVIDIA blob.

I have tested a bunch of benchmarks (UE4 demos) and real games
like Shadow of Mordor and they all work fine.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-01-12 15:22:21 +01:00
Samuel Pitoiset
38ff9980d7 nvc0: use sched control codes for gm107 MP counters code
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2017-01-12 15:22:15 +01:00
Samuel Pitoiset
75e6992379 nvc0: use sched control codes for gm107 blitter shader
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-01-12 15:22:07 +01:00
Samuel Pitoiset
90537d6a89 nv50/ir: use sched control codes for gm107 builtins
Yes, IMUL/IMAD require dependency barriers and we should
definitely replace these instructions by XMAD but the
different flags need to be figured out. Note that XMAD only
supports 16-bits integers.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2017-01-12 15:22:01 +01:00
Samuel Pitoiset
f519c47f7d nv50/ir: improve instruction pipelining on gm107
This makes use of scheduling control codes which are very useful
for improving the instruction pipelining.

This patch will increase performance on Maxwell GPUs by, at least,
x1.5 up to x3.5 for some benchmarks.

Although this has been fairly well tested, I would not be suprised
if someone hit a corner case somewhere. That way, the scheduler
is enabled by default but it can be deactivated by using
NV50_PROG_SCHED=0.

Thanks to Scott Gray for the reverse engineering work available from
https://github.com/NervanaSystems/maxas/wiki/Control-Codes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Pierre Moreau <pierre.morrow@free.fr>
Tested-by: Alexandre Courbot <acourbot@nvidia.com>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2017-01-12 15:21:54 +01:00
Samuel Pitoiset
1b3b4196f0 nv50/ir: do not insert texture barriers on gm107
It's actually useless to insert those texture barriers post RA
because the current control code (ie. st 0x0) will wait for all
dependencies before issuing a new instruction.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2017-01-12 15:21:47 +01:00
Juan A. Suarez Romero
75968a668e i965/gen7: expose OpenGL 4.2 on Haswell when supported
GL_ARB_vertex_attrib_64bit was the last piece missing.

v2: update docs (Jordan)

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-01-12 12:56:57 +01:00
Samuel Iglesias Gonsálvez
77077986eb i965: enable ARB_shader_precision to HSW+
v2: update docs (Jordan)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-01-12 12:56:57 +01:00
Samuel Iglesias Gonsálvez
1d1ddbaa56 i965: unify the code to enable of ARB_gpu_shader_fp64 and ARB_vertex_attrib_64bit for HSW+
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-01-12 12:56:56 +01:00
Alejandro Piñeiro
485955be9c i965: Enable ARB_vertex_attrib_64bit for Haswell
v2: update docs (Jordan)

Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-01-12 12:56:56 +01:00
Juan A. Suarez Romero
6bb4255f8e i965: check for dual slot attributes on any gen
Those not supporting 64 bit input vertex attributes will have the
dual_slot value as false.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-01-12 12:56:56 +01:00
Juan A. Suarez Romero
f51a5b51ab i965/vec4: emit correctly load_inputs for 64bit data
For dvec3 and dvec4 types, a single GRF do not have enough space to
allocate two inputs from two different vertices (SIMD4x2).

So the GRF only contains first two components for the two vertices, and
the next GRF has the remaining components.

We want to put all the components for the same vertex in the same
register. Thus, we do a shuffle to reorder the data.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-01-12 12:56:56 +01:00
Alejandro Piñeiro
58fdb85f0f i965/vec4: take into account doubles when creating attribute mapping
Doubles needs more that one slot per attribute. So when filling the
attribute_map we check if it is a double in order to allocate one
extra register.

Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-01-12 12:56:56 +01:00
Alejandro Piñeiro
57bab6708f i965/vec4/nir: vec4 also needs to remap vs attributes
Doubles need extra space, so we would need to do a remapping for vec4
too in order to take that into account. We reuse the already
existing remap_vs_attrs, but passing is_scalar, so they could
remap accordingly.

v2: code-format remap_vs_attrs_params initialization (Matt)

Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-01-12 12:56:56 +01:00
Alejandro Piñeiro
f8310189f4 i965/vec4: use attribute slots for first non payload GRF
As part of the payload setup, setup_attributes is called with the first
GRF that can be used for the attributes (first ones are used for
uniforms for example) and returns the first GRF that is not part of the
payload. Before this patch, it adds directly the number of attributes.
But as with 64-bit attributes can consume more than one slot, that is
not valid anymore. This patch change the addition to use the number of
slots consumed.

gen >= 8 would not be affected, as they use the scalar mode. For that
case, the vs configuration is done at fs_visitor::assign_vs_urb_setup.

v2: add explanation in commit log (Jordan)

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-01-12 12:56:35 +01:00
Alejandro Piñeiro
329cbe363d i965: downsize *64*PASSTHRU formats to equivalent *32*FLOAT formats on gen < 8
gen < 8 doesn't support *64*PASSTHRU formats when emitting
vertices. So in order to provide the equivalent functionality, we need
to downsize the format to equivalent *32*FLOAT, and in some cases
(R64G64B64 and R64G64B64A64) submit two 3DSTATE_VERTEX_ELEMENTS for
each vertex element.

Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-01-12 12:56:12 +01:00
Alejandro Piñeiro
717f99b34a i965: return PASSTHRU surface types also on gen7
Although gen7 doesn't include surface types as a valid conversion format,
we return it, as it reflects what we want to achieve, even if we need
to workaround it on gen < 8.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-01-12 12:56:12 +01:00
Alejandro Piñeiro
f354cd5c69 main/buffers: take into account FRONT_AND_BACK on ReadBuffer
From OpenGL 3.1 spec, section 4.3.1 "Reading Pixels", page 190 (203 PDF)

  "When READ FRAMEBUFFER BINDING is zero, i.e. the default
   framebuffer, src must be one of the values listed in table 4.4,
   including NONE . FRONT_AND_BACK , FRONT , and LEFT refer to the
   front left buffer."

There is an equivalent text on OpenGL 4.5 spec, section 18.2.1
"Selecting Buffers for Reading", page 502 (524 PDF), so the behaviour
is still the same.

Part of the fix for:
GL45-CTS.direct_state_access.framebuffers_draw_read_buffers_errors

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-01-12 08:21:03 -02:00
Alejandro Piñeiro
d54bc7e01f main/buffers: update error handling on DrawBuffers for 4.5
Before 4.5, GL_BACK was not allowed as a value of bufs. Since 4.5 it
is allowed under some circumstances:

From the OpenGL 4.5 specification, Section 17.4.1 "Selecting Buffers
for Writing", page 493 (page 515 of the PDF):
 "An INVALID_ENUM error is generated if any value in bufs is FRONT,
  LEFT, RIGHT, or FRONT_AND_BACK . This restriction applies to both
  the de- fault framebuffer and framebuffer objects, and exists
  because these constants may themselves refer to multiple buffers, as
  shown in table 17.4."

And on page 492 (page 514 of the PDF):
 "If the default framebuffer is affected, then each of the constants
  must be one of the values listed in table 17.6 or the special value
  BACK . When BACK is used, n must be 1 and color values are written
  into the left buffer for single-buffered contexts, or into the back
  left buffer for double-buffered contexts."

This patch keeps the same behaviour if OpenGL version is < 4. We
assume that for 4.x this is the intended behaviour, so a fix, but for
3.x the intended behaviour is the already in place.

Part of the fix for:
GL45-CTS.direct_state_access.framebuffers_draw_read_buffers_errors

v2: remove forgot printf
v3: remove spaces before commas on spec quote, split line too
    long (Anuj)

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-01-12 08:21:03 -02:00
Nicolai Hähnle
e33910b0d9 radeonsi: num_records is in units of stride for swizzled buffers even on VI
The old setting didn't hurt, but this is cleaner.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-12 11:02:56 +01:00
Juan A. Suarez Romero
883ca597df docs: document INTEL_PRECISE_TRIG envvar
v2: use more generic description (Jordan)

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-01-12 09:58:23 +00:00
Iago Toral Quiroga
5bcafc933c spirv: fix typo in warning message
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-01-12 08:18:00 +01:00
Rafael Antognolli
ea7e4b1e51 i965: Enable predicate support on gen >= 8.
Predication needs cmd parser only on gen7. For newer platforms, it
should be available without it.

v2 (Ken): rebase on recent changes.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-11 21:44:59 -08:00
Timothy Arceri
0252ba26c5 util: fix list_is_singular()
Currently its dependant on the user calling and checking the result
of list_empty() before using the result of list_is_singular().

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-12 13:58:32 +11:00
Nanley Chery
5857858aa6 anv/image: Disable HiZ for depth buffer arrays
We currently don't perform clears or resolves on multiple array layers
with HiZ.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-11 17:35:59 -08:00
Nanley Chery
9f1d3a0c97 anv/cmd_buffer: Fix programmed HiZ qpitch
Match the comment above the field by using units of pixels and not HiZ
blocks.

Cc: mesa-stable@lists.freedesktop.org
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-11 17:35:59 -08:00
Nanley Chery
61992e0afe anv/cmd_buffer: Fix arrayed depth/stencil attachments
Enable multiple layers of the depth/stencil buffers to be accessible.

Fixes the crucible test, func.depthstencil.arrayed_clear.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-11 17:35:59 -08:00
Pierre Moreau
4e0d171d7e clover: Check for executables before enqueueing a kernel
Without this check, the kernel::bind() method would fail with a
std::out_of_range exception, letting an exception escape from the
library into the client, rather than returning the corresponding error
code CL_INVALID_PROGRAM_EXECUTABLE.

Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-01-11 16:50:56 -08:00
Kenneth Graunke
c17b2f5724 spirv: Shut up unhandled enumeration value warnings.
We don't want to do anything for the other cases.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-11 15:16:27 -08:00
Timothy Arceri
de8b03f5fb nir: don't turn ieq/ine into inot if used by an if
Otherwise we will end up with an extra instruction to compare the
result of the inot.

On BDW:

total instructions in shared programs: 13060620 -> 13060481 (-0.00%)
instructions in affected programs: 103379 -> 103240 (-0.13%)
helped: 127
HURT: 0

total cycles in shared programs: 256590950 -> 256587408 (-0.00%)
cycles in affected programs: 11324730 -> 11321188 (-0.03%)
helped: 114
HURT: 21

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-12 09:47:29 +11:00
Timothy Arceri
7acc865226 nir: add late opt to turn inot/b2f combos back to bcsel
We turn these from bcsel into inot/b2f combos in order for other
optimisation passes to get further. Once we have finished turn
the ones that remain and are used in more than a single expression
back into a bcsel.

On BDW:

total instructions in shared programs: 13060965 -> 13060297 (-0.01%)
instructions in affected programs: 835701 -> 835033 (-0.08%)
helped: 670
HURT: 2

total cycles in shared programs: 256599536 -> 256598006 (-0.00%)
cycles in affected programs: 114655488 -> 114653958 (-0.00%)
helped: 419
HURT: 240

LOST:   0
GAINED: 1

The 2 HURT is because inserting bcsel creates the only use of
const 1.0 in two shaders from tri-of-friendship-and-madness.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-12 09:47:29 +11:00
Timothy Arceri
8f37fc7066 nir: add imprecise flrp optimisation
On BDW:

total instructions in shared programs: 13061890 -> 13061877 (-0.00%)
instructions in affected programs: 2441 -> 2428 (-0.53%)
helped: 13
HURT: 0

total cycles in shared programs: 256612254 -> 256611784 (-0.00%)
cycles in affected programs: 16418 -> 15948 (-2.86%)
helped: 10
HURT: 2

V2: don't use ffma directly

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-12 09:47:29 +11:00
Kenneth Graunke
b4c44ff08c i965: Use the nir_move_comparisons pass.
While the below stats are encouraging this pass will also become
very usefull for avoiding regression once
brw_do_channel_expressions() and brw_do_vector_splitting() are
disabled.

On Broadwell:

total instructions in shared programs: 13078787 -> 13060898 (-0.14%)
instructions in affected programs: 1809827 -> 1791938 (-0.99%)
helped: 4527
HURT: 157

total cycles in shared programs: 256562762 -> 256590424 (0.01%)
cycles in affected programs: 159749392 -> 159777054 (0.02%)
helped: 5583
HURT: 2289

total spills in shared programs: 14929 -> 14923 (-0.04%)
spills in affected programs: 62 -> 56 (-9.68%)
helped: 1
HURT: 0

total fills in shared programs: 20144 -> 20141 (-0.01%)
fills in affected programs: 253 -> 250 (-1.19%)
helped: 1
HURT: 3

LOST:   0
GAINED: 2

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-12 09:47:29 +11:00
Kenneth Graunke
b5e682a1ef i965: Move nir_lower_locals_to_regs a bit later.
I'm going to add a boolean scheduling pass that I want run late, but
after copy propagation and dead code elimination.  Yet, I don't want
to have to think about registers.  So, move the register conversion
a little later.

No impact on shader-db.  Suggested by Jason Ekstrand.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-12 09:47:29 +11:00
Kenneth Graunke
fd957b1751 nir: Introduce a nir_opt_move_comparisons() pass.
This tries to move comparisons (a common source of boolean values)
closer to their first use.  For GPUs which use condition codes,
this can eliminate a lot of temporary booleans and comparisons
which reload the condition code register based on a boolean.

V2: (Timothy Arceri)
 - fix move comparision for phis so we dont end up with:

    vec1 32 ssa_227 = phi block_34: ssa_1, block_38: ssa_240
    vec1 32 ssa_235 = feq ssa_227, ssa_1
    vec1 32 ssa_230 = phi block_34: ssa_221, block_38: ssa_235

 - add nir_op_i2b/nir_op_f2b to the list of comparisons.

V3: (Timothy Arceri)
 - tidy up suggested by Jason.
 - add inot/fnot to move comparison list

V4: (Jason Ekstrand)
 - clean up move_comparison_source
 - get rid of the tuple
 - rework phi handling

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-12 09:47:29 +11:00
Timothy Arceri
e8328e55e7 nir/algebraic: add support for conditional helper functions to expressions
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-12 09:47:29 +11:00
Jason Ekstrand
a7e399de59 anv/TODO: Check off a bunch of stuff 2017-01-11 10:28:18 -08:00
Jason Ekstrand
c472568b4e nir/search: Only allow matching SSA values
This is more correct and should also be a tiny bit faster since we're
just comparing pointers instead of calling nir_src_equal.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2017-01-11 10:28:18 -08:00
Derek Foreman
534ea2b5ba egl/dri2: add image_loader_extension back into loader extensions for wayland
before commit f871946594
image_loader_extension was always present in dri2_dpy->extensions,
after that commit it is only present for render nodes.

Its removal broke partial render based on buffer age on (at least)
raspberry pi.

Fixes: f871946594 "egl/dri2: rework dri2_egl_display::extensions storage"
Signed-off-by: Derek Foreman <derekf@osg.samsung.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-11 15:58:14 +00:00
Li Qiang
6205c53303 gallium/tgsi: fix overflow in parse property
In parse_identifier, it doesn't stop copying '*pcur'
untill encounter the NULL. As the 'ret' has a
fixed-size buffer, if the '*pcur' has a long string,
there will be a buffer overflow. This patch avoid this.

Signed-off-by: Li Qiang <liq3ea@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2017-01-11 12:40:38 +01:00
Mauro Rossi
2c0d849e2d st/dri: remove trailing whitespace
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-11 10:16:19 +02:00
Mauro Rossi
eca79e84b9 android: st/mesa: fix building error in libmesa_st_mesa
Fixes building error due to dependency on nir generated headers

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-11 10:16:19 +02:00
Dave Airlie
e9d3cbca31 radv: fix multi-viewport emission
This set context req seq was in the wrong place.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-01-11 09:08:51 +01:00
Tapani Pälli
f97f938650 nir: change asserts to unreachable in nir_type_conversion_op
this is to avoid following compilation error on Android:

   error: control may reach end of non-void function [-Werror,-Wreturn-type]

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-01-11 10:08:13 +02:00
Iago Toral Quiroga
a9f497c678 spirv: gl_PrimitiveID in the fragment shader is handled as an input
Geometry and Tessellation stages do handle this as a system value instead.

Fixes:
dEQP-VK.geometry.basic.primitive_id

Reviewed-by: Dave Airlie <ailried@redhat.com>
2017-01-11 08:59:28 +01:00
Rob Clark
99e9dca149 freedreno: add "nogrow" debug param
Sometimes it is useful to disable the "growable" cmdstream buffers for
debugging.  (See 419a154d in libdrm)

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-01-10 19:40:00 -05:00
Rob Clark
a43f3b895c freedreno/a5xx: remove hack for glamor
Now that issues glamor was hitting w/ glsl>=130 (aka missing INSTANCED
bit in vertex attribute state) is fixed, remove hack.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-01-10 19:40:00 -05:00
Rob Clark
3c71853c9a freedreno/a5xx: fixed instanced
Add missing bit, now that we know where it is.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-01-10 19:40:00 -05:00
Rob Clark
b48fde1576 freedreno/a5xx: use the non-_ZERO_BASE for vertexid
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-01-10 19:40:00 -05:00
Rob Clark
730c3047f0 freedreno/a5xx: add texture MIPLVLS
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-01-10 19:40:00 -05:00
Rob Clark
1a5d0818df freedreno/a5xx: fix fragcoord related hangs
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-01-10 19:40:00 -05:00
Rob Clark
ff81c3c9fd freedreno: update generated headers
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-01-10 19:40:00 -05:00
Kenneth Graunke
23a36c2811 anv: Enable tessellation shaders.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-10 13:27:31 -08:00
Kenneth Graunke
ebd88b5aa3 anv: Initialize physical device limits for tessellation
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-10 13:27:31 -08:00
Kenneth Graunke
dcca706b4e anv: Clamp depth buffer dimensions to be at least 1.
When there are no framebuffer attachments, fb->width and fb->height will
be 0.  Subtracting 1 results in 4294967295 which is too large for the
field, causing genxml assertions when trying to create the packet.

In this case, we can just program it to 1.

Caught by dEQP-VK.tessellation.tesscoord.triangles_equal_spacing.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-10 13:27:31 -08:00
Kenneth Graunke
e50d4807a3 anv: Compile TCS/TES shaders.
v2: Merge more TCS/TES info.
v3: Fix caching keys.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-10 13:27:31 -08:00
Kenneth Graunke
de05ecba9f anv: Emit 3DSTATE_HS/TE/DS packets.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-10 13:27:31 -08:00
Kenneth Graunke
08b5713068 anv: Handle patch primitives.
v2: Use anv_pipeline_has_stage rather than tess_info != NULL.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-10 13:27:10 -08:00
Kenneth Graunke
5297267a1c nir: Add a pass to lower TES patch_vertices intrinsics to a constant.
In Vulkan, we always have both the TCS and TES available in the same
pipeline, so we can simply use the TCS OutputVertices execution mode
value as the TES PatchVertices built-in.

For GLSL, we handle this in the linker.  But we could use this pass
in the case when both TCS and TES are linked together, if we wanted.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-10 13:21:53 -08:00
Kenneth Graunke
944e8b08cd spirv: Silence unsupported tessellation capability warnings.
...when the capability bit is set.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-10 13:21:38 -08:00
Kenneth Graunke
1e5b09f42f spirv: Tidy some repeated if checks by using a switch statement.
Iago suggested tidying this.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-10 13:21:31 -08:00
Kenneth Graunke
bb04b84114 spirv: Add tessellation varying and built-in support.
We need to:
- handle the extra array level for per-vertex varyings
- handle the patch qualifier correctly
- assign varying locations

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-10 13:21:28 -08:00
Kenneth Graunke
23710e17f8 spirv: Handle tessellation execution modes.
v2: Use info->tess.
v3: Handle more things in either TCS/TES.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com> [v1]
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-10 13:21:24 -08:00
Kenneth Graunke
5edc338162 compiler: Merge shader_info's tcs and tes structs.
Annoyingly, SPIR-V lets you specify all of these fields in either the
TCS or TES, which means that we need to be able to store all of them
for either shader stage.  Putting them in a union won't work.

Combining both is an easy solution, and given that the TCS struct only
had a single field, it's pretty inexpensive.

This patch renames the combined struct to "tess" to indicate that it's
for tessellation in general, not one of the two stages.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-10 13:21:21 -08:00
Kenneth Graunke
195bf8f027 genxml: Rename 3DSTATE_HS::Enable to "Function Enable".
"Function Enable" is what the other stages use.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-10 13:20:33 -08:00
Lionel Landwerlin
860d91ec5b anv: set input_slots_valid on brw_wm_prog_key
With shaders using a lot of inputs/outputs, like this (from Gtk+) :

layout(location = 0) in vec2 inPos;
layout(location = 1) in float inGradientPos;
layout(location = 2) in flat int inRepeating;
layout(location = 3) in flat int inStopCount;
layout(location = 4) in flat vec4 inClipBounds;
layout(location = 5) in flat vec4 inClipWidths;
layout(location = 6) in flat ColorStop inStops[8];

layout(location = 0) out vec4 outColor;

we're missing the programming of the input_slots_valid field leading
to an assert further down the backend code.

v2: Use valid slots of the geometry or vertex stage (Jason)

v3: Use helper to find correct vue map (Jason)

v4: Set the valid slots off the previous stages (Jason)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-10 18:16:45 +00:00
Lionel Landwerlin
4b44ca7225 anv: add helper to get vue map for fragment shader
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-10 18:14:36 +00:00
Lionel Landwerlin
59fe3796a8 anv: add get_.*_prog_data for tesselation stages
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-10 18:14:33 +00:00
Lionel Landwerlin
6122b4ee96 anv: make get_.*_prog_data take a const pipeline
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-10 18:14:09 +00:00
Vinson Lee
01d80bed1f nir: Fix anonymous union initialization with older GCC.
Fix this build error with GCC 4.4.7.

  CC     nir/nir_opt_copy_prop_vars.lo
nir/nir_opt_copy_prop_vars.c: In function ‘copy_prop_vars_block’:
nir/nir_opt_copy_prop_vars.c:765: error: unknown field ‘deref’ specified in initializer
nir/nir_opt_copy_prop_vars.c:765: warning: missing braces around initializer
nir/nir_opt_copy_prop_vars.c:765: warning: (near initialization for ‘(anonymous).<anonymous>’)
nir/nir_opt_copy_prop_vars.c:765: warning: initialization from incompatible pointer type

Fixes: 62332d139c ("nir: Add a local variable-based copy propagation pass")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 23:25:32 -08:00
Samuel Iglesias Gonsálvez
17eac30e90 docs: add Vulkan Float64 capability support for anv driver
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-10 06:42:44 +01:00
Dave Airlie
ada66480b2 radv/ac: add support for multi sample image coords
This just adds the nir->llvm support, enabling
the extension causes some failures on llvm 3.9 at least,
but this code seems fine.

NIR passes the sampler in src[1].x, and we LLVM/SI requires
it as the last parameters in the coords (coord[2] for 2D,
coord[3] for 2DArray).

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-10 12:59:31 +10:00
Boyan Ding
41b1d9a558 glsl: Do not allow scalar types in vector relational functions
According to OpenGL Shading Language 4.50 spec, Section 8.7 "Vector
Relational Functions", functions of this type do not operate on scalar
types, so remove scalar types from signature definitions to make the
behavior consistent with glslangValidator and other drivers.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
2017-01-09 17:58:33 -08:00
Thomas Hindoe Paaboel Andersen
5b4fa21d53 nir: remove duplicated foreach loop
The foreach loop was called both in the else case and right after. The
indentation seems to indicate that the extra call was from a previous
version with an else section with out curly brackets.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 17:04:47 -08:00
Kenneth Graunke
2bae2fa094 i965: Fix number of slots in SSO mode when there are no user varyings.
We want vue_map->num_slots to be one more than the final slot.

When assigning fixed slots, built-in slots, and non-SSO user varyings,
we do slot++.  This leaves "slot" as one past the most recently assigned
slot.  But for SSO user varyings, we computed slot based on the varying
location value...and left it at that slot value.

To work around this inconsistency, I made num_slots be "slot + 1" if
separate and "slot" otherwise.  The problem is...if there are no user
varyings in SSO mode...then we would have done slot++ when assigning
built-ins, so it would be off by one.  This resulted in loops from 0
to vue_map->num_slots hitting a bonus BRW_VARYING_SLOT_PAD at the end.

This used to break the SIMD8 VS/TES backends, but I fixed that in
commit 480d6c1653.  It's probably safe
at this point, but we should fix it anyway.

To fix this, do slot++ in all cases.  For SSO mode, we overwrite slot
for every varying, so this increment only matters on the last varying.
Because we process varyings in order, this will set slot to 1 more
than the highest assigned slot.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-01-09 16:52:16 -08:00
Kenneth Graunke
203c128781 spirv: Move cursor before calling vtn_ssa_value() in phi 2nd pass.
vtn_ssa_value() can produce variable loads, and the cursor might
be after a return statement, causing nir_builder assert failures
about not inserting instructions after a jump.

This fixes:
dEQP-VK.spirv_assembly.instruction.graphics.barrier.in_if
dEQP-VK.spirv_assembly.instruction.graphics.barrier.in_switch

Cc: "13.0 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 16:52:02 -08:00
Marek Olšák
230b756f86 mesa: set GLSL 1.20 for the fixed-function fragment shader
This fixes broken depth texturing after:

commit 22639a6e19
Author: Timothy Arceri <timothy.arceri@collabora.com>
Date:   Mon Nov 21 00:29:29 2016 +1100

    st/mesa: get Version from gl_program rather than gl_shader_program

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2017-01-10 01:03:32 +01:00
Bas Nieuwenhuizen
8bc39e251b radv: Create single RADV_DEBUG env var.
Also changed RADV_SHOW_QUEUES to a no compute queue option. That
would make more sense later when the compute queue is established,
but the transfer queue still experimental.

v2: Don't include the trace flag.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-01-09 21:44:14 +01:00
Bas Nieuwenhuizen
8cb60c7dd3 ac/debug: Dump indirect buffers.
This is for handling chained command buffers and secondary command
buffers. It doesn't handle the trace id for secondary command buffers
yet, but I don't think that is possible in general with just writes,
as we could call a secondary command buffer multiple times.

I think this is good enough for now, as the most useful case is the
chaining when we grow an IB.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-01-09 21:44:08 +01:00
Bas Nieuwenhuizen
97dfff5410 radv: Dump command buffer on hang.
v2:
  - Now use the filename specified by RADV_TRACE_FILE env var.
  - Use the same var to enable tracing.

I thought we could as well always set the filename explicitly
instead of having some arbitrary defaults, and at that point
we don't need a separate feature enable.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-01-09 21:44:03 +01:00
Bas Nieuwenhuizen
0ef1b4d5b1 ac/debug: Move IB decode to common code.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-01-09 21:43:59 +01:00
Bas Nieuwenhuizen
b838f64237 ac/debug: Move sid_tables.h generation to common code.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-01-09 21:43:54 +01:00
Jason Ekstrand
1c5dcecd51 relnotes: Claim OpenGL 4.5 rather than 4.4
Acked-by: Matt Turner <mattst88@gmail.com>
2017-01-09 10:55:57 -08:00
Jason Ekstrand
5b4aeb331a mesa: Bump the version to 17.0
Acked-by: Matt Turner <mattst88@gmail.com>
2017-01-09 10:55:39 -08:00
Marek Olšák
cac74a9bcc radeonsi: fix the Witcher 2 black transitions
v2: do it properly

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98238

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-09 12:01:30 +01:00
Marek Olšák
5b85a6b3f7 radeonsi: set si_shader_context::input_decls for ranged decls correctly
This has no effect because no code uses those members with ranged decls.

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-09 12:01:30 +01:00
Marek Olšák
6f356d15be radeonsi: cleanly communicate whether si_shader_dump should check R600_DEBUG
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-09 12:01:30 +01:00
Iago Toral Quiroga
030e5f07a5 isl: render target cube maps should be handled as 2D images, not cubes
This fixes layered rendering Vulkan CTS tests with cube (arrays). We
also do this in the GL driver, see this code from gen8_depth_state.c
for example:

case GL_TEXTURE_CUBE_MAP_ARRAY:
case GL_TEXTURE_CUBE_MAP:
   /* The PRM claims that we should use BRW_SURFACE_CUBE for this
    * situation, but experiments show that gl_Layer doesn't work when we do
    * this.  So we use BRW_SURFACE_2D, since for rendering purposes this is
    * equivalent.
    */
   surftype = BRW_SURFACE_2D;
   depth *= 6;
   break;

So I guess we simply forgot to port this workaround to Vulkan.

v2: tweak the conditions so the special case is cube texture sampling
    rather than anything else (Jason)

Fixes:
dEQP-VK.geometry.layered.cube*

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 11:43:07 +01:00
Iago Toral Quiroga
566a0c43f0 anv: don't skip the VUE header if we are reading gl_Layer in a fragment shader
This is the same we do in the GL driver: the hardware provides gl_Layer
in the VUE header, so when the fragment shader reads it we can't skip it.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 11:43:07 +01:00
Samuel Iglesias Gonsálvez
0449c93638 anv: enable shaderFloat64 feature
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 10:44:07 +01:00
Samuel Iglesias Gonsálvez
465204695f anv: enable float64 feature on supported platforms
v2:
- Remove image_ms_array initialization (Jason)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 10:44:07 +01:00
Samuel Iglesias Gonsálvez
88c8121ec9 spirv: enable SpvCapabilityFloat64 only to supported platforms
v2 (Jason):
- Use nir_spirv_supported_extensions to check if the feature is enabled.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 10:44:07 +01:00
Juan A. Suarez Romero
c2acf97fcc nir/i965: use two slots from inputs_read for dvec3/dvec4 vertex input attributes
So far, input_reads was a bitmap tracking which vertex input locations
were being used.

In OpenGL, an attribute bigger than a vec4 (like a dvec3 or dvec4)
consumes just one location, any other small attribute. So we mark the
proper bit in inputs_read, and also the same bit in double_inputs_read
if the attribute is a dvec3/dvec4.

But in Vulkan, this is slightly different: a dvec3/dvec4 attribute
consumes two locations, not just one. And hence two bits would be marked
in inputs_read for the same vertex input attribute.

To avoid handling two different situations in NIR, we just choose the
latest one: in OpenGL, when creating NIR from GLSL/IR, any dvec3/dvec4
vertex input attribute is marked with two bits in the inputs_read bitmap
(and also in the double_inputs_read), and following attributes are
adjusted accordingly.

As example, if in our GLSL/IR shader we have three attributes:

layout(location = 0) vec3  attr0;
layout(location = 1) dvec4 attr1;
layout(location = 2) dvec3 attr2;

then in our NIR shader we put attr0 in location 0, attr1 in locations 1
and 2, and attr2 in location 3 and 4.

Checking carefully, basically we are using slots rather than locations
in NIR.

When emitting the vertices, we do a inverse map to know the
corresponding location for each slot.

v2 (Jason):
- use two slots from inputs_read for dvec3/dvec4 NIR from GLSL/IR.

v3 (Jason):
- Fix commit log error.
- Use ladder ifs and fix braces.
- elements_double is divisible by 2, don't need DIV_ROUND_UP().
- Use if ladder instead of a switch.
- Add comment about hardware restriction in 64bit vertex attributes.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 10:42:22 +01:00
Samuel Iglesias Gonsálvez
3551a2d3ad isl: fix VA64 support for double and dvecN vertex attributes
We use *64*_PASSTHRU formats to upload vertex attributes of 64 bits
to avoid conversions. From the BDW PRM, Volume 2d, page 586
(VERTEX_ELEMENT_STATE):

     "When SourceElementFormat is set to one of the *64*_PASSTHRU
     formats, 64-bit components are stored in the URB without any
     conversion. In this case, vertex elements must be written as 128
     or 256 bits, with VFCOMP_STORE_0 being used to pad the output
     as required. E.g., if R64_PASSTHRU is used to copy a 64-bit Red
     component into the URB, Component 1 must be specified as
     VFCOMP_STORE_0 (with Components 2,3 set to VFCOMP_NOSTORE)
     in order to output a 128-bit vertex element, or Components 1-3 must
     be specified as VFCOMP_STORE_0 in order to output a 256-bit vertex
     element. Likewise, use of R64G64B64_PASSTHRU requires Component 3
     to be specified as VFCOMP_STORE_0 in order to output a 256-bit vertex
     element."

v2,v3 (Jason):
- Don't delete unused formats.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 09:10:13 +01:00
Juan A. Suarez Romero
1c9483f48e anv/pipeline: get map for double input attributes
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 09:10:13 +01:00
Samuel Iglesias Gonsálvez
cc4ff6c2a0 spirv: add support for doubles to OpSpecConstant
v2 (Jason):
- Fix indent in radv change
- Add vtn_u64_literal() helper to take 64 bits (Jason)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 09:10:13 +01:00
Samuel Iglesias Gonsálvez
fc1708948b spirv/nir: add (un)packDouble2x32() translation
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 09:10:13 +01:00
Samuel Iglesias Gonsálvez
c332432bae spirv/nir: implement DF conversions
SPIR-V does not have special opcodes for DF conversions. We need to identify
them by checking the bit size of the operand and the result.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 09:10:13 +01:00
Samuel Iglesias Gonsálvez
27cf6a369f nir: add nir_type_conversion_op()
This function returns the nir_op corresponding to the conversion between
the given nir_alu_type arguments.

This function lacks support for integer-based types with bit_size != 32
and for float16 conversion ops.

v2:
- Improve readiness of the code and delete cases that don't happen now (Jason)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 09:10:13 +01:00
Samuel Iglesias Gonsálvez
3a571fcc43 nir: add nir_get_nir_type_for_glsl_type()
v2 (Jason):
- Refactor nir_get_nir_type_for_glsl_type() to avoid using unneeded helpers (Jason)

v3:
- Use return directly (Jason)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 09:10:13 +01:00
Samuel Iglesias Gonsálvez
59944a77ae spirv: add support for doubles on OpComposite{Insert,Extract}
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 09:10:13 +01:00
Samuel Iglesias Gonsálvez
e6bebb9982 spirv: Enable double floating points when copying variables in _vtn_variable_copy()
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 09:10:13 +01:00
Samuel Iglesias Gonsálvez
9d71cfeff8 spirv: add double support to _vtn_block_load_store()
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 09:10:13 +01:00
Samuel Iglesias Gonsálvez
0cd0c32c06 spirv: add double support to _vtn_variable_load_store
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 09:10:13 +01:00
Samuel Iglesias Gonsálvez
8076c8b59f spirv: add double support to SpvOpCompositeExtract
v2 (Jason):
- Add asserts.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 09:10:13 +01:00
Samuel Iglesias Gonsálvez
a966387883 spirv: fix SpvOpSpecConstantOp with SpvOpVectorShuffle working with double-based vecs
We need to pick two 32-bit values per component to perform the right shuffle operation.

v2 (Jason):
- Add assert to check matching bit sizes (Jason)
- Simplify the code to pick components (Jason)

v3:
- Switch on bit_size once (Jason)
- Add comment to explain the constant value for unused components (Erik)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 09:10:13 +01:00
Samuel Iglesias Gonsálvez
ec686ff62c spirv: add DF support to SpvOp*ConstantComposite
v2 (Jason):
- Add assert.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 09:10:13 +01:00
Samuel Iglesias Gonsálvez
2bf4d0ba7a spirv: add DF support to vtn_const_ssa_value()
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 09:10:13 +01:00
Samuel Iglesias Gonsálvez
d77ffc3d87 spirv: add support for loading DF constants
v2 (Jason):
- Add assert.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 09:10:13 +01:00
Samuel Iglesias Gonsálvez
9602c7c02f spirv: add definition of double based data types
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 09:10:13 +01:00
Samuel Iglesias Gonsálvez
d1bbe2c94e spirv: fix typo in spec_constant_decoration_cb()
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 09:10:13 +01:00
Dave Airlie
41969f0d06 radv: drop unused fields in physical device.
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-01-09 16:48:14 +10:00
Tapani Pälli
8b43f42011 i965: call intel_prepare_render always when reading pixels
Currently we do this only in the fallback code (when tiled memcpy
version failed) but it needs to be done always so that we have
correct read and write buffer in place. No regressions seen in CI.

Fixes:
	dEQP-EGL.functional.buffer_age.*

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98330
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-01-09 07:44:53 +02:00
Timothy Arceri
953e4e4417 st/mesa: pass gl_program to st_bind_ubos()
We no longer need anything from gl_linked_shader.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-09 15:27:35 +11:00
Timothy Arceri
270e584a86 st/mesa: pass gl_program to st_bind_images()
We no longer need anything from gl_linked_shader.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-09 15:27:35 +11:00
Timothy Arceri
59ac77b410 st/mesa: stop passing gl_linked_shader to set_affected_state_flags()
We now get everything we need from the gl_program param.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-09 15:27:35 +11:00
Timothy Arceri
ae632afe4f st/mesa/glsl: set num_images directly in shader_info
This change also removes the now duplicate NumImages field.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-09 15:27:35 +11:00
Timothy Arceri
4b30011d34 st/mesa: pass gl_program to st_bind_ssbos()
We no longer need to pass gl_shader_program.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-09 15:27:34 +11:00
Timothy Arceri
1130f82a88 nir: add another comparison simplification
On BDW:

total instructions in shared programs: 13061877 -> 13060965 (-0.01%)
instructions in affected programs: 133569 -> 132657 (-0.68%)
helped: 566
HURT: 0

total cycles in shared programs: 256611784 -> 256599536 (-0.00%)
cycles in affected programs: 861016 -> 848768 (-1.42%)
helped: 379
HURT: 73

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 12:32:16 +11:00
Kenneth Graunke
3371de38f2 nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences.
On BDW:

total instructions in shared programs: 13074882 -> 13068703 (-0.05%)
instructions in affected programs: 1823116 -> 1816937 (-0.34%)
helped: 4187
HURT: 537

total cycles in shared programs: 256622718 -> 256425382 (-0.08%)
cycles in affected programs: 123790120 -> 123592784 (-0.16%)
helped: 3823
HURT: 2037

total spills in shared programs: 15276 -> 14929 (-2.27%)
spills in affected programs: 9446 -> 9099 (-3.67%)
helped: 352
HURT: 1

total fills in shared programs: 20496 -> 20144 (-1.72%)
fills in affected programs: 13040 -> 12688 (-2.70%)
helped: 352
HURT: 1

LOST:   2
GAINED: 21

v2: Rely on 'a' being a well-formed boolean (Connor, Eric).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 12:32:16 +11:00
Kenneth Graunke
1c50d31c26 nir: Convert ineg(b2i(a)) to a if it's a boolean.
On BDW:

total instructions in shared programs: 13071119 -> 13070371 (-0.01%)
instructions in affected programs: 83424 -> 82676 (-0.90%)
helped: 505
HURT: 45 (all TCS, all hurt by a single instruction)

total cycles in shared programs: 256601322 -> 256588932 (-0.00%)
cycles in affected programs: 819410 -> 807020 (-1.51%)
helped: 450
HURT: 57

total loops in shared programs: 2950 -> 2942 (-0.27%)
loops in affected programs: 8 -> 0
helped: 7
HURT: 0

v2: Drop unnecessary 'a@bool' annotation (Connor, Eric).
    Add a comment explaining the rule (Ian).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-09 12:32:16 +11:00
Kenneth Graunke
86b9be777f i965: Move TES input VUE map calculation out a layer.
In Vulkan, we'll compile the TCS and TES at the same time, so I can just
pass the TCS output VUE map to brw_compile_tes as the TES input VUE map.

So, we only need to do this in GL.  Move it to the GL-specific layer.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-07 22:24:10 -08:00
Kenneth Graunke
6e8ac0641f i965: Pass NULL for gl_program when compiling TES.
This isn't needed, and Vulkan doesn't have one.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-07 22:24:10 -08:00
Kenneth Graunke
08f8f1bcd5 i965: Move TES spacing/domain/topology setup to brw_compile_tes().
Moving this down a layer lets us share code between Vulkan and GL.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-07 22:24:10 -08:00
Kenneth Graunke
cc2df4bb81 i965: Access TES shader info via NIR.
NIR exists in both GL and Vulkan, but gl_program is GL specific.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-07 22:24:10 -08:00
Kenneth Graunke
a4fd84ef5f mesa: Introduce a compiler enum for tessellation spacing.
It feels weird using GL_* enums in a Vulkan driver.

v2: Fix the TESS_SPACING -> PIPE_TESS_SPACING conversion.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-07 22:22:28 -08:00
Kenneth Graunke
9bb89175e6 compiler: Change shader_info->tes.vertex_order into a ccw boolean.
The vertex order is either clockwise or counterclockwise.  We can just
store a "ccw" boolean rather than GLenum values.  I don't want to use
GLenums in a Vulkan driver, and even in GL a simple boolean works fine.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-07 20:42:32 -08:00
Jason Ekstrand
faa1edeeb7 anv/pipeline: Call NIR passes using NIR_PASS_V
This lets us get validation without having to do it manually.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-07 15:45:09 -08:00
Jason Ekstrand
43e0b0d4b2 anv/pipeline: Only call remove_dead_variables once
It can handle multiple modes at a time now so there's no reason to call
it repeatedly.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-07 15:45:09 -08:00
Kenneth Graunke
957ec00243 Revert recent GLSL slot counting fiasco.
I apparently broke mark_whole_variable in ir_set_program_inouts.
It was passing a type that wasn't var->type, so the wrapper didn't
work out.  It's all broken, revert it and start over.

Fixes all kinds of things on other drivers.

Revert "glsl: Make is_fixed_function_array actually check for varyings."

This reverts commit 42699e1271.

Revert "glsl: Mark whole variable used for ClipDistance and TessLevel*."

This reverts commit 5c580e64cc.

Revert "glsl: Override the # of varying slots for ClipDistance and TessLevel*."

This reverts commit 8b5749f65a.

Revert "glsl: Create and use a new ir_variable::count_attribute_slots() wrapper."

This reverts commit 6aa5cb34d0.
2017-01-07 15:15:08 -08:00
Kenneth Graunke
42699e1271 glsl: Make is_fixed_function_array actually check for varyings.
We can't check VARYING_SLOT_* locations until we've determined that
the variable is actually a varying.

Fixes assert failures in drivers which actually use this path,
such as radeonsi and i915.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99314
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-07 13:05:37 -08:00
Kai Wasserbäch
5a165b4086 drirc: Allow extension midshader for Divinity: Original Sin (EE)
See also <https://bugs.freedesktop.org/show_bug.cgi?id=93551#c27> where
this was first observed as a requirement.

Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-01-07 15:47:35 +01:00
Timothy Arceri
1edc53a66b glsl: fix opt_minmax redundancy checks against baserange
Marking operations as redundant if they are equal to the base
range is fine when the tree structure is something like this:

        max
      /     \
     max     b
    /   \
   3    max
       /   \
      3     a

But the opt falls apart with a tree like this:

        max
     /       \
    max     max
   /   \   /   \
  3    a   b    3

The problem is that both branches are treated the same: descending in
the left branch will prune the constant, and then descending the right
branch will prune the constant there as well, because limits[0] wasn't
updated to take the change on the left branch into account, and so we
still get [3,\infty) as baserange.

In order to fix the bug we just disable the marking of redundant expressions
when they match the baserange.

NIR algebraic opt will clean up the first tree for anyway, hopefully
other backends are smart enough to do this also.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-07 21:46:36 +11:00
Jason Ekstrand
45912fb908 i965/compiler: Use the new nir_opt_copy_prop_vars pass
We run this after nir_lower_vars_to_ssa so that as many load/store_var
intrinsics as possible before copy_prop_vars executes.  This is because the
pass isn't particularly efficient (it does a lot of linear walks of a
linked list) so we'd like as much of the work as possible to be done before
copy_prop_vars runs.

Shader DB results on Sky Lake:

   total instructions in shared programs: 12020290 -> 12013627 (-0.06%)
   instructions in affected programs: 26033 -> 19370 (-25.59%)
   helped: 16
   HURT: 13

   total cycles in shared programs: 137772848 -> 137549012 (-0.16%)
   cycles in affected programs: 6955660 -> 6731824 (-3.22%)
   helped: 217
   HURT: 237

   total loops in shared programs: 3208 -> 3208 (0.00%)
   loops in affected programs: 0 -> 0
   helped: 0
   HURT: 0

   total spills in shared programs: 4112 -> 4057 (-1.34%)
   spills in affected programs: 483 -> 428 (-11.39%)
   helped: 2
   HURT: 0

   total fills in shared programs: 5519 -> 5102 (-7.56%)
   fills in affected programs: 993 -> 576 (-41.99%)
   helped: 2
   HURT: 0

   LOST:   0
   GAINED: 0

Broadwell had similar results.  On older hardware, the impact isn't as
large because they don't advertise GL 4.5.  Of the hurt programs, all but
one are hurt by a single instruction and the one is hurt by 3 instructions.
All of the helped programs, on the other hand, are helped by at least 3
instructions and one kerbal space program shader is helped by 44.59%.

The real star of the show, however, is the Gl43CSDof synmark2 benchmark
which has two shaders which are cut by 28% and 40% and the over-all runtime
performance of the benchmark on my Sky Lake laptop is improved by around
25-30% (it's a bit hard to be exact due to thermal throttling).

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-06 16:44:29 -08:00
Jason Ekstrand
62332d139c nir: Add a local variable-based copy propagation pass
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-06 16:44:28 -08:00
Jason Ekstrand
830dca74fe nir/builder: Add a helper for getting the most recently added instruction
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-06 16:44:28 -08:00
Jason Ekstrand
75a6707984 nir/builder: Add a load_deref_var helper
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-06 16:44:28 -08:00
Jason Ekstrand
13a2f20740 nir/dead_variables: Remove shader-local variables that are only written
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-06 16:44:28 -08:00
Jason Ekstrand
58fe5c57cd nir/dead_variables: Removed shared variables when requested
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-06 16:44:28 -08:00
Jason Ekstrand
2d7bed6158 anv/formats: Use the real format for B4G4R4A4_UNORM_PACK16 on gen8
Because border color is handled pre-swizzle, when we move the alpha
channel around in the format, the OPAQUE_BLACK border colors don't work
correctly on B4G4R4A4_UNORM_PACK16 with the hack.  This fixes the
following Vulkan CTS tests on Broadwell:

dEQP-VK.pipeline.sampler.view_type.2d_array.format.b4g4r4a4_unorm_pack16.address_modes.all_mode_clamp_to_border_opaque_black
dEQP-VK.pipeline.sampler.view_type.1d_array.format.b4g4r4a4_unorm_pack16.address_modes.all_mode_clamp_to_border_opaque_black
dEQP-VK.pipeline.sampler.view_type.2d.format.b4g4r4a4_unorm_pack16.address_modes.all_mode_clamp_to_border_opaque_black
dEQP-VK.pipeline.sampler.view_type.1d.format.b4g4r4a4_unorm_pack16.address_modes.all_mode_clamp_to_border_opaque_black
dEQP-VK.pipeline.sampler.view_type.3d.format.b4g4r4a4_unorm_pack16.address_modes.all_mode_clamp_to_border_opaque_black

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2017-01-06 16:44:15 -08:00
Jason Ekstrand
4e7958fb13 isl: Mark A4B4G4R4_UNORM as supported on gen8
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0" <mesa-dev@lists.freedesktop.org>
2017-01-06 16:44:15 -08:00
Pierre-Loup A. Griffais
f6d3af2af6 radv: fix depth transitions with layerCount = VK_REMAINING_ARRAY_LAYERS
Interpreting layerCount literally would try to create billions of image
views in radv_process_depth_image_inplace().

Signed-off-by: Pierre-Loup A. Griffais <pgriffais@valvesoftware.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-01-07 01:26:08 +01:00
Kenneth Graunke
e6ae19944d i965: Rework gl_TessLevel*[] handling to use NIR compact arrays.
Treating everything as scalar arrays allows us to drop a bunch of
special case input/output munging all throughout the backend.
Instead, we just need to remap the TessLevel components to the
appropriate patch URB header locations in remap_patch_urb_offsets().

We also switch to treating the TES input versions of these as ordinary
shader inputs rather than system values, as remap_patch_urb_offsets()
just makes everything work out without special handling.

This regresses one Piglit test:
arb_tessellation_shader-large-uniforms/GL_TESS_CONTROL_SHADER-array-at-limit

The compiler starts promoting the constant arrays assigned to gl_TessLevel*
to uniform arrays.  Since the shader also has a uniform array that uses
the maximum number of uniform components, this puts it over the uniform
component limit enforced by the linker.  This is arguably a bug in the
constant array promotion code (it should avoid pushing us over limits),
but is unlikely to penalize any real application.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-06 15:55:48 -08:00
Kenneth Graunke
31d9de58ab i965: Inline store_output helper in quads workaround code.
It's only used in one place, it ignores the offset parameter currently,
and I want to add more parameters...at which point, passing in a bunch
of integers seems less obvious than writing it out.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-06 15:55:47 -08:00
Kenneth Graunke
311b1f0a98 nir: Make glsl_to_nir compact scalar TessLevel arrays.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-06 15:55:46 -08:00
Kenneth Graunke
496693d466 i965: Make unify_interfaces not spread VARYING_BIT_TESS_LEVEL_*.
This is harmless today because gl_TessLevelInner/Outer in the TES is
currently treated as system values.  However, when we move to treating
them as inputs, this would cause a bug: with no TCS present, it would
propagate TES reads of VARYING_SLOT_TESS_LEVEL into the VS output VUE
map slots.  This is totally bogus - those don't even exist in the VS.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-06 15:55:42 -08:00
Kenneth Graunke
a46bd79ee1 glsl: Support gl_TessLevelInner/Outer[] as TES input variables.
Upcoming reworks in i965 are going to make it easy to handle this
like any other input.  Having it as a system value will just require
additional code for no benefit.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-06 15:55:41 -08:00
Kenneth Graunke
5c580e64cc glsl: Mark whole variable used for ClipDistance and TessLevel*.
There's no point in trying to mark partial array access for
gl_ClipDistance, gl_TessLevelOuter, or gl_TessLevelInner - they're
special built-in variables that control fixed function hardware,
and will likely be used in an all-or-nothing fashion.

Since these arrays only occupy 1-2 varying slots, we have to avoid
our normal processing which increments the slot value by the array
index.

(I wrote this code before i965 switched from ir_set_program_inouts
to nir_shader_gather_info.  It's not used by anyone today, and I'm
not sure how valuable it is...the alternative to GLSL IR lowering
is NIR compact arrays, at which point you should use nir_gather_info.)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-06 15:55:39 -08:00
Kenneth Graunke
8b5749f65a glsl: Override the # of varying slots for ClipDistance and TessLevel*.
Right now, this shouldn't have any effect, as all drivers use
LowerClipDist and LowerTessFactors to turn the float[] arrays into
vectors.

However, it should help make it possible for drivers to avoid that
lowering.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-06 15:55:37 -08:00
Kenneth Graunke
6aa5cb34d0 glsl: Create and use a new ir_variable::count_attribute_slots() wrapper.
This wraps glsl_type::count_attribute_slots(), but will soon contain a
couple of overrides for a couple of GLSL built-ins variables.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-06 15:55:25 -08:00
Marek Olšák
aead6a1e94 gallium/radeon: use the internal clear_buffer callback to fix r600g
r600g doesn't set pipe_context::clear_buffer.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99303

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2017-01-06 23:32:25 +01:00
Roland Scheidegger
f4821daed1 llvmpipe: do transpose/untwiddle after conversion for 8bit formats
Generally we should do tranpose after conversion, if the format has less than
32 bits per channel (if it has 32 bits, conversion is going to be a no-op
anyway...). This is obviously because there's less vectors to deal with.
Though the advantage for 16 bit formats isn't that big, and in fact with AVX
there isn't really any (as the 32bit unpacks can be done with 256bit, but
the smaller ones cannot, although that would change again with proper AVX2
support).
Only makes sense for 2d and not 1d cases. And to keep things easy, only handle
1,2 and 4 channels (rgbx is just fine).
For rgba unorm8 format the backend conversion sums up to these instruction
totals (not counting the movs for SSE2 due to 2-op syntax - generally every 2
unpacks need an additional mov).
                     SSE2                    AVX
transpose:           32 unpack               16 unpack
untwiddle:           0                       8 (128bit low/high permutes)
convert:             16 mul + 16 cvt         8 mul + 8 cvt
32->8bit:            12 pack                 8 (128bit extract) + 12 pack

When doing transpose/untwiddle afterwards we get:
convert:             16 mul + 16 cvt         8 mul + 8 cvt
32->8bit:            12 pack                 8 (128bit extract) + 12 pack
transpose/untwiddle  12 unpack               12 unpack

So for SSE2, this drops 20 unpacks (total instruction count 76->56)
whereas for AVX it replaces the 16 256bit unpacks with 8 128bit ones
and drops the 8 lo/hi permutes (in total 60->48). (Albeit to be fair,
the permutes could be dropped even when doing the transpose first,
they are extremely pointless but we'd need to be able to tell
lp_build_conv to reorder the vectors, for AVX2 we're going to need to
be able to tell lp_build_conv about ordering in any case.)

(With different ordering going into conversion, it would be possible
to do 4 unpacks + 4 pshufbs instead of 12 unpacks, but that might not
be better, and not all cpus can do it. Proper AVX2 support should eliminate
the 8 128bit extracts, reduce these 12 packs to 6 and the 12 unpacks to 2
pshufb + 2 permq ideally (+ 2 final 128bit extracts).)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-01-06 23:13:34 +01:00
Roland Scheidegger
6e7ce1ef55 gallivm: generalize 4x4f->1x16ub special case conversion
This special packing path can be easily extended to handle not just
float->unorm8 but also float->snorm8 and uint32->uint8 and int32->int8
(i.e. all interesting cases for llvmpipe fs backend code).
The packing parts all stay the same (only the last step packing will
be signed->signed instead of signed->unsigned but luckily even sse2 can do
both).
While here also note some bugs with that (we keep the bugs identical to
what we did before on x86, albeit other archs may differ). In particular
float->unorm8 too large values will still get clamped to 0, not 255, and for
float->snorm8 NaNs will end up as -1, not 0 (but we do the clamp against 1.0
there to prevent too large values ending up as -1.0 - this is inconsistent
to unorm8 handling but is what we ended up before, I'm not sure we can get
away without it). This is quite fishy in any case as we depend on
arch-dependent behavior of the iround (my understanding is in fact with
altivec the conversion would actually saturate although I've no idea about
NaNs, so probably wouldn't need to do anything for snorm).
(There are only minimal piglit tests for unorm clamping behavior AFAICT, in
particular nothing seems to test values which are too large to be handled by
the float->int conversion.)
For uint32->uint8 we also do a min against MAX_INT, since the source for
the packs is always signed (again, on x86 - should probably be able to
express these arch-dependent bits better some day).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-01-06 23:13:34 +01:00
Roland Scheidegger
04480a04b1 llvmpipe: use alpha from already converted color if possible
For rgbx formats, there is no point in doing alpha conversion again (and
with different tranpose even, so llvm can't eliminate it).
Albeit it looks like there's some minimal changes needed in the blend code
(found by code inspection, no test seemed to complain) if we do this -
the blend factors are already sanitized if we have no destination alpha,
however for src_alpha_saturate it looks like it still might make a
difference (note that we forced has_alpha to true before for some formats
and nothing complained, but this seems safer).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-01-06 23:13:34 +01:00
Roland Scheidegger
53c2d24a24 llvmpipe: use scalar load instead of vectors for small vectors in fs backend
llvm has _huge_ problems trying to load things like <4 x i8> vectors and
stitching such loads together to form 128bit vectors. My understanding
of the problem is that the type legalizer tries to extend that to
really a <4 x i32> vector and not a <16 x i8> vector with the 4 elements
first then followed by padding, so the shuffles for then combining things
together are more or less impossible - you can in fact see the pmovzxd
llvm generates. Pre-4.0 llvm just gives up on it completely and does a 30+
pextrb/pinsrb sequence instead.
It looks like current llvm has fixed this behavior (my guess would be
due to better shuffle combination and load/shuffle folds), but we can
avoid this by just loading as <1 x i32> values, combine that and only
cast at the end. (I suspect it might also work if we'd pad the loaded
vectors immediately before shuffling them together, instead of directly
stitching 2 such vectors together pairwise before combining the pair.
But this _might_ lose the ability to load the values directly into
their right place in the vector with pinsrd.). But using 32bit values
is probably easier for llvm as it will never give it funny ideas how
the vector should look like.
(This is possibly only a problem for 1x8bit formats, since 2x8bit will
end up fetching 64bit hence only two vectors are stitched together,
not 4, but we use the same strategy anyway.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-01-06 23:13:34 +01:00
Ian Romanick
1472ff3591 i965: Enable several GLES 3.1 extensions on HSW+
The only reason we didn't previously enable this was the dependency on
OpenGL ES 3.1.  These should have been enabled as soon as HSW got
stencil texturing.  We also needed to fixup setting MaxViewports.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-06 12:42:43 -08:00
Ian Romanick
90c51ccf82 i965: Always set MaxViewports and related limits
Since 9d6ca7c3, there should be no performance hit for having
MaxViewports > 1.  Always set this context state.  This eliminates the
need to update this conditional as we add support for OES_viewport_array
on older GPUs.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-06 12:42:43 -08:00
Marek Olšák
b7699ce07c winsys/amdgpu: fix a race condition between fence updates and IB submissions
The CS thread is needed to ensure proper ordering of operations and can't
be disabled (without complicating the code).

Discovered by Nine CSMT, which ended up in a deadlock.

Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Marek Olšák
ece6e1f658 radeonsi: add TC L2 prefetch for shaders and VBO descriptors
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Marek Olšák
a131dacb14 radeonsi: add CP DMA flags for greater control over synchronization
for L2 prefetch

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Marek Olšák
8ac1715d02 radeonsi: cleanly communicate which CP DMA packet is first
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Marek Olšák
2b621c47aa gallium/radeon: add new HUD query num-SDMA-IBs
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Marek Olšák
6b8a371e00 gallium/radeon: rename the num-ctx-flushes query to num-GFX-IBs
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Marek Olšák
5871ebd7f1 radeonsi: add HUD queries for cache flush stats
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Marek Olšák
aac07bb79c radeonsi: don't count fast clears and prefetches into CP DMA stats
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Marek Olšák
3b98a5dc47 radeonsi: don't wait for compute shaders in texture_barrier
it doesn't interact with compute shaders in any way

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Marek Olšák
4b93ba542c radeonsi: assume that a TES without POSITION precedes GS
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Marek Olšák
53648050a5 radeonsi: unduplicate VS color export code
it's exactly the same as the other ones

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Marek Olšák
42920c0fb9 radeonsi: clean up more HAVE_LLVM #ifdefs
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Marek Olšák
a8374c3d22 gallium/radeon: clean up HAVE_LLVM #ifdefs in r600_get_llvm_processor_name
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-06 21:05:48 +01:00
Kenneth Graunke
2138347a45 i965: Properly flush in hsw_pause_transform_feedback().
Fixes a number of transform feedback tests when run with Linux 4.8,
which allows us to use the MI_LOAD_REGISTER_REG command, at which point
we started using this new broken path.

ES3-CTS.functional.transform_feedback.array_element.interleaved.lines.*
and Piglit's arb_transform_feedback2/draw-auto are both fixed by this
patch, for example.

Thanks to Chris Wilson for catching this mistake!

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99030
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-01-06 12:01:53 -08:00
Kenneth Graunke
4295af646f i965: Fix texturing in the vec4 TCS and GS backends.
We were failing to zero m0.2 of the sampler message header for TCS and
GS messages in the simple case.  fs_generator has done this for about
a year now, but we missed it in vec4_generator.

Fixes ES31-CTS.core.texture_cube_map_array.sampling,
GL45-CTS.texture_cube_map_array.sampling, and many
dEQP-GLES31.functional.shaders.opaque_type_indexing.sampler subtests:
- dynamically_uniform.tessellation_control.isampler3d
- dynamically_uniform.tessellation_control.isamplercube
- dynamically_uniform.tessellation_control.sampler2d
- dynamically_uniform.tessellation_control.usamplercube
- dynamically_uniform.tessellation_control.sampler2darray
- dynamically_uniform.tessellation_control.isampler2darray
- dynamically_uniform.tessellation_control.usampler3d
- dynamically_uniform.tessellation_control.usampler2darray
- dynamically_uniform.tessellation_control.usampler2d
- dynamically_uniform.tessellation_control.sampler3d
- dynamically_uniform.tessellation_control.samplercube
- dynamically_uniform.tessellation_control.isampler2d
- uniform.tessellation_control.isampler3d
- uniform.tessellation_control.isamplercube
- uniform.tessellation_control.usampler2d
- uniform.tessellation_control.usampler3d
- uniform.tessellation_control.sampler2darray
- uniform.tessellation_control.isampler2darray
- uniform.tessellation_control.usampler2darray
- uniform.tessellation_control.sampler2d
- uniform.tessellation_control.usamplercube
- uniform.tessellation_control.sampler3d
- uniform.tessellation_control.samplercube
- uniform.tessellation_control.isampler2d

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-06 11:49:53 -08:00
Tim Rowley
c93efb0a4f swr: [rasterizer core] rename OutputMerger functions
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-01-06 10:05:08 -06:00
Tim Rowley
fa7c5e242f swr: [rasterizer core] fix SIMD16 Transpose_16_16
Fix incorrect swizzling in SIMD16 Transpose_16_16 breaking the
two-channel 16-bpc formats like R16G16_FLOAT.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-01-06 10:05:02 -06:00
Tim Rowley
e62b6d2f0f swr: [rasterizer core] fix SIMD16 output merger
Honor the colorHottileEnable mask when accessing colorBuffer pointers.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-01-06 10:04:56 -06:00
Tim Rowley
1a77e0c48d swr: [rasterizer core] fix SIMD16 PackTraits pack() and unpack()
Fix routines for 8-bit and 16-bit formats used by optimized tile store.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-01-06 10:04:50 -06:00
Tim Rowley
bd22c3d411 swr: [rasterizer core] fix SIMD16 transpose functions
Fixed Transpose_16 methods of following formats:

Transpose8_8_8_8
Transpose8_8
Transpose32_32
Transpose16_16_16_16
Transpose16_16_16
Transpose16_16

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-01-06 10:04:41 -06:00
Tim Rowley
e6eede81af swr: [rasterizer core] whitespace adjustments
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-01-06 10:04:28 -06:00
Kenneth Graunke
a4d6f4d954 i965: Don't set EmitNoMainReturn.
A while ago, we stopped using Luca's GLSL IR lower_jumps pass in favor
of nir_lower_returns().  Marek's commit d3cb79e043
put it in do_common_optimization, which resulted in us calling it again.

Dropping the EmitNoMainReturn setting makes us skip that pass again.

Apparently that pass doesn't work properly, because this fixes Piglit's
tests/spec/glsl-1.10/execution/vs-nested-return-sibling-loop.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99287
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-05 23:15:39 -08:00
Eric Anholt
69da8c32c7 vc4: Rewrite T image handling based on calling the LT handler.
The T images are composed of effectively swizzled-around blocks of LT (4x4
utile) images, so we can reduce the t_utile_address() calls by 16x by
calling into the simpler LT loop.

This also adds support for calling down with non-utile-aligned
coordinates, which will be part of lifting the utile alignment requirement
on our callers and avoiding the RMW on non-utile-aligned stores.

Improves 1024x1024 TexSubImage by 2.55014% +/- 1.18584% (n=46)
Improves 1024x1024 GetTexImage by 2.242% +/- 0.880954% (n=32)
2017-01-05 17:19:54 -08:00
Eric Anholt
3a3a0d2d6c vc4: Move the utile_width/height functions to header inlines.
I want these inlined in the callers, particularly with the tiling
changes coming up, but we're not building with lto so some caller
would suffer.
2017-01-05 17:19:54 -08:00
Eric Anholt
6cf9ff8a6c vc4: Make the load/store utile functions static.
They don't have any other callers outside of this file, and I'm hoping
they get inlined soon.
2017-01-05 17:19:54 -08:00
Eric Anholt
e64b1169d3 vc4: Simplify the load/store utile functions.
They now have less of a dependency on the cpp, and don't have to do a
divide.

Hacking up mesa-demos teximage to do only one subtest and not draw
points, I saw 1024x1024 glTexSubImage2D() improve by 4.86939% +/-
1.40408% (n=30) and glGetTexImage() by 2.18978% +/- 0.140268% (n=5).
2017-01-05 17:19:48 -08:00
Eric Anholt
7b8c67b3cc vc4: Reuse a list function to simplify bufmgr code. 2017-01-05 16:23:32 -08:00
Eric Anholt
ebf33e577a vc4: Flush the job early if we're referencing too many BOs.
If we get up toward 256MB (or whatever the CMA area size is),
VC4_GEM_CREATE will start throwing errors.  Even if we don't trigger
that, when we flush the kernel's BO allocation for the CLs or bin
memory may end up throwing an error, at which point our job won't get
rendered at all.

Just flush early (half of maximum CMA size) so that hopefully we never
get to that point.
2017-01-05 16:23:32 -08:00
Timothy Arceri
076ab157ff st/mesa/glsl: move SamplerTargets to gl_program
This will help allow us to simplify the handling of samplers by
storing them in a single location rather than duplicating them in
both gl_linked_shader and gl_program.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-06 11:21:42 +11:00
Timothy Arceri
937523971f st/mesa/glsl: set SamplersUsed directly in gl_program
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-06 11:21:42 +11:00
Timothy Arceri
53a509723f mesa/glsl: set sampler units directly in gl_program
Now that we create gl_program earlier there is no need to mess about
copying things to gl_linked_shader then to gl_program.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-06 11:21:42 +11:00
Timothy Arceri
7cc61cf706 mesa: simplify sampler setting code
There is no need to loop over active samplers the code above this
would have already exited if the sampler was inactive, or errored
if the count was larger than the uniforms array size.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-06 11:21:42 +11:00
Timothy Arceri
4807a83da0 mesa/glsl: set num_textures per stage directly in shader_info
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-06 11:21:42 +11:00
Timothy Arceri
c46a630000 mesa: make _CurrentFragmentProgram a gl_program struct pointer
Making this point to a gl_program struct rather than a gl_shader_program
struct will allow use to later also make the CurrentProgram array hold
gl_program structs which in turn will allow for code simpilifcation.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-06 11:21:42 +11:00
Timothy Arceri
6e3f6097c9 i965: stop passing gl_shader_program to the precompile and codegen functions
We no longer need it.

While we are at it we mark the vs, gs, and wm codegen functions as static.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-06 11:21:42 +11:00
Timothy Arceri
5ceedefd6c mesa/glsl: remove hack to reset sampler units to zero
Now that we have the is_arb_asm flag we can just skip the
initialisation.

V2: remove hack from standalone compiler where it was never
needed since it only compiles glsl shaders.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-06 11:21:42 +11:00
Timothy Arceri
238486884e i965: make use of new is_arb_asm flag
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-06 11:21:42 +11:00
Timothy Arceri
f584f38214 st/mesa/glsl: add new is_arb_asm flag in gl_program
Set the flag via the _mesa_init_gl_program() and NewProgram()
helpers.

In i965 we currently check for the existance of gl_shader_program
to decide if this is an ARB assembly style program or not.

Adding a flag makes the code clearer and will help removes a
dependency on gl_shader_program in the i965 codegen functions.

Also this will allow use to skip initialising sampler units for
linked shaders, we currently memset it to zero again during linking.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-06 11:21:42 +11:00
Timothy Arceri
2784128398 i965: pass gl_program directly to brw_compile_tes()
This is the only thing we use from gl_shader_program so pass it directly.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-06 11:21:41 +11:00
Timothy Arceri
2a4d169735 i965: stop passing gl_shader_program to brw_nir_setup_glsl_uniforms()
We can now just get the data needed from the gl_shader_program_data
pointer in gl_program.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-06 11:21:41 +11:00
Timothy Arceri
d3b2ee6b49 i965: pass gl_program to brw_upload_ubo_surfaces()
There is no need to pass gl_linked_shader anymore.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-06 11:21:41 +11:00
Timothy Arceri
9ca14f583c i965: stop passing gl_shader_program to brw_assign_common_binding_table_offsets()
We now get everything we need directly from gl_program so there is
no need for this.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-06 11:21:41 +11:00
Timothy Arceri
f5bc127b2f st/mesa/glsl/i965: move ShaderStorageBlocks to gl_program
Having it here rather than in gl_linked_shader allows us to simplify
the code.

Also it is error prone to depend on the gl_linked_shader for programs
in current use because a failed linking attempt will free infomation
about the current program. In i965 we could be trying to recompile
a shader variant but may have lost some required fields.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-06 11:21:41 +11:00
Timothy Arceri
f62eb6c7eb st/mesa/glsl/i965: set num_ssbos directly in shader_info
Here we also remove the duplicate field in gl_linked_shader and always
get the value from shader_info instead.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-06 11:21:41 +11:00
Timothy Arceri
0e7eec1ab5 st/mesa/glsl/i965: move per stage UniformBlocks to gl_program
This will help allow us to store pointers to gl_program structs in the
CurrentProgram array resulting in a bunch of code simplifications.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-06 11:21:41 +11:00
Timothy Arceri
b792c38979 st/mesa/glsl/i965: set num_ubos directly in shader_info
This also removes the duplicate field in gl_linked_shader, and
gets num_ubos from shader_info instead.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-06 11:21:41 +11:00
Timothy Arceri
a1da57c19c st/mesa/glsl/i965: move ImageUnits and ImageAccess fields to gl_program
Having it here rather than in gl_linked_shader allows us to simplify
the code.

Also it is error prone to depend on the gl_linked_shader for programs
in current use because a failed linking attempt will free infomation
about the current program. In i965 we could be trying to recompile
a shader variant but may have lost some required fields.

We drop the memset on ImageUnits because gl_program is already
created using rzalloc().

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-06 11:21:40 +11:00
Timothy Arceri
3d2485f011 i965: get InfoLog and LinkStatus via the pointer in gl_program
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-06 11:21:40 +11:00
Timothy Arceri
be9a6a7eb7 i965: get shared_size from shader_info rather than gl_shader_program
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-06 11:21:40 +11:00
Timothy Arceri
234211ec8d i965: stop depending on gl_shader_program for brw_compute_vue_map() params
This removes another dependency on gl_shader_program from the codegen functions,
this will help allow us to use gl_program for the CurrentProgram array rather
than gl_shader_program.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-06 11:21:40 +11:00
Timothy Arceri
6f76ca300b i965: pass gl_program to the brw_*_debug_recompile() functions
Rather then passing gl_shader_program.

The only field use was Name which is the same as the Id field in
gl_program.

For wm and vs we also make the functions static and move them before
the codegen functions.

This change reduces the codegen functions dependency on gl_shader_program.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-06 11:21:40 +11:00
Roland Scheidegger
caf18a8434 gallivm: (trivial) fix typo bug with small AoS format unpacking
Fix typo using wrong (uninitialized) build context introduced by
4634cb5921. (This only affects very rare
small packed formats which have a PIPE_SWIZZLE_0 channel, such as
r4a4, which is never used by mesa/st. Nevertheless it broke lp_test_format.)
2017-01-06 00:46:15 +01:00
Roland Scheidegger
4634cb5921 gallivm: implement aos unpack (to unorm8) for small unorm formats
Using bit replication. This path now resembles something which might make
sense. (The logic was mostly copied from llvmpipe fs backend.)
I am not convinced though it is actually faster than SoA sampling (actually
I'm quite certain it's always a loss with AVX).
With SoA it's just shift/mask/cvt/mul for getting the colors, whereas
there's still roughly 3 shifts, 3 or/and per channel for AoS
(i.e. for SoA it's exactly the same as it would be for a rgba8 format,
whereas the extra effort for AoS is significant). The filtering
might still be faster (albeit with FMA the instruction count gets down
quite a bit there on the SoA float filtering path on new cpus). And those
small unorm formats often don't have an alpha channel (which makes things
worse relatively for AoS path).
(This also fixes a trivial bug in the llvmpipe fs code this was derived
from, albeit it was only relevant for 4-bit channels.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-01-05 23:59:38 +01:00
Roland Scheidegger
bc86e829a5 gallivm: optimize lp_build_unpack_arith_rgba_aos slightly
This code uses a vector shift which has to be emulated on x86 unless
there's AVX2. Luckily in some cases we can actually avoid the shift
altogether, so do that.
Also make sure we hit the fast lp_build_conv() path when applicable,
albeit that's quite the hack...
That said, this path is taken for AoS sampling for small unorm (smaller
than rgba8) formats, and it is completely hopeless even with those
changes, with or without AVX.
(Probably should have some code similar to the one in the llvmpipe fs
backend code, using bit replication to extend to rgba8888 - rounding
is not quite 100% accurate but if it's good enough there it should be
here as well.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-01-05 23:59:38 +01:00
Roland Scheidegger
a03a2ac6fd gallivm: use 2 srcs for 32->16bit conversions in lp_bld_conv_auto
If we only feed one source vector at a time, we cannot use pack intrinsics
(as we only have a 64bit destination dst vector). lp_bld_conv_auto is
specifically designed to alter the length and number of destination vectors,
so this works just fine (if we use single source vectors at a time, afterwards
we immediately reassemble the vectors).
For AVX though this isn't really possible, since we expect 128bit output
already for a single 256bit input. (One day we should handle AVX2 which again
would need multiple inputs, however there's the problem that we get different
ordered output there and we don't want to reorder, so would need to be able
to tell build_conv to handle upper and lower halfs independently.)
A similar strategy would probably work for 32->8bit too (if it doesn't hit
the special case) but I'm going to try something different for that...

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-01-05 23:59:38 +01:00
Roland Scheidegger
db7e786a25 llvmpipe: (trivial) minimally simplify mask construction
simd instruction sets usually have comparisons for equal, not unequal.
So use a different comparison against the mask itself - which also means
we don't need a all-zero as well as a all-one (for the pxor) reg.

Also add code to avoid scalar expansion of i1 values which we definitely
shouldn't do. There's problems with this though with llvm select
interaction, so it's disabled (basically using llvm select instead of
intrinsics may still produce atrocious code, even in cases where we
figured it should not, albeit I think this could probably be fixed
with some better selection of optimization passes, but I have zero
idea there really).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-01-05 23:59:38 +01:00
Lionel Landwerlin
a8eeb089c0 anv: fix multiple creation with internal failure
The specification section 9.4 says :

   When an application attempts to create many pipelines in a single
   command, it is possible that some subset may fail creation. In that
   case, the corresponding entries in the pPipelines output array will
   be filled with VK_NULL_HANDLE values. If any pipeline fails
   creation (for example, due to out of memory errors), the
   vkCreate*Pipelines commands will return an error code. The
   implementation will attempt to create all pipelines, and only
   return VK_NULL_HANDLE values for those that actually failed.

Fixes :

   dEQP-VK.api.object_management.alloc_callback_fail_multiple.graphics_pipeline
   dEQP-VK.api.object_management.alloc_callback_fail_multiple.compute_pipeline

v2: C is hard let's go shopping (Lionel)

v3: Remove unnecessary condition in for loops (Lionel)

v4: Document why we return on first failure (Eduardo)
    Move i declaration inside for() (Eduardo)

v5: Move array cleanup out of loop (Jason)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-05 21:09:09 +00:00
Tim Rowley
33fa4c99f7 swr: [rasterizer core/common/jitter] gl_double support
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99214
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-01-05 14:10:36 -06:00
Fredrik Höglund
b6670157d7 dri3: Fix MakeCurrent without a default framebuffer
In OpenGL 3.0 and later it is legal to make a context current without
a default framebuffer.

This has been broken since DRI3 support was introduced.

Cc: "13.0 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-05 20:52:01 +01:00
Marek Olšák
e16245b339 radeonsi: turn SDMA IBs into de-facto preambles of GFX IBs
Draw calls no longer flush SDMA IBs. r600_need_dma_space is
responsible for synchronizing execution between both IBs.

Initial buffer clears and fast clears will stay unflushed in the SDMA IB
(up to 64 MB) as long as the GFX IB isn't flushed either.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:24 +01:00
Marek Olšák
cba9d59362 radeonsi: implement SDMA-based buffer clearing for SI
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:24 +01:00
Marek Olšák
29d6a367a6 radeonsi: do all math in bytes in SI DMA code
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:24 +01:00
Marek Olšák
9e1aa81dfe gallium/radeon: prevent SDMA stalls by detecting RAW hazards in need_dma_space
Call r600_dma_emit_wait_idle only when there is a possibility of
a read-after-write hazard. Buffers not yet used by the SDMA IB don't
have to wait.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:24 +01:00
Marek Olšák
3be8336440 gallium/radeon: move unrelated code from dma_emit_wait_idle to need_dma_space
r600_dma_emit_wait_idle is going away in its current form.
The only difference is that the moved code is executed before DMA calls
instead of after them.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:24 +01:00
Marek Olšák
973d7cd90a radeonsi: inline cik_sdma_do_copy_buffer
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:23 +01:00
Marek Olšák
067a3237b9 radeonsi: also wait for SDMA in the clear_buffer CPU fallback
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:23 +01:00
Marek Olšák
f6a1c2d883 radeonsi: simplify r600_resource typecasts in si_clear_buffer
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:23 +01:00
Marek Olšák
a31a92e7ef radeonsi: always use SDMA for big buffer clears and first buffer uses
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:23 +01:00
Marek Olšák
69f489dfa1 radeonsi: use SDMA in rvid_buffer_clear on CIK-VI
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:23 +01:00
Marek Olšák
9a3296bf1c radeonsi: use SDMA for initial clearing of DCC/CMASK/HTILE on CIK-VI
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:23 +01:00
Marek Olšák
d4c0ad4de8 radeonsi: implement SDMA-based buffer clearing for CIK-VI
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:23 +01:00
Marek Olšák
431742dbba gallium/hud: increase the vertex buffer size for text
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:30:00 +01:00
Marek Olšák
6d54cd75a8 gallium/hud: add an option to sort items below graphs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:30:00 +01:00
Marek Olšák
80b8b9c8a4 gallium/hud: add an option to reset the color counter
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:30:00 +01:00
Marek Olšák
a57e071e9e gallium/hud: allow more data sources per pane
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:30:00 +01:00
Marek Olšák
e8bb97ce30 gallium/hud: add an option to rename each data source
useful for radeonsi performance counters

v2: allow specifying both : and =

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:30:00 +01:00
Marek Olšák
d995115b17 gallium: remove TGSI_OPCODE_SUB
It's redundant with the source modifier.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:30:00 +01:00
Marek Olšák
a4ace98a97 gallium: remove TGSI_OPCODE_ABS
It's redundant with the source modifier.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:30:00 +01:00
Axel Davy
09d09b219e st/nine: Remove all usage of ureg_SUB in nine_shader
This is required to drop gallium SUB.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-01-05 18:30:00 +01:00
Axel Davy
67cda68bba st/nine: Remove all usage of ureg_SUB in nine_ff
This is required to remove gallium SUB.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-01-05 18:30:00 +01:00
Axel Davy
caf93f5311 st/nine: Do not map SUB and ABS to their gallium equivalent.
This is required for gallium SUB and ABS to be removed.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-01-05 18:30:00 +01:00
Eric Anholt
dbe0dd11b9 configure: Fix another bashism.
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-05 09:24:28 -08:00
Marek Olšák
3477f67057 st/mesa: fix a segfault when prog->sh.data is NULL
Broken by:
   st/mesa: get Version from gl_program rather than gl_shader_program

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-01-05 17:11:03 +01:00
Emil Velikov
37f9262064 docs: add news item and link release notes for 13.0.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-05 16:07:53 +00:00
Emil Velikov
934792b846 docs: add sha256 checksums for 13.0.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit c8ece92ded)
2017-01-05 16:07:53 +00:00
Emil Velikov
5cd9660302 docs: add release notes for 13.0.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit bec04114d2)
2017-01-05 16:07:53 +00:00
Nayan Deshmukh
ee4b4791ab st/va: fix incorrect argument in vl_compositor_cleanup
This fixes the mistake introduced in commit
b6737a8bcd

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-01-05 16:40:06 +01:00
Tim Rowley
68ddcc6c28 swr: remove unneeded llvm version check
Old test caused breakage with llvm-svn (4.0.0svn), and not needed as
the minimum required llvm version for swr is 3.6.

Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
2017-01-05 07:31:19 -06:00
George Kyriazis
36ad826548 swr: fix windows build break
wrap lp_bld_type.h around extern "C".
Windows decorates global variables, so when used from .cpp files, need
to use an undecorated version.

Also, removed related and unneeded code from swr_screen.cpp

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-01-05 07:30:18 -06:00
Marek Olšák
3753dc896d radeonsi: update clip_regs if clip_disable changes to fix a hang
This seems to fix the GPU hangs caused by:

commit ed3190b3f3
Author: Marek Olšák <marek.olsak@amd.com>
Date:   Sun Nov 13 18:41:43 2016 +0100

    radeonsi: don't export ClipVertex and ClipDistance[] if clipping is disabled

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99219

Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-01-05 14:01:18 +01:00
Marek Olšák
c7affbf687 st/mesa: enable GLSLOptimizeConservatively for drivers that want it
GLSL compilation now takes 24% less time with the Gallium noop driver.
I used my shader-db for the measurement. The difference for the whole
radeonsi driver can be ~10%.

The generated TGSI is mostly the same. For example, the compilation success
rate with a TGSI->GCN bytecode converter without any optimizations is
the same. Note that glsl_to_tgsi does its own copy propagation and simple
register allocation.

shader-db GCN report:
- Talos spills fewer SGPRs.
- DOTA 2 spills more SGPRs.
- The average shader-db score is better, but it's just due to randomness.

29045 shaders in 17564 tests
Totals:
SGPRS: 1325929 -> 1325017 (-0.07 %)
VGPRS: 1010808 -> 1010172 (-0.06 %)
Spilled SGPRs: 1432 -> 1399 (-2.30 %)
Spilled VGPRs: 93 -> 92 (-1.08 %)
Private memory VGPRs: 688 -> 688 (0.00 %)
Scratch size: 2540 -> 2484 (-2.20 %) dwords per thread
Code Size: 39336732 -> 39342936 (0.02 %) bytes
Max Waves: 217937 -> 217969 (0.01 %)

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-05 13:07:12 +01:00
Marek Olšák
96fe8834f5 glsl_to_tgsi: do fewer optimizations with GLSLOptimizeConservatively
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-05 13:07:12 +01:00
Marek Olšák
0a5018c1a4 mesa: add gl_constants::GLSLOptimizeConservatively
to reduce the amount of GLSL optimizations for drivers that can do better.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-05 13:07:12 +01:00
Marek Olšák
e51baeb6c1 gallium: add PIPE_CAP_GLSL_OPTIMIZE_CONSERVATIVELY
Drivers with good compilers don't need aggressive optimizations before TGSI.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-05 13:07:12 +01:00
Marek Olšák
d3cb79e043 glsl: run do_lower_jumps properly in do_common_optimizations
so that backends don't have to run it manually

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-05 13:07:12 +01:00
Kenneth Graunke
7c6b714cd0 i965: Print VS output VUE map in Vulkan too.
We need to move this to the shared layer.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-05 01:55:27 -08:00
Kenneth Graunke
480d6c1653 i965: Fix last slot calculations
If the VUE map has slots at the end which the shader does not write,
then we'd "flush" (constructing an URB write) on the last output it
actually wrote.  Then, we'd construct another SEND with EOT, but with
no actual payload data.  That's not legal.

For example, SSO programs have clip distance slots allocated no matter
what, but the shader may not write them.  If it doesn't write any user
defined varyings, then the clip distance slots will be the last ones.

Found while debugging
dEQP-VK.tessellation.shader_input_output.gl_position_vs_to_tcs_to_tes

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-05 01:54:52 -08:00
Iago Toral Quiroga
8dc92a5613 docs: Mark GL_ARB_gpu_shader_fp64 and OpenGL 4.0 as done for i965/hsw+
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-05 09:34:36 +01:00
Iago Toral Quiroga
580c503ca2 docs: add GL_ARB_gpu_shader_fp64 and OpenGL 4.0 support for Intel Haswell.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2017-01-05 09:34:14 +01:00
Iago Toral Quiroga
a98f2e53e1 i965: add a kernel_features bitfield to intel screen
We can use this to track various features that may or may not be supported
by the hw / kernel. Currently, we usually do this by checking the generation
and supported command parser versions in various places thoughtout the driver
code. With this patch, we centralize all these checks in just once place at
screen creation time, then we just query the bitfield wherever we need to
check if a particular feature is supported.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-05 08:43:46 +01:00
Iago Toral Quiroga
e3123c8ca2 i965/gen7: Enable OpenGL 4.0 in Haswell when supported
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-05 08:43:46 +01:00
Iago Toral Quiroga
1f1b8def48 i965: get rid of brw->can_do_pipelined_register_writes
Instead, check the screen field directly.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-05 08:43:46 +01:00
Chris Wilson
02a44484f0 i965: Move the pipelined test for SO register access to the screen
Moving the test to the screen places it alongside the other global HW
feature tests that want to be shared between contexts.

Also, we need to know if we support pipelined register writes at
screen creation time so that we can tell if we can expose OpenGL 4.0
in gen7.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-05 08:43:46 +01:00
Samuel Iglesias Gonsálvez
ab1ec7de93 i965/disasm: remove printing hstride and width in align16 DF source regions
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-05 07:29:23 +01:00
Samuel Iglesias Gonsálvez
301fdfd838 vec4: use DIM instruction when loading DF immediates in HSW
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-05 07:29:13 +01:00
Carl Worth
3fbdac28d5 glcpp: Remove illegal characters from tests
Some of the existing tests were using '@' and '"' incidentally within the test
body. Neither of these characters are actually legal for GLSL. And since we
are planning to start generating errors for illegal characters, we need to
first make the test suite clean.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-04 14:40:48 -08:00
Carl Worth
5363518705 glcpp: Exhaustively test all legal characters in GLSL
Here, each legal character (as defined by GLSL Language Specification version
4.30.6, section 3.1) appears at least once in the input file. Obviously,
characters with special meaning (like '#' and '\') aren't treated exhaustively
with respect to all their possible uses. We have many other tests for that.

Here, we're simply ensuring that the test suite sees every legal character at
least once.

v2 (by Ken): Fix expectations, move to src/compiler, renumber tests.

   Carl's .expected:            Updated .expected:

   ..                           ..

   . .                          . .
   . .                          . .
   . .                          . .
   . .                          . .
   .                            ..
   .                            .
   .                            .
   .

(For some reason, the original test expected ".." to produce two lines.
glcpp, cpp, and mcpp all follow my updated behavior, so I believe it to
be correct.)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-04 14:40:48 -08:00
Carl Worth
16b480547f glcpp: Allow vertical tab and form feed characters in GLSL
Of course, these aren't really useful for anything, but the GLSL language
specification does allow them:

	The source character set used for the OpenGL shading languages,
	outside of comments, is a subset of UTF-8. It includes the following
	characters:
	...

	White space: the space character, horizontal tab, vertical tab, form
	feed, carriage-return, and line- feed.

	[GLSL Language Specification 4.30.6, section 3.1]

So treat vertical tab ('\v' or ^K) and form-feed ('\f' or ^L) as horizontal
space characters.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-04 14:40:48 -08:00
Carl Worth
6c8762400d glcpp: Add testing for no space between macro name and replacement list
GCC's preprocessor accepts a macro definition where there is no space between
the macro's identifier name and the replacementlist. (GCC does emit a "missing
space" warning that we don't, but that's fine.)

This is an exhaustive test that verifies that all legal GLSL characters that
could possibly be interpreted as separating the macro name from the
replacement list are interpreted as such. So the testing here includes all
valid GLSL symbols except for:

  * Characters that can be part of an identifier (a-z, A-Z, 0-9, _)

  * Backslash, (allowed only as line continuation)

  * Hash, (allowed only to introduce pre-processor directive, or as part of a
           paste operator in a replacement list---but not as first token of
           replacement list)

  * Space characters (since the point of the testing is to have missing space)

  * Left parenthesis (which would indicate a function-like macro)

v2 (Ken): Move to src/compiler, renumber tests.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-04 14:40:48 -08:00
Lionel Landwerlin
36b5f1d200 spirv: compute push constant access offset & range
v2: Move relative push constant relative offset computation down to
    _vtn_load_store_tail() (Jason)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-04 21:14:17 +00:00
Lionel Landwerlin
0089085038 spirv: move block_size() definition
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-04 21:08:44 +00:00
Marek Olšák
89975e29d3 va: call texture_get_handle while the mutex is being held
The context may be used by texture_get_handle.

Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: 13.0 <mesa-stable@lists.freedesktop.org>
2017-01-04 17:27:41 +01:00
Marek Olšák
dbba4e03b1 vdpau: call texture_get_handle while the mutex is being held
The context may be used by texture_get_handle.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99158

Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: 13.0 <mesa-stable@lists.freedesktop.org>
2017-01-04 17:27:41 +01:00
Samuel Pitoiset
7d48a84b16 radeonsi: capitalize VM hex addr when dumping buffer list
Useful when debugging with R600_DEBUG=vm,check_vm to match
addr in both outputs.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-04 10:14:22 +01:00
Tapani Pälli
0f991e8434 i965: remove unused brwInitVtbl declaration
function was removed by b3360d23ac

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-04 09:46:34 +02:00
Iago Toral Quiroga
1a8f2629e6 i965: remove brw_context dependency from intel_batchbuffer_init()
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-04 08:14:30 +01:00
Iago Toral Quiroga
ba30e0ca20 i965: make intel_batchbuffer_free() take a batchbuffer as argument
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-04 08:14:26 +01:00
Iago Toral Quiroga
1daa31d8a8 i965: make intel_batchbuffer_emit_dword() take a batchbuffer as argument
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-04 08:14:21 +01:00
Iago Toral Quiroga
f03bac1fc7 i965: Make intel_bachbuffer_reloc() take a batchbuffer argument
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-04 08:14:04 +01:00
Timothy Arceri
4b7dfd8812 nir: fix loop iteration count calculation for floats
Fixes performance regression in SynMark PSPom caused by loops with float
counters not always unrolling.

For example:

   for (float i = 0.02; i < 0.9; i += 0.11)
      ...

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-04 14:48:36 +11:00
Edmondo Tommasina
abcaba497d gallium/hud: add a path separator between dump directory and filename
It's more user friendly and it avoids to write files in unexpected
places.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-01-03 22:08:41 +01:00
Heiko Przybyl
e933246013 r600/sb: Fix loop optimization related hangs on eg
Make sure unused ops and their references are removed, prior to entering
the GCM (global code motion) pass, to stop GCM from breaking the loop
logic and thus hanging the GPU.

Turns out, that sb has problems with loops and node optimizations
regarding associative folding:

- the global code motion (gcm) pass moves ops up a loop level/basic block
until they've fulfilled their total usage count
- if there are ops folded into others, the usage count won't be
fulfilled and thus the op moved way up to the top
- within GCM the op would be visited and their deps would be moved
alongside it, to fulfill the src constaints
- in a loop, an unused op is moved out of the loop and GCM would move
the src value ops up as well
- now here arises the problem: if the loop counter is one of the src
values it would get moved up as well, the loop break condition would
never get hit and the shader turn into an endless loop, resulting in the
GPU hanging and being reset

A reduced (albeit nonsense) piglit example would be:

[require]
GLSL >= 1.20

[fragment shader]

uniform int SIZE;
uniform vec4 lights[512];

void main()
{
    float x = 0;
    for(int i = 0; i < SIZE; i++)
        x += lights[2*i+1].x;
}

[test]
uniform int SIZE 1
draw rect -1 -1 2 2

Which gets optimized to:

===== SHADER #12 OPT ================================== PS/BARTS/EVERGREEN =====
===== 42 dw ===== 1 gprs ===== 2 stack =========================================
ALU 3 @24
     1      y: MOV                R0.y,  0
            t: MULLO_UINT         R0.w,  [0x00000002 2.8026e-45].x, R0.z

LOOP_START_DX10 @22
PUSH @6
ALU 1 @30 KC0[CB0:0-15]
     2 M    x: PRED_SETGE_INT     __.x,  R0.z, KC0[0].x
JUMP @14 POP:1
LOOP_BREAK @20
POP @14 POP:1
ALU 2 @32
     3      x: ADD_INT            R0.x,  R0.w, [0x00000002 2.8026e-45].x

TEX 1 @36
               VFETCH             R0.x___, R0.x,   RID:0   MFC:16 UCF:0 FMT[..]
ALU 1 @40
     4      y: ADD                R0.y,  R0.y, R0.x
LOOP_END @4
EXPORT_DONE        PIXEL 0     R0.____  EOP
===== SHADER_END ===============================================================

Notice R0.z being the loop counter/break condition relevant register
and being never incremented at all. Also some of the loop content
has been moved out of it, to fulfill the requirements for the one unused
op.

With a debug build of mesa this would produce an error like
error at : PRED_SETGE_INT     __, __, EM.2,    R1.x.2||FP@R0.z, C0.x
  : operand value R1.x.2||FP@R0.z was not previously written to its gpr
and the compilation would fail due to this. On a release build it gets
passed to the GPU.

When using this patch, the loop remains intact:

===== SHADER #12 OPT ================================== PS/BARTS/EVERGREEN =====
===== 48 dw ===== 1 gprs ===== 2 stack =========================================
ALU 2 @24
     1      y: MOV                R0.y,  0
            z: MOV                R0.z,  0
LOOP_START_DX10 @22
PUSH @6
ALU 1 @28 KC0[CB0:0-15]
     2 M    x: PRED_SETGE_INT     __.x,  R0.z, KC0[0].x
JUMP @14 POP:1
LOOP_BREAK @20
POP @14 POP:1
ALU 4 @30
     3      t: MULLO_UINT         T0.x,  [0x00000002 2.8026e-45].x, R0.z

     4      x: ADD_INT            R0.x,  T0.x, [0x00000002 2.8026e-45].x

TEX 1 @40
               VFETCH             R0.x___, R0.x,   RID:0   MFC:16 UCF:0 FMT[..]
ALU 2 @44
     5      y: ADD                R0.y,  R0.y, R0.x
            z: ADD_INT            R0.z,  R0.z, 1
LOOP_END @4
EXPORT_DONE        PIXEL 0     R0.____  EOP
===== SHADER_END ===============================================================

Piglit: ./piglit summary console -d results/*_gpu_noglx
        name: unpatched_gpu_noglx patched_gpu_noglx
        ----  ------------------- -----------------
        pass:               18016             18021
        fail:                 748               743
        crash:                  7                 7
        skip:                1124              1124
        timeout:                0                 0
        warn:                  13                13
        incomplete:             0                 0
        dmesg-warn:             0                 0
        dmesg-fail:             0                 0
        changes:                0                 5
        fixes:                  0                 5
        regressions:            0                 0
        total:              19908             19908

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94900
Tested-by: Heiko Przybyl <lil_tux@web.de>
Tested-on: Barts PRO HD6850
Signed-off-by: Heiko Przybyl <lil_tux@web.de>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-01-03 21:58:52 +01:00
Eric Anholt
dd12119706 editorconfig: Fix up the tab rendering width.
Our editorconfig file looked sensible, saying that we wanted to indent
with spaces and use 3/4/whatever space indentation.  However, the spec has
this little surprise:

    "tab_width: a whole number defining the number of columns used to
     represent a tab character. This defaults to the value of indent_size
     and doesn't usually need to be specified."

so once my editor started respecting editorconfig, the files that have
tabs left in them started getting rendered wrong, showing up like this in
brw_program.c:

   case GL_COMPUTE_PROGRAM_NV: {
      struct brw_program *prog = rzalloc(NULL, struct brw_program);
      if (prog) {
    prog->id = get_new_program_id(brw->screen);

    return _mesa_init_gl_program(&prog->program, target, id);
      }
      else
    return NULL;
   }

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-01-03 10:38:53 -08:00
Chad Versace
c4b87f129e meta: Disable dithering during glGenerateMipmap
Fixes tests 'dEQP-GLES3.functional.texture.mipmap.*.generate.rgba5551*' on
Intel Broadwell 0x1616.

The GL 4.5 spec describes the algorithm of glGenerateMipmap as:

    The contents of the derived images are computed by repeated, filtered
    reduction of the level base image.  [...] No particular filter algorithm is
    required, though a box filter is recommended as the default filter.

Consider a texture for which all pixels are identical at level 0.
From the spec's description above, one may reasonably assume that the "filtered
reduction" of level 0 produces a new miplevel for which again all pixels are
identical. For any 2x2 subspan of identical pixels, it is difficult to see how
the "filtered reduction" of that subspan can produce a pixel that differs from
the source pixels.

Dithering during _mesa_meta_GenerateMipmap() violated that reasonable
assumption.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99210
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
2017-01-03 08:22:23 -08:00
Romain Failliot
8d8ed437a5 doc/features.txt: update for freedreno
I lost track of who created initial patch (Ilia?).. Romain rebased it.
I pushed it.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95460
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-01-03 10:46:13 -05:00
Robert Bragg
96c9ec9c27 i965: Remove perf monitor/query backend
In its current state the unified i965 backend for
AMD_performance_monitor and INTEL_performance_query isn't able to report
meaningful Observation Architecture metrics since we haven't so far had
the necessary kernel support to fully configure the OA unit, nor the
corresponding support for normalizing the counters into a form that can
be usefully interpreted by application developers (as opposed to raw
values that may, for example, scale by the number of EUs there are).

So that we can focus on implementing just one of these extensions fully
and since we anticipate some significant backend changes as we look to
use a new kernel interface to configure the OA unit, this patch removes
the current backend. This will simplify our ability to update the
frontend infrastructure and backend interface before updating our
support for performance counters.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-03 07:27:07 -08:00
Christian König
ac57bcda1e vl/zscan: fix "Fix trivial sign compare warnings"
The variable actually needs to be signed, otherwise converting it to a
float doesn't work as expected.

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=98914
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Fixes: 1fb4179f92 ("vl: Fix trivial sign compare warnings")
2017-01-03 12:18:14 +01:00
Nayan Deshmukh
b6737a8bcd st/va: error handling
handle the cases when vl_compositor_set_csc_matrix(),
vl_compositor_init_state() and vl_compositor_init() fail

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-01-03 12:02:15 +01:00
Nayan Deshmukh
29aad4e8bd st/vdpau: error handling
handle the cases when vl_compositor_set_csc_matrix(),
vl_compositor_init_state() and vl_compositor_init() fail

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-01-03 12:02:15 +01:00
Nayan Deshmukh
cee5af93ee vl/compositor: implement error handling
pipe_buffer_map and pipe_buffer_create may return NULL

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-01-03 12:02:15 +01:00
Iago Toral Quiroga
1a83e9892d i965/vec4: enable ARB_gpu_shader_fp64 for Haswell
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
6c350e34ee i965/vec4: adjust spilling costs for 64-bit registers.
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
3cd38b6898 i965/vec4: prevent spilling of DOUBLE_TO_SINGLE destination
FROM_DOUBLE opcodes are setup so that they use a dst register
with a size of 2 even if they only produce a single-precison
result (this is so that the opcode can use the larger register to
produce a 64-bit aligned intermediary result as required by the
hardware during the conversion process). This creates a problem for
spilling though, because when we attempt to emit a spill for the
dst we see a 32-bit destination and emit a scratch write that
allocates a single spill register, making the intermediary writes
go beyond the size of the allocation.

Prevent this by avoiding to spill the destination register of these
opcodes.

Alternatively, we can avoid this by splitting the opcode in two: one
that produces a 64-bit aligned result and one that takes the 64-bit
aligned result as input and produces a 32-bit result from it.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
8843c43f7e i965/vec4: avoid spilling of registers that mix 32-bit and 64-bit access
When 64-bit registers are (un)spilled, we need to execute data shuffling
code before writing to or after reading from memory. If we have instructions
that operate on 64-bit data via 32-bit instructions, (un)spills for the
register produced by 32-bit instructions will not do data shuffling at all
(because we only see a normal 32-bit istruction seemingly operating on
32-bit data). This means that subsequent reads with that register using DF
access will unshuffle data read from memory that was never adequately
shuffled when it was written.

Fixing this would require to identify which 32-bit instructions write
64-bit data and emit spill instructions only when the full 64-bit
data has been written (by multiple 32-bit instructions writing to different
offsets of the same register) and always emit 64-bit unspills whenever
64-bit data is read, even when the instruction uses a 32-bit type to read
from them.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
82c69426a5 i965/vec4: support basic spilling of 64-bit registers
The current spilling code can't spill vgrf allocations larger than 1
but SIMD4x2 doubles require 2 vgrfs, so we need to permit this case (which
is handled properly for DF data types by emitting 2 scratch messages and
doing data shuffling). We accomplish this by not auto-disabling spilling
for vgrf allocations with a size of 2, and then disable spilling on any
register with an offset != 0B (which indicates array access).

Disable spilling of partial DF reads/writes because these don't read/write
data for both logical threads and our scratch messages for 64-bit data
need data for both threads to be present.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
c762809e49 i965/vec4: run scalarize_df() after spilling
Spilling of 64-bit data requires data shuffling for the corresponding
scratch read/write messages. This produces unsupported swizzle regions
and writemasks that we need to scalarize.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
73610384a8 i965/vec4: prevent src/dst hazards during 64-bit register allocation
8-wide compressed DF operations are executed as two separate 4-wide
DF operations. In that scenario, we have to be careful when we allocate
register space for their operands to prevent the case where the first
half of the instruction overwrites the source of the second half.

To do this we mark compressed instructions as having hazards to make
sure that ther register allocators assigns a register regions for the
destination that does not overlap with the region assigned for any
of its source operands.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
2b57adad00 i965/vec4/scalarize_df: support more swizzles via vstride=0
By exploiting gen7's hardware decompression bug with vstride=0 we gain the
capacity to support additional swizzle combinations.

This also fixes ZW writes from X/Y channels like in:

mov r2.z:df r0.xxxx:df

Because DF regions use 2-wide rows with a vstride of 2, the region generated
for the source would be r0<2,2,1>.xyxy:DF, which is equivalent to r0.xxzz, so
we end up writing r0.z in r2.z instead of r0.x. Using a vertical stride of 0
in these cases we get to replicate the XX swizzle and write what we want.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
c3edacaa28 i965/vec4/scalarize_df: do not scalarize swizzles that we can support natively
Certain swizzles like XYZW can be supported by translating only the first two
64-bit swizzle channels to 32-bit channels. This happens with swizzles such
that the first two logical components, when translated to 32-bit channels and
replicated across the second dvec2 row, select the same channels specified by
the 3rd and 4th logical swizzle components.

Notice that this opens up the possibility that some instructions are not
scalarized and can end up with XY or ZW 32-bit writemasks. Make sure we always
scalarize in such cases.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
2f0bc54e2b i965/vec4: split instructions that read 64-bit interleaved attributes
Stages that use interleaved attributes generate regions with a vstride=0
that can hit the gen7 hardware decompression bug.

v2:
- Make static the function and fix indent (Matt)

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
0579c85e5c i965/vec4: dump subnr for FIXED_GRF
This came in handy when debugging the payload setup for Tess Eval,
since it prints correct subnr for attributes that can be loaded
in the second half of a register.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
8e92b40203 i965/vec4/tes: consider register offsets during attribute setup
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
49d4d0268b i965/vec4/tes: fix setup_payload() for 64bit data types
Use a width of 2 with 64-bit attributes.

Also, if we have a dvec3/4 attribute that gets split across two registers
such that components XY are stored in the second half of a register and
components ZW are stored in the first half of the next, we need to fix
regioning for any instruction that reads components Z/W of the attribute.
Notice this also means that we can't support sources that read cross-dvec2
swizzles (like XZ for example).

v2: don't assert that we have a single channel swizzle in the case that we
    have to fix up Z/W access on the first half of the next register. We
    can handle any swizzle that does not cross dvec2 boundaries, which
    the double scalarization pass should have prevented anyway.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
183cd8ab94 i965/vec4/tes: fix input loading for 64bit data types
v2: use byte_offset() instead of offset()

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
3e294ab893 i965/vec4/tcs: fix outputs for 64-bit data
v2: use byte_offset() instead of offset()

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
639e92ea3c i965/vec4/tcs: fix input loading for 64-bit data
v2: use byte_offset() instead of offset()

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Samuel Iglesias Gonsálvez
74fd0c590b i965/vec4/gs: fix input loading for 64bit data
v2 (Iago):
   - Adapt 64-bit path to component packing changes.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
b76f2206f5 i965/vec4: fix store output for 64-bit types
We need to shuffle the data before it is written to the URB. Also,
dvec3/4 need two vec4 slots.

v2: use byte_offset() instead of offset().

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
5fe8d567d8 i965/vec4: fix attribute setup for doubles
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
6a01259d8a i965/vec4: fix indentation in lower_attributes_to_hw_regs()
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
ae400e38d9 i965/vec4: make emit_pull_constant_load support 64-bit loads
This way callers don't need to know about 64-bit particularities and
we reuse some code.

v2:
  - use byte_offset() instead of offset()
  - only mark the surface as used once

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
df6e3aa6ae i965/vec4: fix move_push_constants_to_pull_constants() for 64-bit data
v2: adapt to changes in offset()

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
eee2c0d785 i965/vec4: fix indentation in move_push_constants_to_pull_constants()
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
10694be522 i965/vec4: fix move_uniform_array_access_to_pull_constant() for 64-bit data
v2: adapt to changes in offset()

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
52fb22b646 i965/vec4: fix scratch writes for 64bit data
Mostly the same stuff as usual: we ned to shuffle the data before we
write and we need to emit two 32-bit write messages (with appropriate
32-bit writemask channels set) for a full dvec4 scratch write.

v2: use byte_offset() instead of offset().

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
dcc36f8b29 i965/vec4: fix scratch reads for 64bit data
v2: Setup for a 64-bit scratch read by checking the type size of the
    correct register

v3: Use byte_offset() instead of offset()

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
e4d9ab609f i965/vec4: fix scratch offset for 64bit data
A vec4 is 16 bytes and a dvec4 is 32 bytes so for doubles we have
to multiply the reladdr by 2. The reg_offset part is in units of 16
bytes and is used to select the low/high 16-byte chunk of a full
dvec4, so we don't want to multiply that part of the address.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
07bc6a35d3 i965/vec4: do not split scratch read/write opcodes
64-bit scratch read/writes require to shuffle data around so we need
to have access to the full 64-bit data. We will do the right thing
for these when we emit the messages.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
2a857104e4 i965/vec4: Do not use DepCtrl with 64-bit instructions
The BDW PRM says that it is not supported, but it seems that gen7 is also
affected, since doing DepCtrl on double-float instructions leads to
GPU hangs in some cases, which is probably not surprising knowing that
this is not supported in new hardware iterations. The SKL PRMs do not
mention this restriction, so it is probably fine.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
506154f704 i965/vec4: extend the DWORD multiply DepCtrl restriction to all gen8 platforms
v2:
   - Add Broxton as Intel's internal PRMs says that it is needed (Matt).

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Samuel Iglesias Gonsálvez
b9cd3f5b49 i965/vec4: don't copy propagate misaligned registers
This means we would copy propagate partial reads or writes and that can affect
the result.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
93eae0d2a4 i965/vec4: don't propagate single-precision uniforms into 4-wide instructions
Otherwise we end up producing code that violates the register region
restriction that says that when execsize == width and hstride != 0
the vstride can't be 0.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
6637312847 i965/vec4: Prevent copy propagation from violating pre-gen8 restrictions
In gen < 8 instructions that write more than one register need to read
more than one register too. Make sure we don't break that restriction
by copy propagating from a uniform.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
70cc6b0a02 i965/vec4: prevent copy-propagation from values with a different type size
Because the meaning of the swizzles and writemasks involved is different,
so replacing the source would lead to different semantics.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Connor Abbott
0fec5e9867 i965/vec4: don't constant propagate 64-bit immediates
v2: Also check if the instruction source target is 64-bit. (Samuel)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
8eea41e75d i965/vec4: Fix SSBO stores for 64-bit data
In this case we need to shuffle the 64-bit data before we write it
to memory, source from reg_offset + 1 to write components Z and W
and consider that each DF channel is twice as big.

v2: use byte_offset() instead of offset().

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
9998d55afd i965/vec4: Fix SSBO loads for 64-bit data
Same requirements as for UBO loads.

v2:
  - use byte_offset() instead of offset() (Iago)
  - keep the const. offset as an immediate like the original code did (Juan)

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
4486c90aae i965/vec4: Fix UBO loads for 64-bit data
We need to emit 2 32-bit load messages to load a full dvec4. If only
1 or 2 double components are needed dead-code-elimination will remove
the second one.

We also need to shuffle the result of the 32-bit messages to form
valid 64-bit SIMD4x2 data.

v2:
 - use byte_offset() instead of offset() (Iago)
 - keep the const. offset as an immediate like the original code did (Juan)

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
d8e123cc5d i965/vec4: Add a shuffle_64bit_data helper
SIMD4x2 64bit data is stored in register space like this:

r0.0:DF  x0 y0 z0 w0
r1.0:DF  x1 y1 z1 w1

When we need to write data such as this to memory using 32-bit write
messages we need to shuffle it in this fashion:

r0.0:DF  x0 y0 x1 y1
r0.1:DF  z0 w0 z1 w1

and emit two 32-bit write messages, one for r0.0 at base_offset
and another one for r0.1 at base_offset+16.

We also need to do the inverse operation when we read using 32-bit messages
to produce valid SIMD4x2 64bit data from the data read. We can achieve this
by aplying the exact same shuffling to the data read, although we need to
apply different channel enables since the layout of the data is reversed.

This helper implements the data shuffling logic and we will use it in
various places where we read and write 64bit data from/to memory.

v2 (Curro):
  - Use the writemask helper and don't assert on the original writemask
    being XYZW.
  - Use the Vec4 IR builder to simplify the implementation.

v3 (Iago):
  - Use byte_offset() instead of offset().

v3:
  - Fix typo (Matt)
  - Clarify the example and fix indention (Matt).

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
017c8df35b i965/vec4: support multiple dispatch widths and groups in the IR builder.
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
b3a7d0ee9d i965/vec4: Lower 64-bit MAD
The previous patch made sure that we do not generate MAD instructions
for any NIR's 64-bit ffma, but there is nothing preventing i965 from
producing MAD instructions as a result of lowerings or optimization
passes. This patch makes sure that any 64-bit MAD produced inside the
driver after translating from NIR is also converted to MUL+ADD before
we generate code.

v2:
  - Use a copy constructor to copy all relevant instruction fields from
    the original mad into the add and mul instructions

v3:
  - Rename the lowering and fix commit log (Matt)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
82e9dda8bf i965/vec4/nir: do not emit 64-bit MAD
RepCtrl=1 does not work with 64-bit operands so we need to use RepCtrl=0.

In that situation, the regioning generated for the sources seems to be
equivalent to <4,4,1>:DF, so it will only work for components XY, which
means that we have to move any other swizzle to a temporary so that we can
source from channel X (or Y) in MAD and we also need to split the instruction
(we are already scalarizing DF instructions but there is room for
improvement and with MAD would be more restricted in that area)

Also, it seems that MAD operations like this only write proper output for
channels X and Y, so writes to Z and W also need to be done to a temporary
using channels X/Y and then move that to channels Z or W of the actual dst.

As a result the code we produce for native 64-bit MAD instructions is rather
bad, and much worse than just emitting MUL+ADD. For reference, a simple case
of a fully scalarized dvec4 MAD operation requires 15 instructions if we use
native MAD and 8 instructions if we emit ADD+MUL instead. There are some
improvements that we can do to the emission of MAD that might bring the
instruction count down in some cases, but it comes at the expense of a more
complex implementation so it does not seem worth it, at least initially.

This patch makes translation of NIR's 64-bit FMMA instructions produce MUL+ADD
instead of MAD. Currently, there is nothing else in the vec4 backend that emits
MAD instructions, so this is sufficient and it helps optimization passes see
MUL+ADD from the get go.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
83dcd14602 i965/vec4: Skip swizzle to subnr in 3src instructions with DF operands
We make scalar sources in 3src instructions use subnr instead of
swizzles because they don't really use swizzles.

With doubles it is more complicated because we use vstride=0 in
more scenarios in which they don't produce scalar regions. Also
RepCtrl=1 is not allowed with 64-bit operands, so we should avoid
this.

v2: Fix typo (Matt)

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
49be3abbe7 i965/vec4: fix indentation in pack_uniform_registers
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
bdf5498c6b i965/vec4: fix pack_uniform_registers for doubles
We need to consider the fact that dvec3/4 require two vec4 slots.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
23278a75ce i965/vec4: teach register coalescing about 64-bit
Specifically, at least for now, we don't want to deal with the fact that
channel sizes for fp64 instructions are twice the size, so prevent
coalescing from instructions with a different type size.

Also, we should check that if we are coalescing a register from another
MOV we should be writing the same amount of data in both operations, otherwise
we end up wiring more or less than the original instruction. This can happen,
for example, when we have split fp64 MOVs with an exec size of 4 that only
write one register each and then a MOV with exec size of 8 that reads both.
We want to avoid the pass to think that it can coalesce from the first split
MOV alone. Ideally we would like the pass to see that it can coalesce from both
split MOVs instead, but for now we keep it simple.

Finally, the pass doesn't support coalescing of multiple registers but in the
case of normal SIMD4x2 double-precision instructions they naturally write two
registers (one per vertex) and there is no reason why we should not allow
coalescing in this case. Change the restriction to bail if we see instructions
that write more than 8 channels, where the channels can be 32-bit or 64-bit.

v2:
 - Make sure that scan_inst and inst write the same amount of data.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
7c5bf597ef i965/disasm: fix subreg for dst in Align16 mode
There is a single bit for this, so it is a binary 0 or 1 meaning
offset 0B or 16B respectively.

v2:
  - Since brw_inst_dst_da16_subreg_nr() is known to be 1, remove it
    from the expression (Curro)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
ac5a06ff83 i965/vec4: implement access to DF source components Z/W
The general idea is that with 32-bit swizzles we cannot address DF
components Z/W directly, so instead we select the region that starts
at the the 16B offset into the register and use X/Y swizzles.

The above, however, has the caveat that we can't do that without
violating register region restrictions unless we probably do some
sort of SIMD splitting.

Alternatively, we can accomplish what we need without SIMD splitting
by exploiting the gen7 hardware decompression bug for instructions
with a vstride=0. For example, an instruction like this:

mov(8) r2.x:DF r0.2<0>xyzw:DF

Activates the hardware bug and produces this region:

Component: x0   y0   z0   w0   x1   y1   z1   w1
Register:  r0.2 r0.3 r0.2 r0.3 r1.2 r1.3 r1.2 r1.3

Where r0.2 and r0.3 are r0.z:DF for the first vertex of the SIMD4x2
execution and r1.2 and r1.3 are the same for the second vertex.

Using this to our advantage we can select r0.z:DF by doing
r0.2<0,2,1>.xyxy and r0.w by doing r0.2<0,2,1>.zwzw without needing
to split the instruction.

Of course, this only works for gen7, but that is the only hardware
platform were we implement align16/fp64 at the moment.

v2: Adapted to the fact that we now do this after converting to
    hardware registers (Iago)

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
e238601a2d i965/vec4: translate 64-bit swizzles to 32-bit
The hardware can only operate with 32-bit swizzles, which is a rather
limiting restriction. However, the idea is not to expose this to the
optimization passes, which would be a mess to deal with. Instead, we let
the bulk of the vec4 backend ignore this fact and we fix the swizzles right
at codegen time.

At the moment the pass only needs to handle single value swizzles thanks to
the scalarization pass that runs before it.

Notice that this only works for X/Y swizzles. We will add support for Z/W
swizzles in the next patch, since they need a bit more work.

v2 (Sam):
  - Do not expand swizzle of 64-bit immediate values.

v3:
  - Do this after translation to hardware registers instead of doing it right
    before so we don't need the force_vstride0 flag (Curro).
  - Squashed patch that included FIXED_GRF in the list of register files that
    need this translation (Iago).
  - Remove swizzle assignments for VGRF and UNIFORM files in
    convert_to_hw_regs(), they will be set by apply_logical_swizzle() (Iago).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
fb7cb853c9 i965/vec4: add a scalarization pass for double-precision instructions
The hardware only supports 32-bit swizzles, which means that we can
only access directly channels XY of a DF making access to channels ZW
more difficult, specially considering the various regioning restrictions
imposed by the hardware. The combination of both things makes handling
ramdom swizzles on DF operands rather difficult, as there are many
combinations that can't be represented at all, at least not without
some work and some level of instruction splitting depending on the case.

Writemasks are 64-bit in general, however XY and ZW writemasks also work
in 32-bit, which means these writemasks can't be represented natively,
adding to the complexity.

For now, we decided to try and simplify things as much as possible to
avoid dealing with all this from the get go by adding a scalarization
pass that runs after the main optimization loop. By fully scalarizing
DF instructions in align16 we avoid most of the complexity introduced
by the aforementioned hardware restrictions and we have an easier path
to an initial fully functional version for the vector backend in Haswell
and IvyBridge.

Later, we can improve the implementation so we don't necessarily
scalarize everything, iteratively adding more complexity and building
on top of a framework that is already working. Curro drafted some ideas
for how this could be done here:
https://bugs.freedesktop.org/show_bug.cgi?id=92760#c82

v2:
  - Use a copy constructor for the scalar instructions so we copy all
    relevant instructions fields from the original instruction.

v3: Fix indention in one switch (Matt)

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
f4b8649233 i965/vec4: split double-precision SEL
There is a hardware bug affecting compressed double-precision SEL
instructions in align16 mode by which they won't read predication mask
properly. The bug does not affect other predicated instructions
and it does not affect SEL in Align1 mode either. This was found
empirically and verified by Curro in the simulator.

Fix this by splitting double-precision SEL in Align16 mode to use an
execution size of 4.

v2: Check that the dst type is 64-bit, since we can have 16-wide single
    precision bcsel instructions that also write 2 registers.

v3: Replace bcsel by SEL in all the comments as bcsel is the nir opcode
but SEL is the actual assembly instruction (Matt).

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
5356d52f31 i965/vec4: teach cmod propagation about different execution sizes
We can't propagate the conditional modifier from one instruction to
another of a different execution size / group, since that would change
the channels affected by the conditional.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
8f39b3668a i965/vec4: teach CSE about exec_size, group and doubles
v2: adapt to changes in offset()

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
ca63a3ce51 i965/disasm: print NibCtrl for instructions with execsize < 8
v2 (Curro):
  - Print it also for execsize < 4.
  - QtrCtrl is still in effect, so print 2 * qtr_ctl + nib_ctl + 1
  - Do not read the nib ctl from the instruction in gen < 7,
    the field only exists in gen7+.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
a83608f504 i965/vec4: dump NibCtrl for instructions with execsize != 8
v2: do it in the same fashion as the FS backend for consistency (Curro)

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
e481dcc35e i965/vec4: make the generator set correct NibCtrl for SIMD4 DF instructions
From the HSW PRM, Command Reference, QtrCtrl:

   "NibCtrl is only allowed for SIMD4 instructions with a DF (Double Float)
    source or destination type."

v2: Assert that the type is DF (Samuel)
v3: Don't set the default group to 0 and then set it only for 4-wide
    instructions. Instead, assert that exec size and group are always
    a correct match and then always set the default group from the
    instruction. (Curro)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
58767f0fec i965/vec4: add a SIMD lowering pass
Generally, instructions in Align16 mode only ever write to a single
register and don't need any form of SIMD splitting, that's why we
have never had a SIMD splitting pass in the vec4 backend. However,
double-precision instructions typically write 2 registers and in
some cases they run into certain hardware bugs and limitations
that we need to work around by splitting the instructions so we only
write to 1 register at a time. This patch implements a SIMD splitting
pass similar to the one in the scalar backend.

Because we only use double-precision instructions in Align16 mode
in gen7 (gen8+ is fully scalar and gens < 7 do not implement fp64)
the pass should be a no-op on any other generation.

For now the pass only handles the gen7 restriction where any
instruction that writes 2 registers also needs to read 2 registers.
This affects double-precision instructions reading uniforms, for
example. Later patches will extend the lowering pass adding a few
more cases.

v2:
 - Move the simd lowering pass after the main optimization loop and
   run copy-propagation and dce if it reports progress (Curro)
 - Compute number of registers written instead of fixing it to 1 (Iago)
 - Use group from backend_instruction (Iago)
 - Drop assertion that checked that we only split 8-wide instructions
   into 4-wide. (Curro)
 - Don't assume that instructions can only be 8-wide, we might want
   to use 16-wide instructions in the future too (Curro)
 - Wrap gen7 workarounds in a conditional to ease adding workarounds
   for other gens in the future (Curro)
 - Handle dst/src overlap hazard (Curro)
 - Use the horiz_offset() helper to simplify the implementation (Curro)
 - Drop the assertion that checks that each split instruction writes
   exactly one register (Curro)
 - Use the copy constructor to generate split instructions with all
   the relevant fields initialized to the values in the original
   instruction instead of copying only a handful of them manually (Curro)

v3 (Iago):
 - When copying to a temporary, allocate the number of registers required
   for the copy based on the size written of the lowered instruction
   instead of assuming that all lowered instructions produce single-register
   writes
 - Adapt to changes in offset()

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
945269ab72 i965: move the group field from fs_inst to backend_instruction.
Just like the exec_size, we are going to need this in the vec4 backend
when we implement a simd splitting pass.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
07cadc306e i965/vec4: add a horiz_offset() helper
This will come in handy when we implement a simd lowering pass in a
follow-up patch.

v2: use byte_offset()

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Juan A. Suarez Romero
4ea3bf8ebb i965/vec4: handle 32 and 64 bit channels in liveness analysis
Our current data flow analysis does not take into account that channels
on 64-bit operands are 64-bit. This is a problem when the same register
is accessed using both 64-bit and 32-bit channels. This is very common
in operations where we need to access 64-bit data in 32-bit chunks,
such as the double packing and packing operations.

This patch changes the analysis by checking the bits that each source
or destination datatype needs. Actually, rather than bits, we use
blocks of 32bits, which is the minimum channel size.

Because a vgrf can contain a dvec4 (256 bits), we reserve 8
32-bit blocks to map the channels.

v2 (Curro):
  - Simplify code by making the var_from_reg helpers take an extra
    argument with the register component we want.
  - Fix a couple of cases where we had to update the code to the new
    way of representing live variables.

v3:
  - Fix indent in multiline expressions (Matt)
  - Fix comment's closing tag (Matt)
  - Use DIV_ROUND_UP(inst->size_written, 16) instead of 2 * regs_written(inst)
    to avoid rounding issues. The same for regs_read(i). (Curro).
  - Add asserts in var_from_reg() to avoid exceeding the allocated
    registers (Curro).

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
29dd5cf9d6 i965/vec4: dump the instruction execution size
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
486fd5422c i965/vec4: use the IR's execution size
In the vec4 backend the generator sets to 8 the execution size for all
instructions by default, however, to implement 64-bit floating-point we
will need to split certain instruction into smaller sizes so we need the
IR to convey this information like we do in the scalar backend. This patch
uses the execution size from the vec4 IR.

We will use this feature in a later patch when we implement a SIMD
splitting pass.

v2:
  - Drop the assertion on the execution size being 8 or 4 (Curro)
  - Use exec_size from backend_instruction (Curro)

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
f79547840a i965/vec4: fix regs_read() for doubles
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
7c6fba5e7c i965/vec4: fix size_written for doubles
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:51 +01:00
Iago Toral Quiroga
9527a50da0 i965: move exec_size from fs_instruction to backend_instruction
We are going to need this in the vec4 backend too.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Samuel Iglesias Gonsálvez
b58026b31e i965/vec4: use the new helper function to create double immediates
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Iago Toral Quiroga
98da3623d5 i965/vec4: add a helper function to create double immediates
Gen7 hardware does not support double immediates so these need
to be moved in 32-bit chunks to a regular vgrf instead. Instead
of doing this every time we need to create a DF immediate,
create a helper function that does the right thing depending
on the hardware generation.

v2 (Curro):
  - Use swizzle() and writemask() helpers and make tmp const.

v3 (Iago):
  - Adapt to changes in offset()

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Iago Toral Quiroga
8f9ce5fa22 i965/vec4: fix optimize predicate for doubles
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Iago Toral Quiroga
1816ae8f68 i965/vec4: implement fsign() for doubles
v2: use a MOV with a conditional_mod instead of a CMP, like we do in d2b, to skip
    loading a double immediate.

v3: Fix comment (Matt)

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Iago Toral Quiroga
6e570619e0 i965/vec4: implement d2b
v2 (Curro):
  - Generate the flag register with a MOV with conditional_mod instead of a CMP
    instruction, which has the benefit that we can skip loading a DF
    0.0 constant.
  - Avoid the PICK_LOW_32BIT + MOV by using the flag result and a
    SEL to set the boolean result.

v3:
  - Fix comment (Matt)

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Iago Toral Quiroga
c1fb525016 i965/vec4: implement d2i, d2u, i2d and u2d
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Iago Toral Quiroga
4b22576234 i965/vec4: implement HW workaround for align16 double to float conversion
From the BDW PRM, Workarounds chapter:

   "DF->f format conversion for Align16 has wrong emask calculation when
    source is immediate."

Notice that Broadwell and later are strictly scalar at the moment though, so
this is not really necessary.

v2: Instead of moving the immediate to a vgrf and converting from there, just
    convert the double immediate to float in the compiler and move the result
    to the destination (Matt)

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Iago Toral Quiroga
bfc1f0f017 i965/vec4: add helpers for conversions to/from doubles
Use these helpers to implement d2f and f2d. We will reuse these helpers when
we implement things like d2i or i2d as well.

v2:
- Rename the helpers (Matt)

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Iago Toral Quiroga
c722a8e61e i965/vec4: Rename DF to/from F generator opcodes
The opcodes are not specific for conversions to/from float since we need
the same for conversions to/from other 32-bit types. Rename the opcodes
accordingly and change the asserts to check the size of the types involved
instead.

v2:
- Rename to VEC4_OPCODE_TO_DOUBLE and VEC4_OPCODE_FROM_DOUBLE (Matt)

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Iago Toral Quiroga
619271ec87 i965/vec4: fix register allocation for 64-bit undef sources
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Iago Toral Quiroga
21cf6f14d5 i965/vec4: make opt_vector_float ignore doubles
The pass does not support doubles in its current form.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Iago Toral Quiroga
a8318b120e i965/vec4: fix get_nir_dest() to use DF type for 64-bit destinations
v2: Make dst_reg_for_nir_reg() handle this for nir_register since we
    want to have the correct type set before we call offset().

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Iago Toral Quiroga
bb0e67d55d i965/vec4: fix indentation in get_nir_src()
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Iago Toral Quiroga
8cdbbbd2cf i965/vec4/nir: implement double comparisons
v2:
- Added newline before if() (Matt)

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Iago Toral Quiroga
8a3ba03339 i965/vec4: implement double packing
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Iago Toral Quiroga
94cfdf586a i965/vec4: implement double unpacking
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Iago Toral Quiroga
7ec57e91d6 i965/vec4: don't copy propagate vector opcodes that operate in align1 mode
Basically, ALIGN1 mode will ignore swizzles on the input vectors so we don't
want the copy propagation pass to mess with them.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Iago Toral Quiroga
553700cf55 i965/vec4: Fix DCE for VEC4_OPCODE_SET_{LOW,HIGH}_32BIT
These align1 opcodes do partial writes of 64-bit data. The problem is that we
want to use them to write on the same register to implement packDouble2x32 and
from the point of view of DCE, since both opcodes write to the same register,
only the last one stands and decides to eliminate the first, which is
not correct, so prevent this from happening.

v2: Make a helper in vec4_instruction to know if the instruction is an
    align1 partial write. This will come in handy when we implement a
    simd splitting pass in a later patch.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Iago Toral Quiroga
54b998e0e4 i965/vec4: add VEC4_OPCODE_SET_{LOW,HIGH}_32BIT opcodes
These opcodes will set the low/high 32-bit in each 64-bit data element
using Align1 mode. We will use this to implement packDouble2x32.

We use Align1 mode because in order to implement this in Align16 mode
we would need to use 32-bit logical swizzles (XZ for low, YW for high),
but the IR works in terms of 64-bit logical swizzles for DF operands
all the way up to codegen.

v2:
 - use suboffset() instead of get_element_ud()
 - no need to set the width on the dst

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Iago Toral Quiroga
6979e5a412 i965/vec4: add VEC4_OPCODE_PICK_{LOW,HIGH}_32BIT opcodes
These opcodes will pick the low/high 32-bit in each 64-bit data element
using Align1 mode. We will use this, for example, to do things like
unpackDouble2x32.

We use Align1 mode because in order to implement this in Align16 mode
we would need to use 32-bit logical swizzles (XZ for low, YW for high),
but the IR works in terms of 64-bit logical swizzles for DF operands
all the way up to codegen.

v2:
 - use suboffset() instead of get_element_ud()
 - no need to set the width on the dst

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Iago Toral Quiroga
9b6174dffa i965/vec4: add dst_null_df()
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Iago Toral Quiroga
4c040332f5 i965/vec4: We only support 32-bit integer ALU operations for now
Add asserts so we remember to address this when we enable 64-bit
integer support, as suggested by Connor and Jason.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Iago Toral Quiroga
611fe6b32f i965/disasm: align16 DF source regions have a width of 2
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Iago Toral Quiroga
c35fa7ac55 i965/vec4: set correct register regions for 32-bit and 64-bit
For 32-bit instructions we want to use <4,4,1> regions for VGRF
sources so we should really set a width of 4 (we were setting 8).

For 64-bit instructions we want to use a width of 2 because the
hardware uses 32-bit swizzles, meaning that we can only address 2
consecutive 64-bit components in a row. Also, Curro suggested that
the hardware is probably fixing the width to 2 for 64-bit instructions
anyway, so just go with that and use <2,2,1>.

v2:
 - No need to explicitly set the vertical stride of 64-bit regions to 2,
   brw_vecn_grf with a width of 2 will do that for us.
 - No need to adjust the width of dst registers.

v3 (Ian):
 - Make type_size and width const.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Connor Abbott
ed74b19ab4 i965: add brw_vecn_grf()
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Iago Toral Quiroga
e09a6be3b6 i965/vec4: translate d2f/f2d
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Iago Toral Quiroga
558f279531 i965/vec4: add double/float conversion pseudo-opcodes
These need to be emitted as align1 MOV's, since they need to have a
stride of 2 on the float register (whether src or dest) so that data
from another thread doesn't cross the middle of a SIMD8 register.

v2 (Iago):
- The float-to-double needs to align 32-bit data to 64-bit before doing the
conversion. This was doable in align16 when we tried to use an execsize
of 4, but with an execsize of 8 we would need another align1 opcode to do
that (since we need data to cross the middle of a SIMD register). Just
making the opcode handle this internally seems more practical that adding
another opcode just for this purpose and having the caller know about this
before converting.
- The double-to-float conversion produces 32-bit elements aligned to 64-bit
so we make the opcode re-pack the result to 32-bit and fit in one register,
as expected by SIMD4x2 operation. This still requires that callers reserve
two registers for the float data destination because we need to produce
64-bit aligned data first, and repack it later on the same destination
register, but it saves the need for a re-pack opcode only to achieve this
making the operation complete in a single opcode. Hopefully that is worth
the weirdness of the double register allocation...

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Connor Abbott
2d6eee3144 i965/vec4: add support for printing DF immediates
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Iago Toral Quiroga
9ce4b20bde i965/vec4/nir: fix emitting 64-bit immediates
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Connor Abbott
3457252b74 i965/vec4/nir: set the right type for 64-bit registers
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Iago Toral Quiroga
fef06f6356 i965/vec4/nir: support doubles in ALU operations
Basically, this involves considering the bit-size information to set
the appropriate type on both operands and destination.

v2 (Curro)
  - Don't use two temporaries (and write one of them twice ) to obtain
    the nir_alu_type.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Iago Toral Quiroga
0f096b1e5a i965/vec4/nir: Add bit-size information to types
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Connor Abbott
2d81a29203 i965/vec4/nir: allocate two registers for dvec3/dvec4
v2 (Curro):
  - Do not special-case for a bit-size of 64, divide the bit_size by 32
    instead.
  - Use DIV_ROUND_UP so we can handle sub-32-bit types.

v3 (Ian):
  - Make num_regs const.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Connor Abbott
54913850aa i965/vec4/nir: simplify glsl_type_for_nir_alu_type()
Less duplication, one one less case to handle for doubles and support
for sized NIR types.

v2: Fix call to get_instance by swapping rows and columns params (Iago)

Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Samuel Iglesias Gonsálvez
9fa24632f3 i965/nir: double/dvec2 uniforms only need to be padded to a single vec4 slot
max_vector_size is used in the vec4 backend to pad out the uniform
components to match a size that is a multiple of a vec4. Double and dvec2
uniforms only require a single vec4 slot, not two.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 11:26:50 +01:00
Samuel Iglesias Gonsálvez
c5ae6e78fc i965/fs: fix exec_size when emitting DIM instruction
Otherwise, DIM instructions will be emitted with the default exec size
which could be 16 in some cases, that is not legal.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Suggested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-03 06:48:39 +01:00
Timothy Arceri
22639a6e19 st/mesa: get Version from gl_program rather than gl_shader_program
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-03 12:57:24 +11:00
Timothy Arceri
2c0d267717 i965: stop passing gl_shader_program to brw_compile_gs() and gen6_gs_visitor()
Instead we caan just use gl_program.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-03 12:20:10 +11:00
Timothy Arceri
b880281f0b i965: get InfoLog and LinkStatus via the shader program data pointer in gl_program
This removes another dependency on gl_shader_program in the codegen
functions.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-03 12:20:10 +11:00
Timothy Arceri
340b22c217 i965: eliminate gen6_xfb_enabled field in brw_gs_prog_data
We can just get this information from shader_info instead.

Note that passing gen6_gs_visitor() gl_program via _LinkedShaders
will go away in a later patch.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-03 12:20:10 +11:00
Timothy Arceri
6643da6d7f i965: update brw_get_shader_time_index() not to take gl_shader_program
This removes another dependency on gl_shader_program in the codegen
functions which will help allow us to use gl_program in the
CurrentProgram array rather than gl_shader_program.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-03 12:20:10 +11:00
Marek Olšák
cb6f49a902 gallium/hud: fix the windows build by disabling file dumping 2017-01-02 23:18:28 +01:00
Kenneth Graunke
bc7f1eddbd glsl: Update ES 3.2 shader output restrictions.
This disallows fancy varyings in tessellation and geometry shaders,
as required by ES 3.2.

Fixes:
dEQP-GLES31.functional.tessellation.user_defined_io.negative.per_patch_array_of_structs
dEQP-GLES31.functional.tessellation.user_defined_io.negative.per_patch_structs_containing_arrays

(Not a candidate for stable branches as it only disallows things which
should be working as desktop GL allows them.)

v2: Update error messages to not say "vertex shader" (caught by Iago).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-01-02 14:10:50 -08:00
Ben Widawsky
fc78ee5da0 i965/miptree: Create a disable CCS flag
Cc: Chad Versace <chadversary@chromium.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-01-02 10:35:17 -08:00
Ben Widawsky
d0b6a949f8 i965: Replace bool aux disable with enum
As CCS buffers are passed to KMS, it becomes useful to be able to
determine exactly what type of aux buffers are disabled. This was
previously not entirely needed (though the code was a little more
confusing), however it becomes very desirable after a recent patch from
Chad:

commit 1c8be049be
Author: Chad Versace <chadversary@chromium.org>
Date:   Fri Dec 9 16:18:11 2016 -0800

    i965/mt: Disable aux surfaces after making miptree shareable

The next patch will handle CCS and get rid of no_ccs.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-01-02 10:35:13 -08:00
Edmondo Tommasina
3f5fba8a7b docs: document GALLIUM_HUD_DUMP_DIR envvar
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-01-01 00:03:39 +01:00
Edmondo Tommasina
5b9d76296f gallium/hud: set filedescriptor for fps graph
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-01-01 00:03:38 +01:00
Edmondo Tommasina
94c9916710 gallium/hud: set filedescriptor for cpu graph
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-01-01 00:03:38 +01:00
Edmondo Tommasina
57f86fb3a8 gallium/hud: move file initialization to a function
The function will be used later to create the filedescriptor
for other metrics.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-01-01 00:03:38 +01:00
Edmondo Tommasina
22cd9040da gallium/hud: dump hud_driver_query values to files
Dump values for every selected data source in GALLIUM_HUD.

Every data source has its own file and the filename is
equal to the data source identifier.

Set GALLIUM_HUD_DUMP_DIR to dump values to files in this directory.

No values are dumped if the environment variable is not set, the
directory doesn't exist or the user doesn't have write access.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-01-01 00:03:06 +01:00
Ilia Mirkin
1f13cb8b15 anv,radv: disable StorageImageWriteWithoutFormat for now
The SPIR-V capability isn't even marked as enabled, and there are no
tests in Vulkan-CTS. Per Jason Ekstrand, this won't work in anv as such
write-only surfaces require additional setup which is currently not
performed.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-12-31 16:38:00 -05:00
Kenneth Graunke
62a8191841 i965: Avoid NULL pointer dereference when transform feedback is off.
upload_3dstate_streamout can be called when there's no currently bound
transform feedback object.  In this case, we get the default object,
which has a NULL shader (previously gl_shader_program, now gl_program).

The old code did something sketchy, but which worked:

   const struct gl_transform_feedback_info *linked_xfb_info =
      &xfb_obj->shader_program->LinkedTransformFeedback;

Here, if shader_program is NULL, this would be a bogus pointer of 0x60.
But we never actually dereferenced it, so it worked out.

With Timothy's recent reworks, we actually end up dereferencing
xfb_obj->program along the way, which crashes since it's NULL.

The solution is to move this pointer initialization into the "active"
block, where we know it actually exists and won't be bogus.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99231
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-12-30 15:46:22 -08:00
Timothy Arceri
68245aa6f5 glsl/mesa: add reference to gl_shader_program_data from gl_program
We also add the stubs for the standalone compiler in this change.

By adding a reference here we can now refactor some code to use
gl_program where we were previously awkwardly using gl_shader_program.

Reviewed-by: Eric Anholt <eric@anholt.net>
2016-12-31 09:48:51 +11:00
Timothy Arceri
9d99dc4bc1 mesa: make union in gl_program a struct and add FIXME
i915 is mixing the use of these fields, for now change this to a
struct and add a FIXME.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99229
2016-12-31 09:00:05 +11:00
Jason Ekstrand
c2799a80c5 i965/peephole_ffma: Use nir_builder
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-12-30 12:38:04 -08:00
Jason Ekstrand
8495ece52e nir/split_var_copies: Use a nir_shader rather than a void *mem_ctx
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-12-30 12:38:04 -08:00
Jason Ekstrand
ffa4ba71d9 nir/opt_peephole_select: Pass around the actual nir_shader
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-12-30 12:38:04 -08:00
Jason Ekstrand
cd6f736c07 nir/conditional_if: Properly use the builder
We were passing around a void *mem_ctx and using that to initialize the
builder which was wrong since that pointed to ralloc_parent(impl) which
is the shader but the builder is supposed to be initialized with the
nir_function_impl.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-12-30 12:38:04 -08:00
Jason Ekstrand
47b54a6f74 nir/lower_var_copies: Use a shader rather than a void *mem_ctx
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-12-30 12:38:04 -08:00
Jason Ekstrand
c4ccdfa513 nir/lower_io: Use the builder instead of carrying a mem_ctx
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-12-30 12:38:04 -08:00
Jason Ekstrand
c8e0612165 nir/from_ssa: Use nir_builder for emit_copy
This lets us get rid of the void *mem_ctx parameter and make things a
bit more type safe.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-12-30 12:38:04 -08:00
Jason Ekstrand
134a5ad31c nir: Make nir_copy_deref follow the "clone" pattern
We rename it to nir_deref_clone, re-order the sources to match the other
clone functions, and expose nir_deref_var_clone.  This past part, in
particular, lets us get rid of quite a few lines since we no longer have
to call nir_copy_deref and wrap it in deref_as_var.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-12-30 12:38:04 -08:00
Rob Clark
832dddcf91 freedreno/ir3: rework varying slots (maybe??)
See:
dEQP-GLES2.functional.shaders.swizzles.vector_swizzles.mediump_vec2_yyyy_fragment

if we only access (in FS) varying.y then it ends up in slot zero..  I'm
not sure the hw likes that..

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-30 13:49:57 -05:00
Ilia Mirkin
36c648b894 spirv: always expose SpvCapabilityStorageImageExtendedFormats
I forgot to do this in commit 76b97d544e ("anv: enable storage image
extended formats"). Since both drivers support this now, no need for the
conditional enable.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-12-29 22:09:58 -05:00
Ilia Mirkin
c633f228b4 anv: add support for extended texture gather
Now that the SPIR-V -> NIR translation is in place, no additional logic
is required.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-12-29 20:43:33 -05:00
Dave Airlie
80bafc0c11 radv: only allow cmask/dcc in color optimal.
I had this on transfers due to the clear color cmd, but
it seems like that path shouldn't get fast clears.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-12-30 00:04:16 +00:00
Dave Airlie
1814df7ea7 radv: only allow cmask/dcc on exclusive or concurrent with graphics queue.
Otherwise we don't get the barriers to flush dcc etc.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-12-30 00:04:01 +00:00
Jason Ekstrand
a4d1eb443e nir: Rewrite lower_regs_to_ssa to use the phi builder
This keeps some of Connor's original code.  However, while I was at it,
I updated this very old pass to a bit more modern NIR.
2016-12-29 16:02:44 -08:00
Jason Ekstrand
67a70889f6 nir/phi-builder: Set the value in the block when creating a phi
After we figure out the value that we are going to return, we have a
loop that walks up the dominance tree and sets the value in each of the
blocks that doesn't have one yet.  In the case of the phi, the def is
set to NEEDS_PHI not NULL, so the last one where the phi node actually
goes never gets filled out.  This can lead to duplicating the phi node
unnecessarily.
2016-12-29 16:02:44 -08:00
Jason Ekstrand
baf1aa1334 nir: Add foreach_register helper macros 2016-12-29 16:02:44 -08:00
Jason Ekstrand
fb181196de nir: Rename convert_to_ssa lower_regs_to_ssa
This matches the naming of nir_lower_vars_to_ssa, the other to-SSA pass.
2016-12-29 16:02:44 -08:00
Timothy Arceri
194537ebe4 mesa/glsl/i965: remove Driver.NewShader()
After removing brw_shader in the previous commit this is no longer
needed.

V2: remove use in src/compiler/glsl/test_optpass.cpp

Reviewed-by: Eric Anholt <eric@anholt.net>
2016-12-30 10:57:17 +11:00
Timothy Arceri
718a0cf49f i965: move compiled_once flag to brw_program
This allows us to delete brw_shader and removes the last use of
gl_linked_shader in the codegen paths.

Reviewed-by: Eric Anholt <eric@anholt.net>
2016-12-30 10:57:16 +11:00
Timothy Arceri
8417bf528e mesa/glsl: move BlendSupport bitfield to gl_program
This will let us to make _CurrentFragmentProgram a gl_program pointer
allowing for simpilifications to be made.

We also need to add a field to gl_shader to hold it during parsing.

In gl_program we put it inside a union in anticipation of moving
more fields here that can be only fs or vertex stage fields.

Reviewed-by: Eric Anholt <eric@anholt.net>
2016-12-30 10:57:16 +11:00
Timothy Arceri
3177eef392 mesa: store gl_program in gl_transform_feedback_object rather than gl_shader_program
This will allow us to make the CurrentProgram array store gl_program which allows
us to do a bunch of simplifications.

Reviewed-by: Eric Anholt <eric@anholt.net>
2016-12-30 10:57:16 +11:00
Timothy Arceri
700bc94dce mesa/glsl: move LinkedTransformFeedback from gl_shader_program to gl_program
This will help allow us to store gl_program in the CurrentProgram array rather
than gl_shader_program which will allow a bunch of simplifications.

Note that we make LinkedTransformFeedback a pointer so we don't waste
memory creating a struct for each stage. We also store a pointer to
the gl_program that will contain the pointer in gl_shader_program so
we can get easy access to the correct stage.

Reviewed-by: Eric Anholt <eric@anholt.net>
2016-12-30 10:57:16 +11:00
Timothy Arceri
31c04e4e22 i965: get LinkedTransformFeedback from gl_transform_feedback_object
We have already set the gl_shader_program pointer to the correct
shader program in _mesa_BeginTransformFeedback() so use it.

This is more consistent with how we do it for gen7.

Reviewed-by: Eric Anholt <eric@anholt.net>
2016-12-30 10:57:16 +11:00
Timothy Arceri
29d70f5de9 mesa: move _Used to gl_program
We no longer need to initialise it because gl_program is never reused.

Reviewed-by: Eric Anholt <eric@anholt.net>
2016-12-30 10:57:16 +11:00
Timothy Arceri
8a69ae5345 mesa/compiler: add local_size_variable to shader_info
This will be used in api_validate.c in a following patch when we
switch to using gl_program pointers for the pipelines CurrentProgram
array.

Reviewed-by: Eric Anholt <eric@anholt.net>
2016-12-30 10:57:16 +11:00
Timothy Arceri
9ea513e226 mesa: pass gl_program to _mesa_append_uniforms_to_file()
This now contains everything we need.

Reviewed-by: Eric Anholt <eric@anholt.net>
2016-12-30 10:57:16 +11:00
Timothy Arceri
b51bfbdd85 glsl/mesa: set separate_shader directly in shader_info
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-12-30 10:57:16 +11:00
Timothy Arceri
41dd6c3539 mesa/glsl: move subroutine metadata to gl_program
This will allow us to store gl_program rather than gl_shader_program
as the current program perstage which allows us to simplify code
that makes use of the CurrentProgram list.

Reviewed-by: Eric Anholt <eric@anholt.net>
2016-12-30 10:57:16 +11:00
Timothy Arceri
0de6f6223a mesa/compiler: add stage to shader_info
This will allow us to simplify the current program logic for SSO.

Also since we aim to detach shader_info from nir_shader this will come
in handy avoiding passing nir_shader around just to keep track of
the stage we are dealing with.

V2: set stage for arb asm programs also.

Reviewed-by: Eric Anholt <eric@anholt.net>
2016-12-30 10:57:15 +11:00
Eric Anholt
88b41239f9 vc4: Rework scheduling of thread switch to cut one more NOP.
Jonas's patch got us most of the benefit of scheduling instructions into
the delay slots of thread switch, but if there had been nothing to pair
the thrsw with, it would move the thrsw up and leave a NOP where the thrsw
was.

Instead, don't pair anything with thrsw through the normal scheduling
path, and have a separate helper function that inserts the thrsw earlier
if possible and inserts any necessary NOPs.

total instructions in shared programs: 93027 -> 92643 (-0.41%)
instructions in affected programs:     14952 -> 14568 (-2.57%)
2016-12-29 15:22:54 -08:00
Jonas Pfeil
d82dbc4cde vc4: Fill thread switching delay slots
Scan for instructions without a signal set in front of the switching
instruction and move the signal up there.

shader-db results:

total instructions in shared programs: 94494 -> 93027 (-1.55%)
instructions in affected programs:     23545 -> 22078 (-6.23%)

v2: Fix re-emitting of the instruction in the loop trying to emit NOPs,
    drop a scheduling change from branch delay slots. (by anholt)

Signed-off-by: Jonas Pfeil <pfeiljonas@gmx.de>
2016-12-29 14:41:09 -08:00
Eric Anholt
63e7671c7e vc4: Enable NIR-based loop unrolling.
This successfully unrolls a new shader in GLB2.7, which also gets that
shader to successfully compile in multithreaded mode.
2016-12-29 14:41:09 -08:00
Timothy Arceri
5f323198ea nir: stop gcc warning about uninitialised variables
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-29 13:47:11 +11:00
Dave Airlie
44f833ab18 radv: denote support for extended storage image formats.
I'm sure anv has support for these as well, but this is just
a first use of the interface to allow different supported spir-v
features.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-12-28 22:44:40 +00:00
Dave Airlie
de7dd4d621 spirv: add interface for drivers to define support extensions.
I expect over time the struct contents will change as all
drivers support stuff etc, but for now this should be a good
starting point.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-12-28 22:43:17 +00:00
Chad Versace
464b23b1f2 mesa/shaderobj: Fix races on refcounts
Use atomic ops when updating gl_shader::RefCount.

Fixes intermittent failures and crashes in
'dEQP-EGL.functional.sharing.gles2.multithread.*'.
All tests in that group now pass except
'dEQP-EGL.functional.sharing.gles2.multithread.simple_egl_server_sync.textures.copyteximage2d_texsubimage2d_render'.

Tested with:
  mesa: branch 'master' at d6545f2
  deqp: branch 'nougat-cts-dev' at 4acf725 with additional local fixes
  DEQP_TARGET: x11_egl
  hw: Intel Broadwell 0x1616

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99085
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Cc: Mark Janes <mark.a.janes@intel.com>
Cc: Haixia Shi <hshi@chromium.org>
2016-12-28 11:10:43 -08:00
Rob Clark
ec01ef2db1 freedreno/ir3: fix linkage::var size
It should actually be 32 for a4xx/a5xx.. we still only advertise 16 but
for a5xx the linkage map includes position/psize.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-27 16:54:01 -05:00
Rob Clark
c416ea31cf freedreno/ir3: treat clipvertex like a normal varying
We need this in case it is streamed out.  Not sure why we were treating
it specially before.  Having it as a VS out is harmless if FS doesn't
have a matching input.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-27 16:54:01 -05:00
Rob Clark
d10c5a2481 freedreno/a5xx: transform-feedback support
We'll need to revisit when adding hw binning pass support, whether we
can still do this in main draw step, as we do w/ a3xx/a4xx, or if we
needed to move it to the binning stage.

Still some failing piglits but most tests pass and the common cases seem
to work.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-27 16:54:01 -05:00
Rob Clark
928e9bd602 freedreno: update generated headers
Pull in a5xx streamout related regs.  Also fixes a couple incorrect
register definitions.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-27 16:54:01 -05:00
Rob Clark
6d77ceb701 freedreno/ir3: UBO support for 64b GPUs (a5xx)
Update address calculation to support 64b addresses.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-27 16:54:01 -05:00
Rob Clark
fc10dc9fde freedreno/ir3: rework location of driver constants
Rework how we lay out driver constants (driver-params, UBO/TFBO buffer
addresses, immediates) for more flexibility.  For a5xx+ we need to deal
with the fact that gpu ptrs are 64b instead of 32b, which makes the
fixed offset scheme not work so well.  While we are dealing with that
we might also make the layout more dynamic to account for varying # of
UBOs, etc.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-27 16:54:01 -05:00
Rob Clark
09202cde7e freedreno/a5xx: fix emit for bo addresses
Reloc for the buffer address is two dwords on 64b devices (a5xx+)

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-27 16:54:01 -05:00
Rob Clark
f043904080 freedreno/a5xx: texture layout
Seems to be imilar to a4xx, and sampler state "array-pitch" needs
to be aligned to page size.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-27 16:54:01 -05:00
Rob Clark
859cb24d94 ttn: set ->info->num_ubos
For dealing w/ 32b vs 64b gpu addresses, I need to rework how we pass
UBO buffer addresses to shader, and knowing up front the # of UBOs is
useful.  But I noticed ttn wasn't setting this.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-12-27 16:54:01 -05:00
Chad Versace
d6545f2345 anv: Handle vkGetPhysicalDeviceQueueFamilyProperties with count == 0
The spec implicitly allows the incoming count to be 0. From the Vulkan
1.0.38 spec, Section 4.1 Physical Devices:

    If the value referenced by pQueueFamilyPropertyCount is not 0 [then
    do stuff].

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-27 12:31:34 -08:00
Chad Versace
b85c0b569f egl: Emit correct error when robust context creation fails
Fixes dEQP-EGL.functional.create_context_ext.robust_*
on Intel with GBM.

If the user sets the EGL_CONTEXT_OPENGL_ROBUST_ACCESS_BIT_KHR in
EGL_CONTEXT_FLAGS_KHR when creating an OpenGL ES context, then
EGL_KHR_create_context spec requires that we unconditionally emit
EGL_BAD_ATTRIBUTE because that flag does not exist for OpenGL ES. When
creating an OpenGL context, the spec requires that we emit EGL_BAD_MATCH
if we can't support the request; that error is generated in the egl_dri2
layer where the driver capability is actually checked.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99188
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2016-12-27 10:21:29 -08:00
Damien Grassart
75252826e8 anv: return count of queue families written
The Vulkan spec indicates that
vkGetPhysicalDeviceQueueFamilyProperties() should overwrite
pQueueFamilyPropertyCount with the number of structures actually
written to pQueueFamilyProperties.

Signed-off-by: Damien Grassart <damien@grassart.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Cc: mesa-stable@lists.freedesktop.org
2016-12-27 10:15:47 -08:00
Chad Versace
e2d69d5e2d i965: Allow import/export of ARGB1555 images
To my knowledge, this fixes no tests. I simply wrote the patch for
completeness as a follow-up to the previous two patches.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2016-12-27 09:14:04 -08:00
Chad Versace
f3739810e3 mesa/texformat: Handle GL_RGBA + GL_UNSIGNED_SHORT_5_5_5_1
_mesa_choose_tex_format() already handles GL_RGBA + GL_UNSIGNED_SHORT_1_5_5_5_REV
by converting it to MESA_FORMAT_B5G5R5A1_UNORM. Teach it do the same for
the non-reversed type. Otherwise, the switch's fallthrough converts it
to an 8888 format, which has incompatible precision in the alpha
channel.

Patch 2/2 to fix dEQP-EGL.functional.image.modify.tex_rgb5_a1_tex_subimage_rgba8
on Intel.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99185
Cc: Haixia Shi <hshi@chromium.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-12-27 09:14:00 -08:00
Chad Versace
9aa6ab0748 dri: Add __DRI_IMAGE_FORMAT_ARGB1555
This allows eglCreateImage() to accept textures of said format.

Patch 1/2 to fix
dEQP-EGL.functional.image.modify.tex_rgb5_a1_tex_subimage_rgba8
on Intel.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99185
Cc: Haixia Shi <hshi@chromium.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-12-27 09:13:43 -08:00
Tapani Pälli
4d6d4f939e egl/dri2: implement query surface hook
This makes better guarantee that the values we return are
in sync what the underlying drawable currently has.

Together with dEQP change in bug #98327 this fixes following test:

   dEQP-EGL.functional.resize.surface_size.grow

v2: avoid unnecessary x11 roundtrips (Chad Versace)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98327
2016-12-27 08:01:08 +02:00
Dave Airlie
d8423772ca radv: add some asserts for operations on general queue
These might be useful in the future, or not.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-12-27 03:27:14 +00:00
Bas Nieuwenhuizen
059af2515a radv: Also skip DCC clear flushes for compute.
(airlied: fixes DOOM hang with compute queue enabled)
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
2016-12-27 03:27:13 +00:00
Dave Airlie
3fd306b423 radv: handle queue present directly to winsys
Don't call the QueueSubmit interface, just call direct to the
winsys, so we can pass the wait semaphores.

Noticed while debugging doom, doesn't fix anything.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-12-26 22:20:35 +00:00
Jordan Justen
097c9dc2d4 intel/blorp_blit: Fix max blit size for gen6
Fixes ES3-CTS.gtf.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_stencil_blit

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-26 08:50:21 -08:00
Dave Airlie
b5bb8b54cf radv: fix rendering to b10g11r11_ufloat_pack32
doom was causing a printf about an illegal color, it was due the
non-void returning -1, and the other function checking for 4,
align these.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-12-26 10:31:20 +10:00
Dave Airlie
4813c9ade7 radv: handle multi-component shared load/stores.
This was seen in doom shaders, so handle it properly.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave AIrlie <airlied@redhat.com>
2016-12-26 10:31:20 +10:00
Vedran Miletić
d9fef848a6 clover: Use Clang's diagnostics
Presently errors from frontend are handled only if they occur in
clang::CompilerInvocation::CreateFromArgs(). This patch uses
clang::DiagnosticsEngine to detect errors such as invalid values for
Clang frontend arguments.

Fixes Piglit's cl/program/build/fail/invalid-version-declaration.cl
test.

v2: fix inconsistent code formatting

Signed-off-by: Vedran Miletić <vedran@miletic.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Tested-by: Aaron Watry <awatry@gmail.com>
2016-12-24 18:35:09 -08:00
Damien Grassart
3a30b1a556 radv: return count of queue families written
The Vulkan spec indicates that
vkGetPhysicalDeviceQueueFamilyProperties() should overwrite
pQueueFamilyPropertyCount with the number of structures actually
written to pQueueFamilyProperties.

Signed-off-by: Damien Grassart <damien@grassart.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-12-25 02:25:02 +01:00
Jason Ekstrand
88b5acfa09 i965/generator/tex: Handle an immediate sampler with an indirect texture
In this case we were dying when we tried to do SHL addr sampler imm(8)
because that puts an immediate in src0 of a two source instruction. This
fixes 2704 of the new separate sampler Vulkan CTS tests on Sky Lake.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-12-23 07:27:13 -08:00
Bruce Cherniak
9e35426731 swr: fix icc compile error
ICC doesn't like the use of nullptr (std::nullptr_t) argument in
p_atomic_set.  GCC and clang don't complain.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99119
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-12-23 08:36:21 -06:00
Dave Airlie
e7279f16a0 radv: set some proper values for interp offset limits.
These are taken from the amdgpu-pro driver, and cause no
CTS change.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-12-23 14:36:54 +10:00
Dave Airlie
14737bcdd5 radv: bump texel offsets to align with radeonsi
it appears from the amdgpu-pro results the hw can do more,
but let's just align with radeonsi for now.

No CTS regressions.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-12-23 14:36:50 +10:00
Jason Ekstrand
d55835b8bd nir/algebraic: Add optimizations for "a == a && a CMP b"
This sequence shows up The Talos Principal, at least under Vulkan,
and prevents loop analysis from properly computing trip counts in a
few loops.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-12-22 16:27:19 -08:00
Jason Ekstrand
8962cc96ec i965: Use nir_opt_trivial_continues and nir_opt_if
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-12-22 16:27:19 -08:00
Jason Ekstrand
6d9f576b56 nir: Add a pass for moving SPIR-V continue blocks to the ends of loops
When shaders come in from SPIR-V, we handle continue blocks by placing
the contents of the continue inside of a "if (!first_iteration)".  We do
this so that we can properly handle the fact that continues in SPIR-V
jump to the continue block at the end of the loop rather than jumping
directly to the top of the loop like they do in NIR.  In particular, the
increment step of a simple for loop ends up in the continue block.  This
pass looks for this case in loops that don't actually have any continues
and moves the continue contents to the end of the loop instead.  We need
this because loop unrolling doesn't work if the increment is inside of a
condition.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-12-22 16:27:19 -08:00
Jason Ekstrand
1111a05f90 nir: Add an optimization pass to remove trivial continues
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-12-22 16:27:19 -08:00
Jason Ekstrand
993e9195d4 nir: Correctly handle blocks in cf_node_cf_tree_next
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-12-22 16:27:19 -08:00
Timothy Arceri
3321eb4c36 i965: make use of nir_lower_returns() for GL
Fixes two new piglit tests:

spec/glsl-1.10/execution/vs-nested-return-sibling-loop.shader_test
spec/glsl-1.10/execution/vs-nested-return-sibling-loop2.shader_test

shader-db results for BDW:

total instructions in shared programs: 12903158 -> 12903134 (-0.00%)
instructions in affected programs: 27100 -> 27076 (-0.09%)
helped: 32
HURT: 6

total cycles in shared programs: 294922518 -> 294922804 (0.00%)
cycles in affected programs: 4372828 -> 4373114 (0.01%)
helped: 31
HURT: 8

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-23 10:59:32 +11:00
Timothy Arceri
f20ba7ad44 nir: update nir_lower_returns to only predicate instructions when needed
Unless an if statement contains nested returns we can simply add
any following instructions to the branch without the return.

V2: fix handling if_nested_return value when there is a sibling if/loop
that doesn't contain a return. (Spotted by Ken)

V3:
 - add a better comment to the new variable
 - remove instructions after if when both branches return

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-23 10:59:32 +11:00
Timothy Arceri
40e9f2f138 i965: disable loop unrolling in GLSL IR
There is a single regression in loop unrolling which is:

loops HURT:   shaders/orbital_explorer.shader_test GS SIMD8:    0 -> 1

However the loop is huge so it seems reasonable not to unroll it. It's
surprising that GLSL IR does unroll it.

shader-db results BDW:

total instructions in shared programs: 13037455 -> 13036947 (-0.00%)
instructions in affected programs: 17982 -> 17474 (-2.83%)
helped: 63
HURT: 25

total cycles in shared programs: 262217870 -> 262227990 (0.00%)
cycles in affected programs: 2287046 -> 2297166 (0.44%)
helped: 969
HURT: 844

total loops in shared programs: 2951 -> 2952 (0.03%)
loops in affected programs: 0 -> 1
helped: 0
HURT: 1

LOST:   0
GAINED: 1

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-23 10:15:36 +11:00
Timothy Arceri
715f0d06d1 i965: use nir loop unrolling pass
shader-db results for BDW:

total instructions in shared programs: 12589614 -> 12590119 (0.00%)
instructions in affected programs: 50525 -> 51030 (1.00%)
helped: 7
HURT: 145

total cycles in shared programs: 241524604 -> 241490502 (-0.01%)
cycles in affected programs: 1941404 -> 1907302 (-1.76%)
helped: 302
HURT: 449

total loops in shared programs: 4245 -> 2947 (-30.58%)
loops in affected programs: 1535 -> 237 (-84.56%)
helped: 1142
HURT: 0

total spills in shared programs: 14453 -> 14453 (0.00%)
spills in affected programs: 0 -> 0
helped: 0
HURT: 0

total fills in shared programs: 18984 -> 18984 (0.00%)
fills in affected programs: 0 -> 0
helped: 0
HURT: 0

LOST:   26
GAINED: 15

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-23 10:15:36 +11:00
Timothy Arceri
e729504fb1 nir: pass compiler rather than devinfo to functions that call nir_optimize
Later we will pass compiler to nir_optimise to be used by the loop unroll
pass.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-23 10:15:36 +11:00
Timothy Arceri
51daccb289 nir: add a loop unrolling pass
V2:
- tidy ups suggested by Connor.
- tidy up cloning logic and handle copy propagation
 based of suggestion by Connor.
- use nir_ssa_def_rewrite_uses to fix up lcssa phis
  suggested by Connor.
- add support for complex loop unrolling (two terminators)
- handle case were the ssa defs use outside the loop is already a phi
- support unrolling loops with multiple terminators when trip count
  is know for each terminator

V3:
- set correct num_components when creating phi in complex unroll
- rewrite update remap table based on Jasons suggestions.
- remove unrequired extract_loop_body() helper as suggested by Jason.
- simplify the lcssa phi fix up code for simple loops as per Jasons suggestions.
- use mem context to keep track of hash table memory as suggested by Jason.
- move is_{complex,simple}_loop helpers to the unroll code
- require nir_metadata_block_index
- partially rewrote complex unroll to be simpler and easier to follow.

V4:
- use rzalloc() when creating nir_phi_src but not setting pred right away
 fixes regression cause by ralloc() no longer zeroing memory.

V5:
- simplify calling of complex_unroll()
- use new loop terminator fields to get the break/continue from blocks
  and simplify loop unrolling code
- handle slightly less trivial loop terminators. if branches can
  now have instructions but can only contain a single block.
- use nir print type IR snippets in unroll function descriptions
- add better explanation and variable for why we need to clone
  additional times when the second terminator it the limiting
  terminator.
- partially convert out of ssa before unrolling loops (suggested by Jason)

v6:
- remove unused nir_builder
- use Jasons new from ssa helper
- tidy/fixup cursor use
- unroll terminators that contain control flow correctly
- unroll complex loops with control flow before the terminators
  correctly

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-23 10:15:36 +11:00
Timothy Arceri
f8407a5398 nir: add helper for cloning nir_cf_list
V2:
- updated to create a generic list clone helper nir_cf_list_clone()
- continue to assert on clone when fallback flag not set as suggested
  by Jason.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-23 10:15:36 +11:00
Timothy Arceri
b84dfa0f62 nir: update fixup_phi_srcs() to handle registers
We need to do this because we partially get out of SSA when unrolling
and cloning loops.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-23 10:15:36 +11:00
Timothy Arceri
d781320974 nir: create helper for fixing phi srcs when cloning
This will be useful for fixing phi srcs when cloning a loop body
during loop unrolling.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-23 10:15:36 +11:00
Thomas Helland
ec8423a4b1 nir: Add a LCSAA-pass
V2: Do a "depth first search" to convert to LCSSA

V3: Small comment fixup

V4: Rebase, adapt to removal of function overloads

V5: Rebase, adapt to relocation of nir to compiler/nir
    Still need to adapt to potential if-uses
    Work around nir_validate issue

V6 (Timothy):
 - tidy lcssa and stop leaking memory
 - dont rewrite the src for the lcssa phi node
 - validate lcssa phi srcs to avoid postvalidate assert
 - don't add new phi if one already exists
 - more lcssa phi validation fixes
 - Rather than marking ssa defs inside a loop just mark blocks inside
   a loop. This is simpler and fixes lcssa for intrinsics which do
   not have a destination.
 - don't create LCSSA phis for loops we won't unroll
 - require loop metadata for lcssa pass
 - handle case were the ssa defs use outside the loop is already a phi

V7: (Timothy)
- pass indirect mask to metadata call

v8: (Timothy)
- make convert to lcssa a helper function rather than a nir pass
- replace inside loop bitset with on the fly block index logic.
- remove lcssa phi validation special cases
- inline code from useless helpers, suggested by Jason.
- always do lcssa on loops, suggested by Jason.
- stop making lcssa phis special. Add as many source as the block
  has predecessors, suggested by Jason.

V9: (Timothy)
- fix regression with the is_lcssa_phi field not being initialised
  to false now that ralloc() doesn't zero out memory.

V10: (Timothy)
- remove extra braces in SSA example, pointed out by Topi

V11: (Timothy)
- add missing support for LCSSA phis in if conditions.

V12: (Timothy)
- small tidy up suggested by Jason.
- always create lcssa phi even if it just points to an lcssa
  phi from an inner loop

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-23 10:15:36 +11:00
Thomas Helland
6772a17acc nir: Add a loop analysis pass
This pass detects induction variables and calculates the
trip count of loops to be used for loop unrolling.

V2: Rebase, adapt to removal of function overloads

V3: (Timothy Arceri)
 - don't try to find trip count if loop terminator conditional is a phi
 - fix trip count for do-while loops
 - replace conditional type != alu assert with return
 - disable unrolling of loops with continues
 - multiple fixes to memory allocation, stop leaking and don't destroy
   structs we want to use for unrolling.
 - fix iteration count bugs when induction var not on RHS of condition
 - add FIXME for && conditions
 - calculate trip count for unsigned induction/limit vars

V4: (Timothy Arceri)
- count instructions in a loop
- set the limiting_terminator even if we can't find the trip count for
 all terminators. This is needed for complex unrolling where we handle
 2 terminators and the trip count is unknown for one of them.
- restruct structs so we don't keep information not required after
 analysis and remove dead fields.
- force unrolling in some cases as per the rules in the GLSL IR pass

V5: (Timothy Arceri)
- fix metadata mask value 0x10 vs 0x16

V6: (Timothy Arceri)
- merge loop_variable and nir_loop_variable structs and lists suggested by Jason
- remove induction var hash table and store pointer to induction information in
  the loop_variable suggested by Jason.
- use lowercase list_addtail() suggested by Jason.
- tidy up init_loop_block() as per Jasons suggestions.
- replace switch with nir_op_infos[alu->op].num_inputs == 2 in
  is_var_basic_induction_var() as suggested by Jason.
- use nir_block_last_instr() in and rename foreach_cf_node_ex_loop() as suggested
  by Jason.
- fix else check for is_trivial_loop_terminator() as per Connors suggetions.
- simplify offset for induction valiables incremented before the exit conditions is
  checked.
- replace nir_op_isub check with assert() as it should have been lowered away.

V7: (Timothy Arceri)
- use rzalloc() on nir_loop struct creation. Worked previously because ralloc()
  was broken and always zeroed the struct.
- fix cf_node_find_loop_jumps() to find jumps when loops contain
  nested if statements. Code is tidier as a result.

V8: (Timothy Arceri)
- move is_trivial_loop_terminator() to nir.h so we can use it to assert is
  the loop unroll pass
- fix analysis to not bail when looking for terminator when the break is in the else
  rather then the if
- added new loop terminator fields: break_block, continue_from_block and
  continue_from_then so we don't have to gather these when doing unrolling.
- get correct array length when forcing unrolling of variables
  indexed arrays that are the same size as the iteration count
- add support for induction variables of type float
- update trival loop terminator check to allow an if containing
  instructions as long as both branches contain only a single
  block.

V9: (Timothy)
 - bunch of tidy ups and simplifications suggested by Jason.
 - rewrote trivial terminator detection, now the only restriction is there
   must be no nested jumps, anything else goes.
 - rewrote the iteration test to use nir_eval_const_opcode().
 - count instruction properly even when forcing an unroll.
 - bunch of other tidy ups and simplifications.

V10: (Timothy)
 - some trivial tidy ups suggested by Jason.
 - conditional fix for break inside continue branch by Jason.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-23 10:15:36 +11:00
Timothy Arceri
eda3ec7957 i965: use nir_lower_indirect_derefs() for GLSL
This moves the nir_lower_indirect_derefs() call into
brw_preprocess_nir() so thats is called by both OpenGL and Vulkan
and removes that call to the old GLSL IR pass
lower_variable_index_to_cond_assign()

We want to do this pass in nir to be able to move loop unrolling
to nir.

There is a increase of 1-3 instructions in a small number of shaders,
and 2 Kerbal Space program shaders that increase by 32 instructions.
The changes seem to be caused be the difference in the GLSL IR vs
NIR variable index lowering passes. The GLSL IR pass creates a
simple if ladder for arrays of size 4 or less, while the NIR pass
implements a binary search for all arrays regardless of size.

Shader-db results BDW:

total instructions in shared programs: 13021176 -> 13021819 (0.00%)
instructions in affected programs: 57693 -> 58336 (1.11%)
helped: 20
HURT: 190

total cycles in shared programs: 299805580 -> 299750826 (-0.02%)
cycles in affected programs: 2290024 -> 2235270 (-2.39%)
helped: 337
HURT: 442

total fills in shared programs: 19984 -> 19984 (0.00%)
fills in affected programs: 0 -> 0
helped: 0
HURT: 0

LOST:   4
GAINED: 0

V2: remove the do_copy_propagation() call from the i965 GLSL IR
linking code. This call was added in f7741c5211 but since we are
moving the variable index lowering to NIR we no longer need it and
can just rely on the nir copy propagation pass.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-23 10:15:36 +11:00
Timothy Arceri
976859ce57 i965: allow sampler indirects on all gens
Without this we will regress the max-samplers piglit test on Gen6
and lower when loop unrolling is done in NIR. There is a check
in the GLSL IR linker that errors when it finds indirects and
EmitNoIndirectSampler is set.

As far as I can tell there is no reason for not enabling this for
all gens regardless of whether they fully support ARB_gpu_shader5
or not.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-23 10:15:35 +11:00
Jason Ekstrand
a620f66872 nir: Add a couple quick-and-dirty out-of-SSA helpers
These are designed for use within an optimization pass when SSA becomes
more pain than it's worth.  They're very naive and don't generate
anything close to optimal register-based NIR.  Also, they may result in
shaders which do not validate because of, for instance, registers in phi
sources.  However, the register-based into-SSA pass should be pretty
efficient at cleaning up the mess.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-12-23 10:15:35 +11:00
Arda Coskunses
99de7b7525 vulkan/wsi/x11: don't crash on null wsi x11 connection
Without this check driver crash when application window
closed unexpectedly.

Acked-by: Edward O'Callaghan <funfunctor@folklore194.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-12-22 14:09:46 -08:00
Arda Coskunses
01dd363e67 vulkan/wsi/x11: don't crash on null visual
When application window closed unexpectedly due to
lost window visualtypes getting invlaid parameters
which is causing a crash. Necessary check is added
to prevent the crash.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-12-22 14:09:34 -08:00
Christian Inci
7a4ea95f1c radeonsi: Bugfix needed for hashcat
Hashcat needs MAX_GLOBAL_BUFFERS to be 21 or even 22 for some modes. It'll crash otherwise.
I'm adding an assert to see if programs need it to be even higher.

Signed-off-by: Christian Inci <chris.bugsfd@broke-the-inter.net>
[Handle first properly; should be NFC, since clover always uses first == 0.]
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-22 17:11:43 +01:00
Nicolai Hähnle
eca57f85ee radeonsi: fix gl_ClipDistance and gl_ClipVertex for points
The clipper hardware doesn't consider points as primitives that can be
clipped. Simply setting the corresponding cull bits works, and should not
have an adverse effect on other primitive types according to the hardware
team.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-12-22 16:59:58 +01:00
Nicolai Hähnle
3778a10d37 radeonsi: only set VS_OUT_MISC_SIDE_BUS_ENA when the misc vector is used
Should have no effect (other than perhaps on power consumption), but
Vulkan does this.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-12-22 16:58:53 +01:00
Vinson Lee
ede8c02ab0 llvmpipe: Link tests with CLOCK_LIB.
Fix linking error with 'make check'.

  CXXLD  lp_test_format
../../../../src/gallium/auxiliary/.libs/libgallium.a(os_time.o): In function `os_time_get_nano':
src/gallium/auxiliary/os/os_time.c:59: undefined reference to `clock_gettime'

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
2016-12-21 17:23:05 -08:00
Fredrik Höglund
27a8aab882 radv: fix dual source blending
Add the index to the location when assigning driver locations for
output variables.

Otherwise two fragment shader outputs declared as:

   layout (location = 0, index = 0) out vec4 output1;
   layout (location = 0, index = 1) out vec4 output2;

will end up aliasing one another.

Note that this patch will make the second output variable in the above
example alias a possible third output variable with location = 1 and
index = 0. But this shouldn't be a problem in practice since only one
color attachment is supported when dual-source blending is used.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-12-22 02:07:17 +01:00
Dave Airlie
877202b6dc radv: enable shaderStorageImageExtendedFormats
This passes all the CTS tests that get enabled for this.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-12-22 10:29:15 +10:00
Dave Airlie
a3ca2a9b7b radv: enable shaderGatherImageExtended
Thanks to Ilia's patch this works fine on radv.

No regressions in CTS, all enabled tests pass.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-12-22 09:48:18 +10:00
Dave Airlie
56020c7a7c radv/image: only touch queue family info for concurrent images.
The spec says to ignore these fields for exclusive images.

Fixes crashes in:
dEQP-VK.clipping.*

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-12-21 23:33:04 +00:00
Dave Airlie
9d23b8a18e radv: flush smem for uniform buffer bit.
(cc'ing stable as I'd like to backport the ubo speedup as well)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-12-21 22:31:14 +00:00
Junwei Zhang
13ae47234a radeonsi: add Polaris12 PCI ID
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Junwei Zhang <Jerry.Zhang@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-21 15:10:54 -05:00
Junwei Zhang
018ead4266 radeonsi: add Polaris12 support (v3)
v2: use gfxip names for llvm 4.0+
v3: use tonga for llvm <= 3.8, drop gfxip name,
we can just change that we change the other asics.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Junwei Zhang <Jerry.Zhang@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2016-12-21 15:10:03 -05:00
Ian Romanick
15c8f322ca glsl: Eliminate the open-coded version of process_block_array_leaf
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-12-21 10:24:45 -08:00
Juan A. Suarez Romero
415f5f09e3 ttn: handle GLSL_SAMPLER_DIM_SUBPASS_MS case
Fixes a warning.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-12-21 12:44:25 +01:00
Juan A. Suarez Romero
c32a9ec5f5 i965: allow unsourced enabled VAO
The GL 4.5 spec says:
    "If any enabled array’s buffer binding is zero when DrawArrays
    or one of the other drawing commands defined in section 10.4 is
    called, the result is undefined."

This commits avoids crashing the code, which is not a very good
"undefined result".

This fixes spec/!opengl 3.1/vao-broken-attrib piglit test.
2016-12-21 12:37:22 +01:00
Edward O'Callaghan
8801734da7 svga: Fix a strict-aliasing violation in shader dumper
As per the C spec, it is illegal to alias pointers to different
types. This results in undefined behaviour after optimization
passes, resulting in very subtle bugs that happen only on a
full moon..

Use a memcpy() as a well defined coercion between the isomorphic
bit-field interpretations of memory.

V.2: Use C99 compat STATIC_ASSERT() over C11 static_assert().

Signed-off-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-12-21 15:00:21 +11:00
Roland Scheidegger
e827d91756 draw: use SoA fetch, not AoS one
Now that there's some SoA fetch which never falls back, we should always get
results which are better or at least not worse (something like rgba32f will
stay the same).

For cases which get way better, think something like R16_UNORM with 8-wide
vectors: this was 8 sign-extend fetches, 8 cvt, 8 muls, followed by
a couple of shuffles to stitch things together (if it is smart enough,
6 unpacks) and then a (8-wide) transpose (not sure if llvm could even
optimize the shuffles + transpose, since the 16bit values were actually
sign-extended to 128bit before being cast to a float vec, so that would be
another 8 unpacks). Now that is just 8 fetches (directly inserted into
vector, albeit there's one 128bit insert needed), 1 cvt, 1 mul.

v2: ditch the old AoS code instead of just disabling it.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-12-21 04:48:24 +01:00
Roland Scheidegger
cb81460dcc gallivm: generalize the compressed format soa fetch a bit
This can now handle rgtc (unorm) too - this path no longer handles plain
formats, but that's unnecessary they now all have their proper SoA unpack
(this will still be dog-slow though due to the actual fetch being per-pixel
util fallbacks).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-12-21 04:48:24 +01:00
Roland Scheidegger
3c98e3cd63 gallivm: provide soa fetch path handling formats with more than 32bit
This previously always fell back to AoS conversion. Even for 4-float formats
(which is the optimal case by far for that fallback case) this was suboptimal,
since it meant the conversion couldn't be done with 256bit vectors. While this
may still only be partly possible for some formats, (unless there's AVX2
support) at least the transpose can be done with half the unpacks
(and before using the transpose for AoS fallbacks, it was worse still).
With less than 4 channels, things got way worse with the AoS fallback
quickly even with 128bit vectors.
The strategy is pretty much the same as the existing one for formats
which fit into 32 bits, except there's now multiple vectors to be
fetched (2 or 4 to be exact), which need to be shuffled first (if it's 4
vectors, this amounts to a transpose, for 2 it's a bit different),
then the unpack is done the same (with the exception that the shift
of the channels is now modulo 32, and we need to select the right
vector).
In fact the most complex part about it is to get the shuffles right
for separating into lo/hi parts for AVX/AVX2...
This also makes use of the new ability of gather to use provided type
information, which we abuse to outsmart llvm so we get decent shuffles,
and to fetch 3x32bit vectors without having to ZExt the scalar.
And just because we can, we handle double formats too, albeit they are
a bit different (draw sometimes needs to handle that).
v2: fix typo float/int bug (generating inefficient code).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-12-21 04:48:24 +01:00
Roland Scheidegger
8bd67a35c5 gallivm: optimize gather a bit, by using supplied destination type
By using a dst_type in the the gather interface, gather has some more
knowledge about how values should be fetched.
E.g. if this is a 3x32bit fetch and dst_type is 4x32bit vector gather
will no longer do a ZExt with a 96bit scalar value to 128bit, but
just fetch the 96bit as 3x32bit vector (this is still going to be
2 loads of course, but the loads can be done directly to simd vector
that way).
Also, we can now do some try to use the right int/float type. This should
make no difference really since there's typically no domain transition
penalties for such simd loads, however it actually makes a difference
since llvm will use different shuffle lowering afterwards so the caller
can use this to trick llvm into using sane shuffle afterwards (and yes
llvm is really stupid there - nothing against using the shuffle
instruction from the correct domain, but not at the cost of doing 3 times
more shuffles, the case which actually matters is refusal to use shufps
for integer values).
Also do some attempt to avoid things which look great on paper but llvm
doesn't really handle (e.g. fetching 3-element 8 bit and 16 bit vectors
which is simply disastrous - I suspect type legalizer is to blame trying
to extend these vectors to 128bit types somehow, so fetching these with
scalars like before which is suboptimal due to the ZExt).

Remove the ability for truncation (no point, this is gather, not conversion)
as it is complex enough already.

While here also implement not just the float, but also the 64bit avx2
gathers (disabled though since based on the theoretical numbers the benefit
just isn't there at all until Skylake at least).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-12-21 04:48:24 +01:00
Roland Scheidegger
5b950319ce gallivm: optimize SoA AoS fallback fetch path a little
We should do transpose, not extract/insert, at least with "sufficient" amount
of channels (for 4 channels, extract/insert shuffles generated otherwise look
truly terrifying). Albeit we shouldn't fallback to that so often in any case.
v2: ditch the extract/insert path, not worth keeping (we're going to avoid
hitting the fallback that often with future patches).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-12-21 04:48:24 +01:00
Roland Scheidegger
d7d23aee4b gallivm: (trivial) handle non-aligned fetch for lp_build_fetch_rgba_soa
soa fetch so far always assumed that data was aligned. However, we want to
use this for vertex fetch, and data might not be aligned there, so handle
it in this path too (basically just pass through alignment through to other
functions). (It looks like it wouldn't work for for cached s3tc but this is
no different than with AoS fetch.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-12-21 04:48:24 +01:00
Axel Davy
123e947228 st/nine: Upload on secondary context for Draw*Up
Avoid synchronization by using the secondary context
for uploading the vertex data for Draw*Up.

v2: Rely on u_upload_mgr to use persistent coherent
buffers. Do not flush.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:47:08 +01:00
Axel Davy
0ec4e5f630 st/nine: Dirty MANAGED buffers at Lock time
Tests suggest MANAGED buffers are made dirty
at Lock time, not at Unlock time.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:47:08 +01:00
Axel Davy
bad7f7cc63 st/nine: Implement new buffer upload path
This new buffer upload path enables to lock
faster than the normal path when using
DISCARD/NOOVERWRITE.

v2: Diverse cleanups and fixes.
v3: Fix allocation size for 'lone' buffers and
add more debug info.
v4: Rewrite of the path to handle when DISCARD/NOOVERWRITE
is not used anymore. The resource content is copied to the
new resource used.
v5: flush for safety after unmap (not sure it is really required
here, but safer to flush).
v6: Do not use the path if persistent coherent mapping is unavailable.
Fix buffer creation flags.
v7: Do not flush since it is not needed.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:47:08 +01:00
Axel Davy
8960be0e93 st/nine: Allow non-zero resource offset for vertex buffers
Next patches will introduce an offset.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:47:08 +01:00
Axel Davy
1e64be6f91 st/nine: Do not wait for DEFAULT lock for volumes when we can
If the volumes (and the texture container) are not referenced,
then they are no pending operations on them. We can lock directly.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:47:08 +01:00
Axel Davy
b4f16615ef st/nine: Do not wait for DEFAULT lock for surfaces when we can
If the surfaces (and the texture container) are not referenced,
then they are no pending operations on them. We can lock directly.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:47:08 +01:00
Axel Davy
525a1b292a st/nine: Add arguments to context's blit and copy_region
The new arguments enable to reference the objects while
the function hasn't run.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:47:08 +01:00
Axel Davy
325324c749 st/nine: Idem for nine_context_gen_mipmap
Will enable to use the bind count as an information for
whether the surface/volume is used in the worker thread.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:47:08 +01:00
Axel Davy
7089d88199 st/nine: Bind destination for surface/volume uploads
Will enable to use the bind count as an information for
whether the surface/volume is used in the worker thread.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:47:08 +01:00
Axel Davy
d4a9b21feb st/nine: Use nine_context_box_upload for volumes
Use nine_context_box_upload for uploads:
. systemmem volume to default volume
. managed volume internal content to its resource.

Check the uploads are executed before any action
that can alter the data, that is LockBox and
volume destruction.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:47:08 +01:00
Axel Davy
f042639231 st/nine: Fix leak with volume dtor
The last level was not released.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:47:08 +01:00
Axel Davy
76e392d852 st/nine: Fix leak with cubetexture dtor
The last level was not released.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:47:08 +01:00
Axel Davy
fec0b7f067 st/nine: Use nine_context_box_upload for surfaces
Use nine_context_box_upload for uploads:
. systemmem surface to default surface
. managed surface internal content to its resource.

Check the uploads are executed before any action
that can alter the data, that is LockRect,
NineSurface9_CopyDefaultToMem and surface destruction.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:47:08 +01:00
Axel Davy
c873a2bd0c st/nine: Implement nine_context_box_upload
This function will be used for surface and volume uploads

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:47:08 +01:00
Axel Davy
cadc7a5d94 st/nine: Use nine_context_gen_mipmap in BaseTexture9
Generate mipmaps in the worker thread.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:47:08 +01:00
Axel Davy
8d3e0f2187 st/nine: Implement nine_context_gen_mipmap
To offload mipmap generation as well.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:47:08 +01:00
Axel Davy
16b6fb65ae st/nine: Optimize managed buffer upload
Do the upload in the other thread.

Usually managed buffers are used once per frame.
It is then very likely pending_upload is 0 at Lock
time.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:47:08 +01:00
Axel Davy
a78b5f4378 st/nine: Implement nine_context_range_upload
Will be used to upload buffers.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:47:08 +01:00
Axel Davy
1843e36b03 st/nine: Do not bind the container if forward is false
This doesn't make sense to bind the container in that specific case.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:47:08 +01:00
Axel Davy
2fc8ef1401 st/nine: Comment and simplify iunknown
The behaviour is a bit less obscure now.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:47:08 +01:00
Axel Davy
098ba64c4c st/nine: Detach buffers in swapchain dtor.
BackBuffers can survive swapchain dtor if
the user has a reference on them.

The swapchain itself has no reference on the buffer.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:47:08 +01:00
Axel Davy
14875ebd83 st/nine: Fix NineUnknown_Detach
We don't bind the container in AddRef.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:47:08 +01:00
Axel Davy
930f479acf st/nine: Simplify ARG_BIND_REF
Remove some noop operations.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:47:08 +01:00
Axel Davy
9c4b4e8809 st/nine: Avoid flushing the queue for queries GetData
Use the newly introduced counter to know when we don't
need synchronization.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:47:08 +01:00
Patrick Rudolph
8a69343f1e st/nine: Add CSMT_NO_WAIT_WITH_COUNTER
Similar to the other macros, but introduces a counter,
which enables to know when the instructions has been
executed.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
2016-12-20 23:47:08 +01:00
Axel Davy
884166a251 st/nine: Use nine_context_clear_render_target
Enables to not wait for the worker thread for ColorFill.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:47:08 +01:00
Axel Davy
7b154ac04d st/nine: Optimize ColorFill
When we lock the whole surface to overwrite it, we can use DISCARD.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:23 +01:00
Axel Davy
9bf1da05d9 st/nine: Simplify ColorFill
For render targets, NineSurface9_GetSurface is not
expected to fail.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:23 +01:00
Axel Davy
31262bbce0 st/nine: use get_pipe_acquire/release when possible
Use the acquire/release semantic when we don't need
to wait for any pending command.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:23 +01:00
Axel Davy
22f6d6fbd2 st/nine: Implement Fast path for dynamic buffers and csmt
Use the secondary pipe for DISCARD/NOOVERWRITE, which
avoids stalling to get the pipe from the worker thread.

v2: flush at unmap. This is required for example if
the driver does hidden draw calls or copies. In the case
of unsynchronized it is probably not required, but
it is more safe.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:23 +01:00
Axel Davy
3e8234fff4 st/nine: Add secondary pipe for device
The secondary pipe will be used for operations
that don't need synchronization.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:23 +01:00
Axel Davy
7a7eeefd7d st/nine: Add nine_context_get_pipe_acquire/release
See commit for description.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:23 +01:00
Axel Davy
ddb6f1d2d1 st/nine: SYSTEMMEM ignores DISCARD.
Tests show SYSTEMMEM should ignore DISCARD.

Prevents game bugs with following patches reimplementing
DISCARD. Halo is affected.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:23 +01:00
Axel Davy
4f344db8b0 st/nine: Upload Managed buffers just before draw call using them
Previously we were uploading Managed buffers at the next draw call
after they were set dirty.
This is not the expected behaviour. Instead upload just before
draw call needing the content.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:23 +01:00
Axel Davy
e52aded87f st/nine: Track bindings for buffers
Similar code than for textures.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:23 +01:00
Axel Davy
62068c9d90 st/nine: Fix BASETEX_REGISTER_UPDATE
BASETEX_REGISTER_UPDATE was adding the texture to the list
of textures to upload in too many cases. tex->base.base.bind
will be set to true if the texture is in a stateblock, whereas
we want to upload only if bound to the device, which is
what bind_count is for.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:23 +01:00
Axel Davy
804b28cdc4 st/nine: Simplify the logic to bind textures
This makes the code more readable.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:23 +01:00
Patrick Rudolph
fef23f6712 st/nine: Use nine_context for resource_copy_region
Use nine_context wrapper for resource_copy_region.
Enables to offload it with CSMT.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
2016-12-20 23:44:23 +01:00
Patrick Rudolph
c8913a06b4 st/nine: Use nine_context for blit
Enables to offload it with CSMT.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
2016-12-20 23:44:23 +01:00
Patrick Rudolph
0fd5730613 st/nine: Add NINE_DEBUG=tid to turn threadid on or off
To ease debugging.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
2016-12-20 23:44:23 +01:00
Patrick Rudolph
3098bf03a6 st/nine: Print threadid in debug log
To ease debugging.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
2016-12-20 23:44:23 +01:00
Patrick Rudolph
ac2927335b st/nine: Implement gallium nine CSMT
Use an offloading thread for all nine_context functions.
Macros are used to ease the reading of the code.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:23 +01:00
Axel Davy
2c371a25a8 st/nine: Call GetPipe for implicit pipe usages
With csmt, every usage of the pipe in the main thread
has to be protected by calling GetPipe.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:23 +01:00
Patrick Rudolph
1277ceefd1 st/nine: Add struct nine_clipplane
Required to know the size exact size of the plane.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
2016-12-20 23:44:22 +01:00
Patrick Rudolph
3af17a671d st/nine: Add nine_queue
This queue mechanism will be used for CSMT.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
2016-12-20 23:44:22 +01:00
Axel Davy
e068d3afe1 st/nine: Create pipe_surfaces on resource creation.
Create the pipe_surfaces on renderable resources creation.

This enables to avoid creating them on the fly.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:22 +01:00
Axel Davy
bb666b0297 st/nine: Back swvp in nine_context
Part of the refactor to move all gallium calls to
nine_state.c, and have all internal states required
for those calls in nine_context.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:22 +01:00
Axel Davy
f5f881fd3e st/nine: Change the way nine_shader gets the pipe
The change is required with csmt, where depending on the thread
you don't access the pipe the same way.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:22 +01:00
Axel Davy
97e4b65e7f st/nine: Reimplement nine_context_apply_stateblock
The new version uses nine_context functions instead of
applying the changes directly to nine_context.
This will enable it to work with CSMT.

v2: Fix nine_context_light_enable_stateblock
The memcpy arguments were wrong, and the state
wasn't set dirty.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:22 +01:00
Axel Davy
8d967abb98 st/nine: Decompose nine_context_set_texture
Part of the refactor to move all gallium calls to
nine_state.c, and have all internal states required
for those calls in nine_context.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:22 +01:00
Axel Davy
69f447752d st/nine: Decompose nine_context_set_indices
Part of the refactor to move all gallium calls to
nine_state.c, and have all internal states required
for those calls in nine_context.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:22 +01:00
Axel Davy
08b717dfd3 st/nine: Decompose nine_context_set_stream_source
Part of the refactor to move all gallium calls to
nine_state.c, and have all internal states required
for those calls in nine_context.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:22 +01:00
Axel Davy
7ebdbb573b st/nine: Do not use NineBaseTexture9 in nine_context
Some fields are subject to modification outside of nine_context
(SetLod, etc).

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:22 +01:00
Axel Davy
152d007769 st/nine: Move Managed Pool handling out of nine_context
Part of the refactor to move all gallium calls to
nine_state.c, and have all internal states required
for those calls in nine_context.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:22 +01:00
Axel Davy
eb884a4ac2 st/nine: Integrate nine_pipe_context_clear to nine_context_clear
Part of the refactor to move all gallium calls to
nine_state.c, and have all internal states required
for those calls in nine_context.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:22 +01:00
Axel Davy
b95205b1f2 st/nine: Move pipe and cso to nine_context
Part of the refactor to move all gallium calls to
nine_state.c, and have all internal states required
for those calls in nine_context.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:22 +01:00
Axel Davy
66ad5b1592 st/nine: Rename pipe to pipe_data in nine_context
This patch it to avoid name conflict when device->pipe
will be moved to nine_context.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:22 +01:00
Axel Davy
fc49f7df89 st/nine: Rename cso in nine_context to cso_shader
This patch it to avoid name conflict when device->cso
is moved to nine_context.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:22 +01:00
Axel Davy
c7237e2c5c st/nine: Access pipe_context via NineDevice9_GetPipe
Except for nine_ff and nine_state.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:22 +01:00
Axel Davy
4a4eba8c05 st/nine: Remove NineDevice9_GetCSO
Was useless. Remove useless usage in
swapchain9.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:22 +01:00
Axel Davy
6a7541a5aa st/nine: Move query9 pipe calls to nine_context
This will enable to use threading for them.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:22 +01:00
Axel Davy
0a5252d25b st/nine: Use atomics for nine_bind
nine_bind didn't need atomics up to now,
because it's use what always within a protected
mutex. We need to use atomics because with the
next patches several threads may use nine_bind.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:22 +01:00
Axel Davy
b748b8fd86 st/nine: Track dirty state groups in nine_context
Part of the refactor to move all gallium calls to
nine_state.c, and have all internal states required
for those calls in nine_context.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:22 +01:00
Axel Davy
a0a18920c7 st/nine: Back User Clip Planes to nine_context
Part of the refactor to move all gallium calls to
nine_state.c, and have all internal states required
for those calls in nine_context.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:22 +01:00
Axel Davy
c6ca7c747e st/nine: Back ps to nine_context
Part of the refactor to move all gallium calls to
nine_state.c, and have all internal states required
for those calls in nine_context.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:22 +01:00
Axel Davy
d671190df9 st/nine: Back ds to nine_context
Part of the refactor to move all gallium calls to
nine_state.c, and have all internal states required
for those calls in nine_context.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:22 +01:00
Axel Davy
1a735a99d0 st/nine: Back all ff states in nine_context
Part of the refactor to move all gallium calls to
nine_state.c, and have all internal states required
for those calls in nine_context.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:22 +01:00
Axel Davy
bb62ea925a st/nine: Refactor LightEnable
Call a helper function.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:22 +01:00
Axel Davy
cbe370020e st/nine: Refactor SetLight
Call a helper function to set the light.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:22 +01:00
Axel Davy
c5af96aebd st/nine: Put ff data in a separate structure
And make nine_state_access_transform take this
new structure as input.
Part of the refactor to move all gallium calls to
nine_state.c, and have all internal states required
for those calls in nine_context.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:22 +01:00
Axel Davy
4a6d83ebc2 st/nine: Back viewport to nine_context
Part of the refactor to move all gallium calls to
nine_state.c, and have all internal states required
for those calls in nine_context.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:22 +01:00
Axel Davy
9498613607 st/nine: Back scissor to nine_context
Part of the refactor to move all gallium calls to
nine_state.c, and have all internal states required
for those calls in nine_context.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:22 +01:00
Axel Davy
7f6e01052b st/nine: Back RT to nine_context
Part of the refactor to move all gallium calls to
nine_state.c, and have all internal states required
for those calls in nine_context.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:22 +01:00
Axel Davy
aafbd62955 st/nine: Back current index buffer to nine_context
Part of the refactor to move all gallium calls to
nine_state.c, and have all internal states required
for those calls in nine_context.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:22 +01:00
Axel Davy
b13b217243 st/nine: Back all shader constants to nine_context
For device vs shader float constants and may_swvp,
the same tips than for the other constant types is
used.

Also memset the constants properly.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:22 +01:00
Axel Davy
93ac6dfdcc st/nine: Back sampler states to nine_context
Part of the refactor to move all gallium calls to
nine_state.c, and have all internal states required
for those calls in nine_context.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:22 +01:00
Axel Davy
2a698c3df2 st/nine: Back vs to nine_context
And move programmable_vs storage and computation.
Part of the refactor to move all gallium calls to
nine_state.c, and have all internal states required
for those calls in nine_context.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:21 +01:00
Axel Davy
43288cf376 st/nine: Back vdecl to nine_context
Part of the refactor to move all gallium calls to
nine_state.c, and have all internal states required
for those calls in nine_context.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:21 +01:00
Axel Davy
63633e2a08 st/nine: Move stream freq data to nine_context
Part of the refactor to move all gallium calls to
nine_state.c, and have all internal states required
for those calls in nine_context.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:21 +01:00
Axel Davy
848ffc81e4 st/nine: Move vtxbuf to nine_context
Part of the refactor to move all gallium calls to
nine_state.c, and have all internal states required
for those calls in nine_context.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:21 +01:00
Axel Davy
aea7a019ef st/nine: Move stream_usage_mask to nine_context
Part of the refactor to move all gallium calls to
nine_state.c, and have all internal states required
for those calls in nine_context.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:21 +01:00
Axel Davy
eed47b748f st/nine: Back textures into nine_context
Part of the refactor to move all gallium calls to
nine_state.c, and have all internal states required
for those calls in nine_context.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:21 +01:00
Axel Davy
6bbb7b9fc5 st/nine: Move texture setting to nine_context_*
And move samplers_shadow to nine_context.
Part of the refactor to move all gallium calls to
nine_state.c, and have all internal states required
for those calls in nine_context.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:21 +01:00
Axel Davy
c1871e829a st/nine: Track changed.texture only for stateblocks
Part of the refactor to move all gallium calls to
nine_state.c, and have all internal states required
for those calls in nine_context.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:21 +01:00
Axel Davy
64e232bd60 st/nine: Move draw calls to nine_state
Part of the refactor to move all gallium calls to
nine_state.c, and have all internal states required
for those calls in nine_context.

v2: Release buffers for Draw*Up functions in device9.c,
instead of nine_context. This prevents a leak with csmt
where the wrong pointers were released.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:21 +01:00
Axel Davy
f72d8719eb st/nine: Move core of device clear to nine_state
Part of the refactor to move all gallium calls to
nine_state.c, and have all internal states required
for those calls in nine_context.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:21 +01:00
Axel Davy
1b24d5e1f5 st/nine: Introduce nine_context
nine_context is a new structure which goal will be
to contain all internal states. It will be the states
of the second thread in the to-be-introduced CSMT mode.

This patch moves several internal states to nine_context,
while the next patches add the other fields.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:21 +01:00
Axel Davy
e3c59fbd25 st/nine: Implement WFOG properly
We were advertising support for WFOG (like all win drivers),
but we weren't implementing it.
This patch implements the behaviour. See comments.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:21 +01:00
Axel Davy
b40f12ebf0 st/nine: Fix ff texture coordinate selection
The code was wrongly detecting which texture coordinates
to generate when the coordinate index was different to
the stage index.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:21 +01:00
Axel Davy
c75415224f st/nine: Convert redundant check to assert in ff ps
We disable the alpha stage if the color stage is disabled.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:21 +01:00
Axel Davy
32f6f91617 st/nine: Fix two special cases in ff ps
if first alpha stage is disabled and writes to temp,
diffuse alpha is written to temp.
Last stage always writes to current.

Behaviour was deduced by tests with a test app.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:21 +01:00
Axel Davy
1efdc8f594 st/nine: Remove useless code in ff ps
Current is already initialized to Diffuse.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:21 +01:00
Axel Davy
7afea63d4f st/nine: Fix ff cases when stages should be disabled
When a texture is read by a stage for colorop, it should
be disabled, and disable following stages.

When a texture is read for alphaop, 1.0f is read for the input,
which is the behaviour for a dummy texture.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:21 +01:00
Axel Davy
191b90a35c st/nine: Always initialize current in ff ps
The check was not catching all possible cases.
NVE4 should be fine.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:21 +01:00
Axel Davy
1ee978fa50 st/nine: Fix check for ff specular
Fix the check for computing ff specular.

This seems to match the opengl behavior,
and give the correct output on windows.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:21 +01:00
Axel Davy
89716b0b38 st/nine: Do not saturate illumination coefficients in ff
Fixes bad rendering of a test app.
Wine has the same behaviour.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:21 +01:00
Axel Davy
877dc0308f st/nine: Fix ff COLOR0 w component computation
The computation was wrong. COLOR0's last component
should be equal to the material diffuse w component.

The behaviour was checked with a test app on Windows.
Wine has the same behaviour.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:21 +01:00
Axel Davy
e94ac230a2 st/nine: Fix specular enable for alpha
Apparently specular enable doesn't affect
the alpha channel.

Fixes https://github.com/iXit/Mesa-3D/issues/253

Behaviour comfirmed looking in wine sources.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:21 +01:00
Axel Davy
85811d0e87 st/nine: Ignore MULTISAMPLEMASK when RT is not multisampled
We were ignoring MULTISAMPLEMASK for non-maskable multisample
modes, but we were missing the non-multisampled case.

Fixes a crash in Halo.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:21 +01:00
Axel Davy
bce9fe8db2 driconf: Fix missing gettext
DRI_CONF_NINE_OVERRIDEVENDOR was missing gettext for the
description.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:21 +01:00
Axel Davy
59048e7548 st/nine: Add new driconf options to control DISCARD behaviour
See the patch for the new controls added.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:21 +01:00
Axel Davy
06657fa203 st/nine: Rework buffer presentation path
Use the new API for DISCARD.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:21 +01:00
Axel Davy
35ea402a24 st/nine: Fix a leak in Swapchain dtor
Count properly the number of backbuffers,
and use the new info to release the correct
number of buffers

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:21 +01:00
Axel Davy
f78cbbdfaa d3dadapter/present: Add precision for WaitBufferReleased
Add precision on the behaviour of WaitBufferReleased.
All implementers and users of the API were expecting
that behaviour.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:21 +01:00
Axel Davy
0eef5491d3 d3dadapter/present: Add new API to ID3DPresent
The API will enable better support for the commonly
used DISCARD swapchain parameter.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:21 +01:00
Axel Davy
b2f17e5f62 st/nine: Silent warnings with guid_str
In non-debug build, the variables are unused,
and thus trigger a compilation warning.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:21 +01:00
Axel Davy
2cd8622fb3 st/nine: Do not generate gallium NOP on d3d NOP
Some drivers crash if NOP is generated.
Besides there is no point to generate NOP.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:21 +01:00
Axel Davy
461e03167e st/nine: Fix leak in user constant upload path
The new code properly releases the previous buffers
allocated.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:21 +01:00
Axel Davy
c3e5140142 st/nine: Correctly release sw cursor image
cursor.image is used for software cursor
emulation. It wasn't released.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:21 +01:00
Axel Davy
9c0f65e08a st/nine: Handle when cursor stride is not what is expected
SetCursor assumes for now a 32x32 argb cursor with pitch 128.
32x32 argb doesn't have pitch 128 on all hw, thus use a
temporary surface with the correct pitch when needed.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:21 +01:00
Axel Davy
e7a0f580a6 st/nine: Avoid crash on empty Draw*Up
Ignore empty draw calls.
Avoid assertion fault when such draw calls happen
in u_upload_mgr.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:20 +01:00
Axel Davy
ada0c2ceaa st/nine: Capture texturestage states in pixel stateblocks
pixels stateblocks need to capture these.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:20 +01:00
Axel Davy
8b021be769 st/nine: Add missing changed states to pixel stateblocks
Some states were not properly recorded in pixel stateblocks.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:20 +01:00
Axel Davy
b3b593b83b st/nine: Add some debug info in stateblocks
This is useful to check what is exactly recorded.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:20 +01:00
Axel Davy
fad0f147fb st/nine: Remove useless check in surface9 ctor
Textures already have the check in BaseTexture9.
Non-Textures cannot be in the MANAGED Pool.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:20 +01:00
Axel Davy
503d729029 st/nine: Fix bad light initialization in stateblocks
src was initialized instead of dst.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:20 +01:00
Axel Davy
9be94d5c1a st/nine: Remove unused ff.changed.group
It was unused.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:20 +01:00
Axel Davy
638b70985d st/nine: Fix ps multisample check
We want to use centroid for nonmaskable
multisampling as well.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:20 +01:00
Axel Davy
d38215fb17 st/nine: Fix useless swapchain init checks
In NineDevice9_SetDefaultState we can assume the
implicit swapchain is properly initialized.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:20 +01:00
Axel Davy
409ad78777 st/nine: Don't update stream_usage_mask in sw path
The variable is used only in the hw path.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:20 +01:00
Axel Davy
0630d3600b st/nine: Remove useless call to nine_update_state
The call was not needed.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:20 +01:00
Axel Davy
494ace4b85 st/nine: Add validation to SetSamplerState
Check value validity and mimick Win behaviour.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:20 +01:00
Axel Davy
f4d5bc2555 st/nine: Improve doc of D3DPMISCCAPS_POSTBLENDSRGBCONVERT
The cap should be advertised for d3d10 able cards,
but only for Ex contexts.

Unfortunately at this point Mesa has no way to know if
Ex is used or not (the info is got later).

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-12-20 23:44:20 +01:00
Axel Davy
c4268fd175 gallium-docs: Add documentation for when using several contexts
Add documentation to explicit what can be expected and what is allowed
when using several contexts.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-20 23:44:20 +01:00
Axel Davy
1736ef6570 gallium-docs: Add documentation for threading requirements
Add documentation for the requirements related to threading
for screens and contexts.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-20 23:44:20 +01:00
Chad Versace
fbb4af96c6 egl: Check config's surface types in eglCreate*Surface()
If the provided EGLConfig does not support the requested surface type,
then emit EGL_BAD_MATCH.

Fixes dEQP-EGL.functional.negative_api.create_pbuffer_surface
on GBM.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2016-12-20 11:53:31 -08:00
Kenneth Graunke
62b8bcda1c glsl: Use ir_var_temporary when generating inline functions.
We were using ir_var_auto for the inlined function parameter variables,
which is wrong, as it suggests that those are real variables declared
by the program.

Normally this doesn't matter.  However, if you called built-ins at
global scope, it would pollute the global variable namespace with
these new parameter temporaries.  If the shader already had variables
with those names, the linker might see contradictory global variable
declarations and raise an error.

Making them temporaries indicates that these are just things generated
by the compiler internally.  This avoids confusing the linker.

Fixes a new Piglit test: glsl-fs-multiple-builtins.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99154
Reported-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-12-20 11:41:29 -08:00
Kenneth Graunke
8fc5443a2b i965: Don't bail on vertex element processing if we need draw params.
BaseVertex, BaseInstance, DrawID, and some edge flag conditions need
vertex buffer and elements structs.  We can't bail early in this case.

Gen4-7 already do this properly.  Gen8+ did not.

Thanks to Ilia Mirkin for helping track this down.

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99144
Reported-by: Pierre-Eric Pelloux-Prayer <pelloux@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-20 11:41:28 -08:00
Jonathan Gray
d74c3e55b3 mesa: don't attempt to unlock an unlocked debug state mutex
Commit 929fcee47e introduced code that
attempts to unlock an unlocked mutex which is undefined behaviour.

On OpenBSD this leads to an abort:

0  0x0000124dadfa96ba in thrkill () at <stdin>:2
1  0x0000124dadf3da39 in *_libc_abort () at /usr/src/lib/libc/stdlib/abort.c:52
2  0x0000124d2c1165b5 in *_libpthread_pthread_mutex_unlock (mutexp=<optimized out>)
    at /usr/src/lib/librthread/rthread_sync.c:221
3  0x0000124d279c02e4 in init_attrib_groups (ctx=0x124df0fda000) at main/context.c:825
4  _mesa_initialize_context (ctx=ctx@entry=0x124df0fda000, api=api@entry=API_OPENGL_CORE,
    visual=visual@entry=0x7f7ffffbdfd0, share_list=share_list@entry=0x0,
    driverFunctions=driverFunctions@entry=0x7f7ffffbda60) at main/context.c:1204
5  0x0000124d27b507ec in st_create_context (api=api@entry=API_OPENGL_CORE,
    pipe=pipe@entry=0x124dc4910000, visual=visual@entry=0x7f7ffffbdfd0,
    share=share@entry=0x0, options=options@entry=0x7f7ffffbe128)
    at state_tracker/st_context.c:545
6  0x0000124d27b8639f in st_api_create_context (stapi=<optimized out>,
    smapi=0x124d1b608800, attribs=0x7f7ffffbe100, error=0x7f7ffffbe0fc, shared_stctxi=0x0)
    at state_tracker/st_manager.c:669
7  0x0000124d27cc5b9c in dri_create_context (api=<optimized out>, visual=0x124d8a0f8a00,
    cPriv=0x124de473f240, major_version=<optimized out>, minor_version=<optimized out>,
    flags=<optimized out>, notify_reset=false, error=0x7f7ffffbe2b4,
    sharedContextPrivate=0x0) at dri_context.c:123
8  0x0000124d27cc5029 in driCreateContextAttribs (screen=0x124d8a0f8400,
    api=<optimized out>, config=0x124d8a0f8a00, shared=<optimized out>,
    num_attribs=<optimized out>, attribs=<optimized out>, error=0x7f7ffffbe2b4,
    data=0x124d77814a00) at dri_util.c:448
9  0x0000124d8e109b00 in drisw_create_context_attribs (base=0x124df3e08700,
    config_base=0x124d7a0e7300, shareList=<optimized out>, num_attribs=<optimized out>,
    attribs=<optimized out>, error=0x7f7ffffbe2b4) at drisw_glx.c:476
10 0x0000124d8e104b4a in glXCreateContextAttribsARB (dpy=0x124d533f0000,
    config=0x124d7a0e7300, share_context=0x0, direct=1, attrib_list=0x7f7ffffbe300)
    at create_context.c:78

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-12-20 11:41:04 -08:00
Dave Airlie
ab8ea1b3d4 glsl: allow invariant on fragment shader outputs.
From page 27 (page 33 of the PDF) of the GLSL 1.20 spec:

    " Only variables output from a vertex shader can be candidates for
      invariance."

But this later changes to:

From page 37 (page 43 of the PDF) of the GLSL 1.30 spec:

    " Only variables output from a shader can be candidates for
      invariance."

We can also find:

From page 37 (page 43 of the PDF) of the GLSL 1.30 spec:

    " Initially, by default, all output variables are allowed to be
      variant. To force all output variables to be invariant, use the
      pragma

        #pragma STDGL invariant(all)

      before all declarations in a shader. If this pragma is used after
      the declaration of any variables or functions, then the set of
      outputs that behave as invariant is undefined. It is an error to
      use this pragma in a fragment shader."

But this needs to be corrected and it is being addressed at:
https://cvs.khronos.org/bugzilla/show_bug.cgi?id=16140

Fixes GL45-CTS.shading_language_420pack.qualifier_order.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
2016-12-20 11:44:34 +02:00
Timothy Arceri
f562b13bc7 i965: keep gl_program shader info in sync after gather info
It's possible that nir_shader was cloned and it no longer contains
a pointer to the shader_info in gl_program. So we need to copy
shader_info back to gl_program if that is the case.

Fixes a regression with NIR_TEST_CLONE=true

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98840
2016-12-20 15:43:53 +11:00
Ian Romanick
ee1f35eb69 nir: Trivial clean ups in the generated nir_constant_expressions.c
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-19 15:55:44 -08:00
Ian Romanick
3c7066c1ed nir: Silence unused parameter warnings in nir_constant_expression.c
nir/nir_constant_expressions.c:290:25: warning: unused parameter 'num_components' [-Wunused-parameter]
 evaluate_ball3(unsigned num_components, nir_const_value *_src)
                         ^
nir/nir_constant_expressions.c: In function 'evaluate_fddx':
nir/nir_constant_expressions.c:1282:57: warning: unused parameter '_src' [-Wunused-parameter]
 evaluate_fddx(unsigned num_components, nir_const_value *_src)
                                                         ^

v2: Unconditionally mark the parameters as MAYBE_UNUSED instead of
conditionally adding (void) casts to keep the generator simple.
Suggested by Jason.

Number of total warnings in my build reduced from 1575 to 1485
(reduction of 89).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-19 15:55:44 -08:00
Ian Romanick
4300693a07 nir: Silence missing field initializer warnings for vectors in nir_constant_expressions
nir/nir_constant_expressions.c: In function 'evaluate_ball2':
nir/nir_constant_expressions.c:279:7: warning: missing initializer for field 'z' of 'struct bool_vec' [-Wmissing-field-initializers]
       };
       ^
nir/nir_constant_expressions.c:234:10: note: 'z' declared here
    bool z;
          ^

Number of total warnings in my build reduced from 2532 to 2304
(reduction of 228).

v2: Initialize bool vectors with 0 instead of false to keep the
generator simpler.  Suggested by Ken.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-19 15:55:44 -08:00
Ian Romanick
8bfe397974 glsl: Silence "unused parameter" warnings in ast_type.cpp
glsl/ast_type.cpp: In function ‘bool validate_point_mode(YYLTYPE*, _mesa_glsl_parse_state*, const ast_type_qualifier&, const ast_type_qualifier&)’:
glsl/ast_type.cpp:173:30: warning: unused parameter ‘loc’ [-Wunused-parameter]
 validate_point_mode(YYLTYPE *loc,
                              ^~~
glsl/ast_type.cpp:174:45: warning: unused parameter ‘state’ [-Wunused-parameter]
                     _mesa_glsl_parse_state *state,
                                             ^~~~~

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-19 15:55:44 -08:00
Ian Romanick
d7aee96cc6 glsl: Trivial whitespace fixes in link_uniforms.cpp
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-19 15:55:44 -08:00
Ian Romanick
dd4fada6cd glsl: Silence unused parameter warning in propagate_invariance.cpp
glsl/propagate_invariance.cpp: In member function ‘virtual ir_visitor_status {anonymous}::ir_invariance_propagation_visitor::visit_leave(ir_assignment*)’:
glsl/propagate_invariance.cpp:86:63: warning: unused parameter ‘ir’ [-Wunused-parameter]
 ir_invariance_propagation_visitor::visit_leave(ir_assignment *ir)
                                                               ^~

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-19 15:55:44 -08:00
Ian Romanick
88cc9484f8 glsl: Minor formatting fixes in link_uniform_blocks.cpp
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-19 15:55:43 -08:00
Ian Romanick
296407990b glsl: Fix all the whitespace errors in link_uniform_block_active_visitor.cpp
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-19 15:55:43 -08:00
Ian Romanick
1f2659ad4d mesa: Silence numerous "unused parameter" warnings in dlist.c
main/dlist.c: In function ‘save_DrawArraysInstancedARB’:
main/dlist.c:1748:36: warning: unused parameter ‘mode’ [-Wunused-parameter]
 save_DrawArraysInstancedARB(GLenum mode,
                                    ^~~~
main/dlist.c:1749:35: warning: unused parameter ‘first’ [-Wunused-parameter]
                             GLint first,
                                   ^~~~~
main/dlist.c:1750:37: warning: unused parameter ‘count’ [-Wunused-parameter]
                             GLsizei count,
                                     ^~~~~
main/dlist.c:1751:37: warning: unused parameter ‘primcount’ [-Wunused-parameter]
                             GLsizei primcount)
                                     ^~~~~~~~~
main/dlist.c: In function ‘save_DrawElementsInstancedARB’:
main/dlist.c:1759:38: warning: unused parameter ‘mode’ [-Wunused-parameter]
 save_DrawElementsInstancedARB(GLenum mode,
                                      ^~~~
main/dlist.c:1760:39: warning: unused parameter ‘count’ [-Wunused-parameter]
                               GLsizei count,
                                       ^~~~~
main/dlist.c:1761:38: warning: unused parameter ‘type’ [-Wunused-parameter]
                               GLenum type,
                                      ^~~~
main/dlist.c:1762:45: warning: unused parameter ‘indices’ [-Wunused-parameter]
                               const GLvoid *indices,
                                             ^~~~~~~
main/dlist.c:1763:39: warning: unused parameter ‘primcount’ [-Wunused-parameter]
                               GLsizei primcount)
                                       ^~~~~~~~~
main/dlist.c: In function ‘save_DrawElementsInstancedBaseVertexARB’:
main/dlist.c:1771:48: warning: unused parameter ‘mode’ [-Wunused-parameter]
 save_DrawElementsInstancedBaseVertexARB(GLenum mode,
                                                ^~~~
main/dlist.c:1772:49: warning: unused parameter ‘count’ [-Wunused-parameter]
                                         GLsizei count,
                                                 ^~~~~
main/dlist.c:1773:48: warning: unused parameter ‘type’ [-Wunused-parameter]
                                         GLenum type,
                                                ^~~~
main/dlist.c:1774:55: warning: unused parameter ‘indices’ [-Wunused-parameter]
                                         const GLvoid *indices,
                                                       ^~~~~~~
main/dlist.c:1775:49: warning: unused parameter ‘primcount’ [-Wunused-parameter]
                                         GLsizei primcount,
                                                 ^~~~~~~~~
main/dlist.c:1776:47: warning: unused parameter ‘basevertex’ [-Wunused-parameter]
                                         GLint basevertex)
                                               ^~~~~~~~~~
main/dlist.c: In function ‘save_DrawArraysInstancedBaseInstance’:
main/dlist.c:1785:45: warning: unused parameter ‘mode’ [-Wunused-parameter]
 save_DrawArraysInstancedBaseInstance(GLenum mode,
                                             ^~~~
main/dlist.c:1786:44: warning: unused parameter ‘first’ [-Wunused-parameter]
                                      GLint first,
                                            ^~~~~
main/dlist.c:1787:46: warning: unused parameter ‘count’ [-Wunused-parameter]
                                      GLsizei count,
                                              ^~~~~
main/dlist.c:1788:46: warning: unused parameter ‘primcount’ [-Wunused-parameter]
                                      GLsizei primcount,
                                              ^~~~~~~~~
main/dlist.c:1789:45: warning: unused parameter ‘baseinstance’ [-Wunused-parameter]
                                      GLuint baseinstance)
                                             ^~~~~~~~~~~~
main/dlist.c: In function ‘save_DrawElementsInstancedBaseInstance’:
main/dlist.c:1797:47: warning: unused parameter ‘mode’ [-Wunused-parameter]
 save_DrawElementsInstancedBaseInstance(GLenum mode,
                                               ^~~~
main/dlist.c:1798:48: warning: unused parameter ‘count’ [-Wunused-parameter]
                                        GLsizei count,
                                                ^~~~~
main/dlist.c:1799:47: warning: unused parameter ‘type’ [-Wunused-parameter]
                                        GLenum type,
                                               ^~~~
main/dlist.c:1800:52: warning: unused parameter ‘indices’ [-Wunused-parameter]
                                        const void *indices,
                                                    ^~~~~~~
main/dlist.c:1801:48: warning: unused parameter ‘primcount’ [-Wunused-parameter]
                                        GLsizei primcount,
                                                ^~~~~~~~~
main/dlist.c:1802:47: warning: unused parameter ‘baseinstance’ [-Wunused-parameter]
                                        GLuint baseinstance)
                                               ^~~~~~~~~~~~
main/dlist.c: In function ‘save_DrawElementsInstancedBaseVertexBaseInstance’:
main/dlist.c:1810:57: warning: unused parameter ‘mode’ [-Wunused-parameter]
 save_DrawElementsInstancedBaseVertexBaseInstance(GLenum mode,
                                                         ^~~~
main/dlist.c:1811:58: warning: unused parameter ‘count’ [-Wunused-parameter]
                                                  GLsizei count,
                                                          ^~~~~
main/dlist.c:1812:57: warning: unused parameter ‘type’ [-Wunused-parameter]
                                                  GLenum type,
                                                         ^~~~
main/dlist.c:1813:62: warning: unused parameter ‘indices’ [-Wunused-parameter]
                                                  const void *indices,
                                                              ^~~~~~~
main/dlist.c:1814:58: warning: unused parameter ‘primcount’ [-Wunused-parameter]
                                                  GLsizei primcount,
                                                          ^~~~~~~~~
main/dlist.c:1815:56: warning: unused parameter ‘basevertex’ [-Wunused-parameter]
                                                  GLint basevertex,
                                                        ^~~~~~~~~~
main/dlist.c:1816:57: warning: unused parameter ‘baseinstance’ [-Wunused-parameter]
                                                  GLuint baseinstance)
                                                         ^~~~~~~~~~~~

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-19 15:55:43 -08:00
Ian Romanick
1b9f285265 mesa: Fix all the whitespace errors in dlist.c
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-19 15:55:43 -08:00
Ian Romanick
ceea514d91 linker: Accurately mark a uniform block instance array element as used in a stage
Now that information about which array-of-arrays elements are accessed
is tracked, use that information to only mark an instance array element
as used-by-stage if, in fact, it is.

Fixes GL45-CTS.program_interface_query.uniform-block-types.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-19 15:55:43 -08:00
Ian Romanick
d32956935e glsl: Walk a list of ir_dereference_array to mark array elements as accessed
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-19 15:55:43 -08:00
Ian Romanick
e92935089b glsl: Mark a set of array elements as accessed using a list of array_deref_range
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-19 15:55:43 -08:00
Ian Romanick
8d499f60c8 glsl: Add structures to track accessed elements of a single array
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-19 15:55:43 -08:00
Ian Romanick
b7053b80f2 glsl: Add tracking for elements of an array-of-arrays that have been accessed
If there's a better way to provide access to ir_array_refcount_entry
private members to the test functions, I am very interested to know
about it.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: Francisco Jerez <currojerez@riseup.net>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-19 15:55:43 -08:00
Ian Romanick
5085b64031 glsl: Use simpler visitor to determine which UBO and SSBO blocks are used
Very soon this visitor will get more complicated.  The users of the
existing ir_variable_refcount visitor won't need the coming
functionality, and this use doesn't need much of the functionality of
ir_variable_refcount.

v2: ir_array_refcount_visitor::get_variable_entry cannot return NULL, so
don't check it.  Suggested by Timothy.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-19 15:55:43 -08:00
Ian Romanick
d56bd07bb3 glsl: Track the linearized array index for each UBO instance array element
v2: Set linearizer_array_index in process_block_array_leaf.  Suggested
by Timothy.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-19 15:55:37 -08:00
Ian Romanick
300de78ab1 glsl: Fix wonkey indentation left from previous commit
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-19 15:54:38 -08:00
Ian Romanick
8862fefba0 glsl: Split process_block_array into two functions
One for the array parts and one for the leaf members.  This will
simplify later changes.

The indentation is wonkey after this patch.  This was done to make it
more obvious that the function is just getting split.  The next patch
will fix the indentation.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-19 15:54:38 -08:00
Kenneth Graunke
4c4d9e4f03 glsl: Fix program interface queries relating to interface blocks.
This fixes 555 dEQP tests (using the nougat-cts-dev branch), Piglit's
arb_program_interface_query/arb_program_interface_query-resource-query,
and GL45-CTS.program_interface_query.separate-programs-{tess-control,
tess-eval,geometry}.  Only one dEQP program interface failure remains.

I would have liked to split this up into several distinct changes, but
I wasn't sure how to do that given thet tangled nature of these issues.

So, the issues:

   * We need to treat interface blocks declared as an array of instances
     as a single block - removing the outer array.  The resource list
     entry's name should not include the array length.  Properties such
     as GL_ARRAY_SIZE should refer to the variable inside the block, not
     the interface block's array properties.

   * We need to do this prefixing even for structure variables.

   * We need to do this for built-ins (such as gl_PerVertex.gl_Position).

   * After interface array unwrapping, any variable which is an array
     should have [0] appended.  It doesn't matter if it's a TCS/TES/GS
     input or TCS output - that looked like an attempt to unwrap for
     per-vertex variables, but that didn't consider per-patch variables,
     and as far as I can tell there's nothing to justify this.

Several Mesa developers have suggested that Issue 16 contradicts the
main specification, but I believe that it doesn't - the main spec just
isn't terribly clear.  The main ARB_program_interface query spec says:

  "* For an active interface block not declared as an array of block
     instances, a single entry will be generated, using the block name from
     the shader source.

   * For an active interface block declared as an array of instances,
     separate entries will be generated for each active instance.  The name
     of the instance is formed by concatenating the block name, the "["
     character, an integer identifying the instance number, and the "]"
     character."

Issue 16 says that built-ins should be named "gl_PerVertex.gl_Position",
but several people suggested the second bullet above means that it
should be named "gl_PerVertex[array length].gl_Position".

There are two important things to note.  Those bullet points say
"an active interface block", while the others say "variable" or "active
shader storage block member".  They also don't mention applying the
rules recursively (unlike the other bullets).  Both suggest that
these rules apply to blocks themselves, not members of blocks.

In fact, for GL_UNIFORM_BLOCK queries, we do have "block[0]",
"block[1]", ... resource list entries - so those rules are real,
and actually used.  So if they don't apply to block members, then how
should members be named?  Unfortunately, I don't see any rules outside
of issue 16 - where the rationale is very unclear.  I hope to clarify
the spec in the future.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-12-19 15:43:09 -08:00
Kenneth Graunke
ad6d1d70ad glsl: Drop bogus is_vertex_input from add_shader_variable().
stage_mask is a bitmask of shader stages, so the proper comparison would
be (1 << MESA_SHADER_VERTEX), not MESA_SHADER_VERTEX itself.

But we only care for structure types, and VS inputs cannot be structs.
So we can just drop this entirely.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-12-19 15:40:47 -08:00
Kenneth Graunke
37d63b50b1 mesa/get: Convert stencil values to TYPE_UINT.
These are listed as Z+ in the GL spec, and often have values of
0xFFFFFFFF.  For glGetFloat, we should return 4294967295.0 rather than
-1.0.  Similarly, for glGetInteger64v, we should return 0xFFFFFFFF, not
the sign extended 0xFFFFFFFFFFFFFFFF.

Fixes 6 dEQP tests matching the pattern
dEQP-GLES3.functional.state_query.integers.stencil*value*mask*getfloat
when run in a single process (with state reset code happening between
tests, which makes dEQP set the stencil value mask to 0xFFFFFFFF).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-12-19 11:33:40 -08:00
Kenneth Graunke
9f93afb9a5 mesa/get: Add TYPE_UINT for casting through a GLuint.
The "State Tables" section of the OpenGL specification lists many values
as belonging to Z+ (non-negative integers), not Z (all integers).

For ordinary glGetInteger queries, this doesn't matter.  However, when
accessing Z+ values via glGetFloat or glGetInteger64, we need to treat
the source value as an unsigned value.  Otherwise, we'll produce a
negative number when bit 31 is set.

This commit merely adds the plumbing.  It doesn't convert any values.

v2: Gotta catch 'em all (add missing cases caught by Ilia)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-12-19 11:33:40 -08:00
Kenneth Graunke
78a391ed83 mesa/get: Make GetFloat/GetDouble of TYPE_INT_N not normalize things.
GetFloat of integer valued things is supposed to perform a simple
int -> float conversion.  INT_TO_FLOAT is not that.  Instead, it
converts [-2147483648, 2147483647] to a normalized [-1.0, 1.0] float.

This is only used for COMPRESSED_TEXTURE_FORMATS, which nobody in
their right mind would try and access via glGetFloat(), but we may
as well fix it.

Found by inspection.

v2: Gotta catch 'em all (fix another case of this caught by Ilia)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-12-19 11:33:40 -08:00
Michel Dänzer
52098fada7 Revert "cso: don't release sampler states that are bound"
This reverts commit 6dc96de303. No longer
necessary with the previous change.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-19 17:51:38 +09:00
Michel Dänzer
95eb5e4eed cso: Make sanitize_hash safe for samplers
Remove currently bound sampler states from the hash table before pruning
entries from the hash table, so they cannot accidentally be deleted by
the pruning.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-19 17:51:34 +09:00
Michel Dänzer
745e2eaaec cso: Store hash key in struct cso_sampler
Preparation for following changes, no functional change intended.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-19 17:51:31 +09:00
Michel Dänzer
9e14238647 cso: Optimize cso_save/restore_fragment_samplers
Only copy/memset the pointers that actually need to be.

v2:
* Cast info->nr_samplers to int for calculating delta (Nicolai)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-19 17:50:21 +09:00
Michel Dänzer
5e70f80c99 cso: Store pointers to struct cso_sampler in struct sampler_info
Preparation for following changes, no functional change intended.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-19 17:50:17 +09:00
Michel Dänzer
3d661a12be cso: Don't restore nr_samplers in cso_restore_fragment_samplers
If info->nr_samplers > ctx->nr_fragment_samplers_saved, the assignment
would prevent cso_single_sampler_done from unbinding the no longer used
samplers from the driver, which could result in use-after-free. This is
probably unlikely to happen in practice though.

Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-19 17:50:08 +09:00
Liu Zhiquan
e2610bf165 EGL/android: Enhance pbuffer implementation
Some dri drivers will pass multiple bits in buffer_mask parameter
to droid_image_get_buffer(), more than the actual supported buffer
type combination. For such case, will go through all the bits, and
will not return error when unsupported buffer is requested, only
return error when the allocation for supported buffer failed.

v2: coding style and log changes
v3: coding style changes and update patch format

Signed-off-by: Liu Zhiquan <zhiquan.liu@intel.com>
Signed-off-by: Long, Zhifang <zhifang.long@intel.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
2016-12-19 08:26:32 +02:00
Bas Nieuwenhuizen
1d529cba02 radv: Use correct workgroup size limits.
Not sure where the 16k comes from, but pretty sure 2k is the max.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-12-18 22:18:14 +01:00
Dave Airlie
6229994ab7 radv: expose the compute queue
v2: Don't expose the SDMA queue and use the CIK check also in the
    second if. (Bas)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-12-18 20:52:55 +01:00
Bas Nieuwenhuizen
442735d35d radv: Only emit PFP ME syncs for DMA on the GFX queue.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-12-18 20:52:51 +01:00
Bas Nieuwenhuizen
f2523ebf52 radv: Create an empty CS per ring type.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-12-18 20:52:47 +01:00
Bas Nieuwenhuizen
accc5fc026 radv: Don't enable CMASK on compute queues.
We can't fast clear on compute queues.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-12-18 20:52:41 +01:00
Bas Nieuwenhuizen
bfee9866ea radv: Use RELEASE_MEM packet for MEC timestamp query.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-12-18 20:52:37 +01:00
Bas Nieuwenhuizen
9b0efc98ba radv: Implement indirect dispatch for the MEC.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-12-18 20:52:33 +01:00
Bas Nieuwenhuizen
3a559029e2 radv: update vkCmdUpdateBuffer for the MEC.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-12-18 20:52:29 +01:00
Bas Nieuwenhuizen
b3499557a2 radv: Implement cache flushing for the MEC.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-12-18 20:52:26 +01:00
Dave Airlie
72aaa83f4b radv: add semaphore support
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-12-18 20:52:26 +01:00
Dave Airlie
d270b5fac3 radv: pass queue index into winsys submission
This is so we can submit on separate queues if needed

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-12-18 20:52:26 +01:00
Dave Airlie
d0e6fb0574 radv: init compute queue and avoid initing transfer queues
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-12-18 20:52:26 +01:00
Bas Nieuwenhuizen
71dabe1c16 radv/winsys: Make WaitIdle queue aware.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-12-18 20:52:20 +01:00
Dave Airlie
d028bd7b55 radv/meta: update header info
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-12-18 20:52:20 +01:00
Dave Airlie
4bd666a319 radv: hook compute clears into clear image api.
These aren't used yet but we will want to use them when we
implement a separate compute queue.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-12-18 20:52:20 +01:00
Dave Airlie
f11ea8779d radv: clear image implementation for compute queue
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-12-18 20:52:20 +01:00
Dave Airlie
9839ce282b radv/meta: split clear image out into a separate layer clear function
This will make it easier to add support for clears on compute queues.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-12-18 20:52:20 +01:00
Dave Airlie
ef5f59c9a9 radv: implement image->image copies using compute shader
This is required for having a separate compute queue, we
probably can't use this on GFX queue due to DCC.

v2: Set coord_components = 2 for itoi texture fetch. (Bas)

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-12-18 20:52:20 +01:00
Dave Airlie
983af3a6d1 radv: add a compute shader implementation for buffer to image
This implements the reverse of the current buffer->image path
and can be used when we need to do image transfer on compute queues

This just adds the code turned off as we don't support separate
computes queues yet, and we don't want to use this path on the GFX
queues for DCC reasons.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-12-18 20:52:20 +01:00
Bas Nieuwenhuizen
35cf08ef64 radv: Use correct pitch for views with different block size.
Needed when accessing a comrpessed texture as R32G32B32A32 from a shader. This
was not encountered previously, as we used the CB for the reinterpretation, which
does not use this pitch.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-12-18 20:52:15 +01:00
Dave Airlie
94a7434bbc radv: Store queue family in command buffers.
v2: Added helper (Bas)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-12-18 20:52:15 +01:00
Dave Airlie
c20701f4be radv: start fixing up queue allocate for multiple queues
v2: Fix error handling  and zero init the device (Bas)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-12-18 20:52:15 +01:00
Dave Airlie
59c9a131f4 radv/winsys: start adding support for DMA/compute queue
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-12-18 20:52:15 +01:00
Bas Nieuwenhuizen
86cb418bd4 radv/winsys: Expose number of compute/dma rings.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-12-18 20:52:08 +01:00
Rob Clark
2c0dfd48f0 freedreno/a5xx: border color support
Not 100% sure it works if you have border color in VS.. but it might be
right.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-18 13:49:45 -05:00
Rob Clark
939486d3d3 freedreno/a5xx: use MRT0 to import linear zs
A bit of a hack, but we need to do this until we can do tiled zs in
sysmem (and associated tile/until blits for transfer_map).

Fixes xonotic and glmark2 "refract", when reorder wasn't enabled.
(reorder would paper over the issue by avoiding the extra round-
trip to system memory and back to gmem.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-18 13:48:10 -05:00
Rob Clark
bea8602e5b freedreno: fdN_gmem_restore_format() is not gen specific
Refactor out into a common helper, since this is the same across
generations when we need equiv z/s gmem restore format.

Next patch needs this in a5xx, rather than creating yet another
helper push this into core.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-18 13:48:03 -05:00
Rob Clark
6f93c75a47 freedreno/a5xx: cargo-cult end-batch sequence more faithfully
Fixes some issues at least with GMEM bypass mode, where we'd sometimes
end up with some FS quads not hitting memory.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-18 13:47:54 -05:00
Rob Clark
d35022f24d freedreno/a5xx: misc fix
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-18 13:47:47 -05:00
Rob Clark
651f2655a8 freedreno/a5xx: fix (at least some) vtx formats
Swap/component-order doesn't seem to be quite what that is.  At least
blob was always setting it to XYZW ('11') but we weren't.  Causing
problems w/ formats like sint16..  Hard-coding this instead at least
seems to get glamor working.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-18 13:47:38 -05:00
Rob Clark
2540226f66 freedreno/a5xx: more formats
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-18 13:47:31 -05:00
Rob Clark
c768461c1f freedreno/a5xx: fixup caps
Might not be 100% accurate, mostly just copy from a4xx to get started.

We are defn lying about occlusion query at this point (not implemented
yet) but need it to expose anything higher than gl1.4 (glamor needs
gl2.1)

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-18 13:47:18 -05:00
Rob Clark
abcf8f5b58 freedreno/a5xx: fix random faults on first sysmem draw
Not sure what this event is, but blob writes it.. and it seems to solve
random write faults at mystery address that would sometimes happen on
first BYPASS draw.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-18 13:47:08 -05:00
Rob Clark
54537fa1dc freedreno: update generated headers
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-18 13:47:00 -05:00
Rob Clark
5e632b3a83 freedreno/a5xx: fix stride/size for mem->gmem blits
<brownpaperbag>these should be the in-GMEM dimensions</brownpaperbag>

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-18 13:46:48 -05:00
Dave Airlie
0f2e9a8986 radv/winsys: consolidate request->fence code
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-12-17 16:30:16 +01:00
Dave Airlie
7ad1c24e2a radv: handle fence allocation failing
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-12-17 16:29:57 +01:00
Bas Nieuwenhuizen
b2b4f7248b radv: Don't bail out on pipeline create failure.
The spec says we have to try to create all, and only set failed
pipelines to VK_NULL_HANDLE. If one of them fails, we have to return
an error, but as far as I can see, the spec does not care which of
the suberrors.

Fixes
dEQP-VK.api.object_management.alloc_callback_fail_multiple.compute_pipeline
dEQP-VK.api.object_management.alloc_callback_fail_multiple.graphics_pipeline

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-12-17 11:41:53 +01:00
Ilia Mirkin
6493b4f4dd spirv/nir: add support for ImageGatherExtended
The strategy is to do the same thing that the GLSL lower_offset_arrays
pass does - create 4 separate texture gather ops, one per offset, and
read in the results from each gather's w component to recreate the
desired result.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-12-16 20:27:37 -05:00
Francisco Jerez
79d08ed3d2 anv: Fix uniform and storage buffer offset alignment limits.
This fixes a regression in a bunch of image store vulkan CTS tests
from commit ad38ba1134, which started
using OWORD block read messages to implement UBO loads.  The reason
for the failure is that we were giving bogus buffer alignment limits
to the application (1B), so the CTS would happily come back with
descriptor sets pointing at not even word-aligned uniform buffer
addresses.

Surprisingly the sampler messages used to fetch pull constants before
that commit were able to cope with the non-texel aligned addresses,
but the dataport messages used to fetch pull constants after that
commit and the ones used to access storage buffers (before and after
the same commit) aren't as permissive with unaligned addresses.

Cc: <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99097
Reported-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-16 14:12:54 -08:00
Thomas Helland
a66818830a nir: Remove nir_array from lower_locals_to_regs
We do nothing but initialize it, add to it, and delete it.
This is a fallout from removing constant initializer support.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-16 12:02:28 -08:00
Bruce Cherniak
79b66ec05e swr: Implement fence attached work queues for deferred deletion.
Work can now be added to fences and triggered by fence completion. This
allows for deferred resource deletion, and other asynchronous tasks.

Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
2016-12-16 11:29:02 -06:00
Timothy Arceri
3421b3f5a3 nir: Turn imov/fmov of undef into undef
Reverting the previous attempt at this a5502a721f resulted in
the following Vulkan test failing.

dEQP-VK.glsl.return.return_in_dynamic_loop_dynamic_vertex

This time we use the num_components from the alu dest rather than
num_inputs to the op to determine the size of the undef.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99100
2016-12-16 20:32:59 +11:00
Eric Engestrom
08fc74663b egl/x11: cleanup init code
No functional change, just rewriting it in an easier-to-understand way.

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-12-15 11:48:31 +00:00
Iago Toral Quiroga
47351b843a nir/lower_tex: fix number of components in replace_gradient_with_lod()
We should make the dest in the textureLod() operation have the same number
of components as the destination in the original textureGrad()

Fixes regression in ES3-CTS.gtf.GL3Tests.shadow

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99072
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-15 08:34:55 +01:00
Timothy Arceri
a5502a721f Revert "nir: Turn imov/fmov of undef into undef."
This reverts commit 6aa730000f.

This was changing the size of the undef to always be 1 (the number of inputs
to imov and fmov) which is wrong, we could be moving a vec4 for example.

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-12-15 17:05:12 +11:00
Kenneth Graunke
84e19322d3 i965/vec4: Fix TCS output reads with non-zero component qualifiers.
We want to perform the URB read to a vec4 temporary, with no writemask,
then issue a MOV to swizzle the data and store it to the actual
destination, using the final writemask.

We were doing this wrong.  For example, let's say we wanted to read
a vec2 stored in components 2-3 of a vec4.  We would generate a URB
read message of:

   SEND <actual destination>.XY <header with mask set to XY>
   MOV <actual destination>.XY <actual destination>.ZW

This doesn't work, because the URB message reads the .XY components
of the vec4, rather than the ZW.  It writes to the right place, but
with the wrong data.  Then the MOV comes along and overwrites it
with data that didn't even come from the URB at all.

Instead we want to do:

   SEND <temporary> <header with mask set to ZW>
   MOV <actual destination>.XY <temporary>.ZW

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-14 21:15:39 -08:00
Francisco Jerez
fd3120d85c i965/disasm: Decode dataport constant cache control fields.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-14 16:50:27 -08:00
Francisco Jerez
23caf75182 i965/fs: Remove the FS_OPCODE_SET_SIMD4X2_OFFSET virtual opcode.
Not used anymore.  It was just a scalar MOV.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-14 16:50:27 -08:00
Francisco Jerez
e014058195 i965/fs: Drop useless access mode override from pull constant generator code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-14 16:50:27 -08:00
Francisco Jerez
b56fa830c6 i965/fs: Fetch one cacheline of pull constants at a time.
Asking the DC for less than one cacheline (4 owords) of data for
uniform pull constants is suboptimal because the DC cannot request
less than that from L3, resulting in wasted bandwidth and unnecessary
message dispatch overhead, and exacerbating the IVB L3 serialization
bug.  The following table summarizes the overall framerate improvement
(with statistical significance of 5% and sample size ~10) from the
whole series up to this patch for several benchmarks and hardware
generations:

                         | SKL           | BDW          | HSW
SynMark2 OglShMapPcf     | 24.63% ±0.45% | 4.01% ±0.70% | 10.31% ±0.38%
GfxBench4 gl_manhattan31 |  5.93% ±0.35% | 3.92% ±0.31% |  6.62% ±0.22%
GfxBench4 gl_4           |  2.52% ±0.44% | 1.23% ±0.10% |      N/A
Unigine Valley           |  0.83% ±0.17% | 0.23% ±0.05% |  0.74% ±0.45%

Note that there are two versions of the Manhattan demo shipped with
GfxBench4, one of them is the original gl_manhattan demo which doesn't
use UBOs, so this patch will have no effect on it, and another one is
the gl_manhattan31 demo based on GL 4.3/GLES 3.1, which this patch
benefits as shown above.

I haven't observed any statistically significant regressions in the
benchmarks I have at hand.  Note that the comparatively huge
improvement on SKL in the OglShMapPcf test case is due to the combined
effect of this patch and the register pressure benefit on SKL+ of
"i965/fs: Switch to the constant cache for uniform pull constants.",
part of the same series.

Going up to 8 oword blocks would improve performance of pull constants
even more, but at the cost of some additional bandwidth and register
pressure, so it would have to be done on-demand based on the number of
constants actually used by the shader.

v2: Fix for Gen4 and 5.
v3: Non-trivial rebase.  Rework to allow the visitor specifiy
    arbitrary pull constant block sizes.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-14 16:50:27 -08:00
Francisco Jerez
9b22a0d295 i965/fs: Expose arbitrary pull constant load sizes to the IR.
Change the FS generator to ask the dataport for enough owords worth of
constants to fill the execution size of the instruction -- Which means
that the visitor now needs to set the execution size correctly for
uniform pull constant load instructions, which we were kind of
neglecting until now.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-14 16:50:26 -08:00
Francisco Jerez
7a6aadb76f i965: Factor out oword block read and write message control calculation.
We'll need roughly the same logic in other places and it would be
annoying to duplicate it.  Instead factor it out into a function-like
macro that takes the number of dwords per block (which will prove more
convenient than taking the same value in owords or some other unit).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-14 16:50:26 -08:00
Francisco Jerez
ad38ba1134 i965/fs: Switch to the constant cache for uniform pull constants.
This reverts to using the oword block read messages for uniform pull
constant loads, as used to be the case until
4c1fdae0a0.  There are two important differences
though: Now the L3 cacheability bits are set up correctly for UBOs
(since 11f5d8a5d4), and we target the
constant cache instead of the data cache.  The latter used to get no
L3 way allocation on boot on all platforms that existed at the time,
so oword read messages wouldn't get cached on L3 regardless of the
MOCS bits, what probably explains the apparent slowness of oword
fetches.

Constant cache loads seem to perform better than SIMD4x2 sampler loads
in a number of cases, they alleviate some of the cache thrashing
caused by the competition with textures for the L1/L2 sampler caches,
and they allow fetching up to 128B worth of constants with a single
oword fetch message.

Note that IVB devices suffer from a hardware bug that leads to
serialization of L3 read requests overlapping the same cacheline as
result of a (on IVB buggy) mechanism of the L3 to preserve coherency.
Since read requests for matching cachelines from any L3 client are not
pipelined, throughput may decrease in cases where there are no
non-overlapping requests left in the queue that can be processed
between them.

This situation should be relatively uncommon as long as we make sure
that we don't use the 1/2 oword messages in cases where the shader
intends to read from any other location of the same cacheline at some
other point.  This is generally a good idea anyway on all generations
because using the 1 and 2 oword messages is expected to waste
bandwidth since the minimum L3 request size for the DC is exactly 4
owords (i.e. one cacheline).  A future commit will have this effect.
I haven't been able to find any real-world example where this would
still result in a regression on IVB, but if someone happens to find
one it shouldn't be too difficult to add an IVB-specific check to have
it fall back to the sampler cache for pull constant loads.

Note that on SKL+ this change has the additional benefit of reducing
the register footprint of pull constant loads.  The following table
summarizes the effect of the whole series on several shader-db stats:

     Total instructions          Total cycles
BWR: 4571248 -> 4568342 (-0.06%) 123375740 -> 123373296 (-0.00%)
ELK: 3989020 -> 3985402 (-0.09%)  98757068 -> 98754058 (-0.00%)
ILK: 6383591 -> 6376787 (-0.11%) 143649910 -> 143648914 (-0.00%)
SNB: 7528395 -> 7501446 (-0.36%) 103503796 -> 102460370 (-1.01%)
IVB: 6949221 -> 6943317 (-0.08%)  60592262 -> 60584422 (-0.01%)
HSW: 6409753 -> 6403702 (-0.09%)  60609070 -> 60604414 (-0.01%)
BDW: 8043467 -> 7976364 (-0.83%)  68427730 -> 68483042 (0.08%)
CHV: 8045019 -> 7977916 (-0.83%)  68297426 -> 68352756 (0.08%)
SKL: 8204037 -> 7939086 (-3.23%)  66583900 -> 65624378 (-1.44%)

     Lost->Gained Total spills          Total fills
BWR:  5 ->   5    1488 -> 1488 (0.00%)  1957 -> 1957 (0.00%)
ELK:  5 ->   5    1489 -> 1489 (0.00%)  1958 -> 1958 (0.00%)
ILK:  1 ->   4    1449 -> 1449 (0.00%)  1921 -> 1921 (0.00%)
SNB:  0 ->   0     549 -> 549 (0.00%)     52 -> 52 (0.00%)
IVB: 13 ->   3    1271 -> 1271 (0.00%)  1162 -> 1162 (0.00%)
HSW: 11 ->   0    1271 -> 1271 (0.00%)  1162 -> 1162 (0.00%)
BDW: 12 ->   0    1340 -> 1340 (0.00%)  1452 -> 1452 (0.00%)
CHV: 12 ->   0    1340 -> 1340 (0.00%)  1452 -> 1452 (0.00%)
SKL:  0 -> 120    1269 -> 375 (-70.45%) 1563 -> 690 (-55.85%)

v3: Non-trivial rebase.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-14 16:50:26 -08:00
Francisco Jerez
3c78d31374 i965: Let the caller of brw_set_dp_write/read_message control the target cache.
brw_set_dp_read_message already had a target_cache argument, but its
interpretation was rather convoluted (on Gen6 the render cache was
used if the caller asked for it, otherwise it was ignored using the
sampler cache instead), and the constant cache wasn't representable at
all.  brw_set_dp_write_message used the data cache on Gen7+ except for
RENDER_TARGET_WRITE messages, in which case it would use the render
cache.  On Gen6 the render cache was always used.

Instead of the above, provide the shared unit SFID that the caller
expects will be used.  Makes no functional changes.

v3: Non-trivial rebase.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-14 16:50:26 -08:00
Francisco Jerez
591e14ec08 i965/gen6+: Invalidate constant cache on brw_emit_mi_flush().
In order to make sure that the constant cache is coherent with
previous rendering when we start using it for pull constant loads.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-14 16:50:26 -08:00
Kenneth Graunke
e0c1ec3b09 genxml: Make Gen8 3DSTATE_DS SIMD8 enable work like Gen9+.
This will let us avoid ifdefs.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2016-12-14 14:59:06 -08:00
Kenneth Graunke
000b563a1b genxml: Rename "DS Function Enable" to "Function Enable".
This makes Gen7/7.5 match Gen8-9.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2016-12-14 14:59:06 -08:00
Chad Versace
72ffe8318d anv: Reject VkMemoryAllocateInfo::allocationSize == 0
The Vulkan 1.0.33 spec says "allocationSize must be greater than 0".

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-12-14 12:04:58 -08:00
Chad Versace
5e97b8f5ce egl: Fix crashes in eglCreate*Surface()
Don't dereference a null EGLDisplay.

Fixes tests
  dEQP-EGL.functional.negative_api.create_pbuffer_surface
  dEQP-EGL.functional.negative_api.create_pixmap_surface

Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=99038
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-12-14 12:03:15 -08:00
Jason Ekstrand
b18cd8ce2c i965/miptree: Use intel_miptree_copy for maps
What we're really doing is copying a texture not blitting it in the sense
of glBlitFramebuffers.  Also, the intel_miptree_copy function is capable of
properly handling compressed textures which intel_miptree_blit is not.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97473
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-12-13 15:48:34 -08:00
Jason Ekstrand
157971e450 i965/blit: Fix the src dimension sanity check in miptree_copy
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-12-13 15:48:13 -08:00
Lionel Landwerlin
9fe3f2649e docs: add INTEL_conservative_rasterization to relaese notes for 13.1.0
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-12-13 16:28:00 +00:00
Lionel Landwerlin
60330d730b main: add INTEL_conservative_rasterization enum query support
v2: add extra parameter (Ilia)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-12-13 16:27:59 +00:00
Lionel Landwerlin
d4b753a50b glapi: add missing INTEL_conservative_rasterization
v2: put enum directly in gl_API.xml (Ilia)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-12-13 16:27:56 +00:00
Lionel Landwerlin
47285d4602 extensions: update INTEL_conservative_rasterization dependencies
Suggested by Ilia.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-12-13 16:27:54 +00:00
Lionel Landwerlin
300d96a433 main: don't error when enabling conservative rasterization on gles
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-12-13 16:27:51 +00:00
Lionel Landwerlin
9854a3ba8b main: use new driver flag for conservative rasterization state
Suggested by Marek.

v2: Use new driver flag (Marek)

v3: Fix i965 comments (Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-13 16:27:33 +00:00
Iago Toral Quiroga
da3389a331 nir/lower_tex: lower gradients on shadow cube maps if lower_txd_shadow is set
Even if lower_txd_cube_map isn't. Suggested by Ken to make the flag more
consistent with its name.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-13 10:33:29 +01:00
Iago Toral Quiroga
44873ad0a4 i965: remove brw_lower_texture_gradients
This has been ported to NIR now so we don'tneed to keep the GLSL IR
lowering any more.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-13 10:33:20 +01:00
Iago Toral Quiroga
77f65b3b64 i965/nir: enable lowering of texture gradient for shadow samplers
This gets the lowering on the Vulkan driver too, which is required for
hardware that does not have the sample_l_d message (up to IvyBridge).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-13 10:33:14 +01:00
Iago Toral Quiroga
5be2e785b1 nir/lower_tex: add lowering for texture gradient on shadow samplers
This is ported from the Intel lowering pass that we use with GLSL IR.
This takes care of lowering texture gradients on shadow samplers other
than cube maps. Intel hardware requires this for gen < 8.

v2 (Ken):
 - Use the helper function to retrieve ddx/ddy
 - Swizzle away size components we are not interested in

v3:
- Get rid of the ddx/ddy helper and use nir_tex_instr_src_index
  instead (Ken, Eric)

v4:
- Add a 'continue' statement if the lowering makes progress because it
  replaces the original texture instruction

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v3)
2016-12-13 10:32:52 +01:00
Iago Toral Quiroga
f90da64fc6 i965/nir: enable lowering of texture gradient for cube maps
This gets the lowering on the Vulkan driver too.

Fixes Vulkan CTS cube map texture gradient tests in:
dEQP-VK.glsl.texture_functions.texturegrad.*

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-13 10:32:46 +01:00
Iago Toral Quiroga
a8e740c354 nir/lower_tex: add lowering for texture gradient on cube maps
This is ported from the Intel lowering pass that we use with GLSL IR.
The NIR pass only handles cube maps, not shadow samplers, which are
also lowered for gen < 8 on Intel hardware. We will add support for
that in a later patch, at which point we should be able to remove
the GLSL IR lowering pass.

v2:
- added a helper to retrieve ddx/ddy parameters (Ken)
- No need to make size.z=1.0, we are only using component x anyway (Iago)

v3:
- Get rid of the ddx/ddy helper and use nir_tex_instr_src_index
  instead (Ken, Eric)

v4:
- When emitting the textureLod operation, copy all texture parameters
  from the original textureGrad() (except for ddx/ddy) using a loop
- Add a 'continue' statement if the lowering makes progress because it
  replaces the original texture instruction

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v3)
2016-12-13 10:32:00 +01:00
Iago Toral Quiroga
bac303c286 nir/lower_tex: generalize get_texture_size()
This was written specifically for RECT samplers. Make it more generic so
we can call this from the gradient lowerings too.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-13 10:31:38 +01:00
Ilia Mirkin
fd249c803e treewide: s/comparitor/comparator/
git grep -l comparitor | xargs sed -i 's/comparitor/comparator/g'

Just happened to notice this in a patch that was sent and included one
of the tokens in question.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-12 22:13:07 -05:00
Ian Romanick
a0ce9ff8c4 nir: Only float and double types can be matrices
In 19a541f (nir: Get rid of nir_constant_data) a number of places that
operated on nir_constant::values were mechanically converted to operate
on the whole array without regard for the base type.  Only
GLSL_TYPE_FLOAT and GLSL_TYPE_DOUBLE can be matrices, so only those
types can have data in the non-0 array element.

See also b870394.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Iago Toral Quiroga <itoral@igalia.com>
2016-12-12 17:17:12 -08:00
Tim Rowley
75149088be swr: [rasterizer core/memory] StoreTile: AVX512 progress
Fixes to 128-bit formats.

Reviwed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-12-12 17:52:39 -06:00
Matt Turner
ac6646129f nir: Move fsat outside of fmin/fmax if second arg is 0 to 1.
instructions in affected programs: 550 -> 544 (-1.09%)
helped: 6

cycles in affected programs: 6952 -> 6850 (-1.47%)
helped: 6

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-12 12:39:27 -08:00
Matt Turner
7bed52bb5f i965/fs: Reject copy propagation into SEL if not min/max.
We shouldn't ever see a SEL with conditional mod other than GE (for max)
or L (for min), but we might see one with predication and no conditional
mod.

total instructions in shared programs: 8241806 -> 8241902 (0.00%)
instructions in affected programs: 13284 -> 13380 (0.72%)
HURT: 62

total cycles in shared programs: 84165104 -> 84166244 (0.00%)
cycles in affected programs: 75364 -> 76504 (1.51%)
helped: 10
HURT: 34

Fixes generated code in at least Sanctum 2, Borderlands 2, Goat
Simulator, XCOM: Enemy Unknown, and Shogun 2.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92234
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-12 12:38:55 -08:00
Matt Turner
091a8a04ad i965/fs: Add unit tests for copy propagation pass.
Pretty basic, but it's a start.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-12 12:38:50 -08:00
Matt Turner
6014da50ec i965/fs: Rename opt_copy_propagate -> opt_copy_propagation.
Matches the vec4 backend, cmod propagation, and saturate propagation.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-12 12:38:43 -08:00
Nicolai Hähnle
ec0a0a60cc radeonsi: shrink the GSVS ring to account for the reduced item sizes
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:05:17 +01:00
Nicolai Hähnle
6fdef7d265 radeonsi: shrink each vertex stream to the actually required size
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:05:13 +01:00
Nicolai Hähnle
2f2e941e2d radeonsi: use a single descriptor for the GSVS ring
We can hardcode all of the fields for swizzling in the geometry shader.

The advantage is that we use fewer descriptor slots and we no longer have to
update any of the (ring) descriptors when the geometry shader changes.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:05:05 +01:00
Nicolai Hähnle
18616e7551 radeonsi: pack GS output components for each vertex stream contiguously
Note that the memory layout of one vertex stream inside one "item" (= memory
written by one GS wave) on the GSVS ring is:

  t0v0c0 ... t15v0c0 t0v1c0 ... t15v1c0 ... t0vLc0 ... t15vLc0
  t0v0c1 ... t15v0c1 t0v1c1 ... t15v1c1 ... t0vLc1 ... t15vLc1
                        ...
  t0v0cL ... t15v0cL t0v1cL ... t15v1cL ... t0vLcL ... t15vLcL
  t16v0c0 ... t31v0c0 t16v1c0 ... t31v1c0 ... t16vLc0 ... t31vLc0
  t16v0c1 ... t31v0c1 t16v1c1 ... t31v1c1 ... t16vLc1 ... t31vLc1
                        ...
  t16v0cL ... t31v0cL t16v1cL ... t31v1cL ... t16vLcL ... t31vLcL

                        ...

  t48v0c0 ... t63v0c0 t48v1c0 ... t63v1c0 ... t48vLc0 ... t63vLc0
  t48v0c1 ... t63v0c1 t48v1c1 ... t63v1c1 ... t48vLc1 ... t63vLc1
                        ...
  t48v0cL ... t63v0cL t48v1cL ... t63v1cL ... t48vLcL ... t63vLcL

where tNN indicates the thread number, vNN the vertex number (in the order of
EMIT_VERTEX), and cNN the output component (vL and cL are the last vertex and
component, respectively).

The vertex streams are laid out sequentially.

The swizzling by 16 threads is hard-coded in the way the VGT generates the
offset passed into the GS copy shader, and the jump every 16 threads is
calculated from VGT_GSVS_RING_OFFSET_n and VGT_GSVS_RING_ITEMSIZE in a way
that makes it difficult to deviate from this layout (at least that's what
I've experimentally confirmed on VI after first trying to go the simpler
route of just interleaving the vertex streams).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:05:00 +01:00
Nicolai Hähnle
edf034ac14 radeonsi: do not write non-existent components through the GSVS ring
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:58 +01:00
Nicolai Hähnle
af976f12a5 radeonsi: only write values belonging to the stream when emitting GS vertex
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:54 +01:00
Nicolai Hähnle
bdf1bf1cb5 radeonsi: generate an explicit switch instruction over vertex streams
SimplifyCFG generates a switch instruction anyway when all four streams
are present, but is simultaneously not smart enough to eliminate some
redundant jumps that it generates.

The generated assembly is still a bit silly, probably because the
control flow annotation doesn't know how to handle a switch with uniform
condition.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:49 +01:00
Nicolai Hähnle
bae929f96e radeonsi: fetch only outputs of current vertex stream from the GSVS ring
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:46 +01:00
Nicolai Hähnle
dfb69cac33 radeonsi: only export from GS copy shader for vertex stream 0
When running the copy shader for vertex streams != 0, the SX does not need
any data from us (there is no rasterization for the higher vertex streams,
only streamout).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:43 +01:00
Nicolai Hähnle
21f2bb22a3 radeonsi: do not export VS outputs from vertex streams != 0
This affects for GS copy shaders. When an output is meant for vertex
stream != 0, then we don't have to make it available to the pixel
shader.

There is a minor inefficiency here because the GLSL varying packing pass
does not group varyings of the same vertex stream together, but it
shouldn't be important in practice.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:36 +01:00
Nicolai Hähnle
fc0e009aa7 radeonsi: pull iteration over vertex streams into GS copy shader logic
The iteration is not needed for normal vertex shaders.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:33 +01:00
Nicolai Hähnle
180ae18ec5 radeonsi: group streamout writes by vertex stream
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:30 +01:00
Nicolai Hähnle
d89592836a radeonsi: load the streamout buf descriptors closer to their use
LLVM can still decide to hoist the loads since they're marked invariant.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:27 +01:00
Nicolai Hähnle
564f17f0d7 radeonsi: extract writing of a single streamout output
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:24 +01:00
Nicolai Hähnle
b41dd00235 radeonsi: separate the call to si_llvm_emit_streamout from exports
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:22 +01:00
Nicolai Hähnle
5ad6e56ca3 radeonsi: plumb the output vertex_stream through to si_shader_output_values
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:19 +01:00
Nicolai Hähnle
2985708fa0 radeonsi: rename members of si_shader_output_values
Be a bit more verbose and avoid confusion in future patches.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:16 +01:00
Nicolai Hähnle
88509518b0 radeonsi: fix an off-by-one error in the bounds check for max_vertices
The spec actually says that calling EmitStreamVertex is undefined when
you exceed max_vertices. But we do need to avoid trampling over memory
outside the GSVS ring.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:13 +01:00
Nicolai Hähnle
7655bccce8 radeonsi: do not kill GS with memory writes
Vertex emits beyond the specified maximum number of vertices are supposed to
have no effect, which is why we used to always kill GS that reached the limit.

However, if the GS also writes to memory (SSBO, atomics, shader images), then
we must keep going and only skip the vertex emit itself.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:10 +01:00
Nicolai Hähnle
7b5b3d63c5 radeonsi: update all GSVS ring descriptors for new buffer allocations
Fixes GL45-CTS.gtf40.GL3Tests.transform_feedback3.transform_feedback3_geometry_instanced.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:06 +01:00
Nicolai Hähnle
2eaacba7f2 st/glsl_to_tgsi: plumb the GS output stream qualifier through to TGSI
Allow drivers to emit GS outputs in a smarter way.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:03 +01:00
Nicolai Hähnle
cc34a6f0bd tgsi/scan: collect information about output usagemasks
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:01 +01:00
Nicolai Hähnle
cf8e9778fc tgsi/scan: collect information about output vertex streams
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:03:57 +01:00
Nicolai Hähnle
81d0dc5e55 gallium: extract individual streamout output structure
So that we can pass pointers to individual array entries around.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:03:54 +01:00
Nicolai Hähnle
04811354c8 tgsi: add Stream{X,Y,Z,W} fields to tgsi_declaration_semantic
This is for geometry shader outputs. Without it, drivers have no way of
knowing which stream each output is intended for, and have to
conservatively write all outputs to all streams.

Separate stream numbers for each component are required due to output
packing.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:03:51 +01:00
Nicolai Hähnle
173d80b401 glsl: remember per-component vertex streams for packed varyings
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:03:47 +01:00
Grazvydas Ignotas
6092169b96 i965/blorp: fix release build unused variable warning
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-12-12 07:09:33 +01:00
Edward O'Callaghan
5e6b2b05a5 virgl: Fix a strict-aliasing violation in the encoder
As per the C spec, it is illegal to alias pointers to different
types. This results in undefined behaviour after optimization
passes, resulting in very subtle bugs that happen only on a
full moon..

Use a memcpy() as a well defined coercion between the double
to uint64_t interpretations of the memory.

V.2: Use static_assert() instead of assert().
V.3: Use C99 compat STATIC_ASSERT() over C11 static_assert().

Signed-off-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Acked-by: Dave Airlie <airlied@redhat.com>
2016-12-12 16:50:15 +11:00
Kenneth Graunke
35c5a9a64d i965: Print out cycle estimates at the start of block annotations.
We now print

   START B15 <-B14 (42774 cycles)

indicating that we estimate B15 will take 42,774 cycles.  Printing
this should make it easier where time is spent in the program.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-12-11 16:33:05 -08:00
Kenneth Graunke
713cd23d8e mesa: Return LINEAR encoding for winsys FBO depth/stencil.
GetFramebufferAttachmentParameteriv should return GL_LINEAR for the
window system default framebuffer's GL_DEPTH or GL_STENCIL attachments
when there are zero depth or stencil bits.

The GL 4.5 spec's GetFramebufferAttachmentParameteriv section says:

"If the value of FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE is not NONE,
 these queries apply to all other framebuffer types:

 [...]

 If attachment is not a color attachment, or no data storage or texture
 image has been specified for the attachment, then params will contain
 the value LINEAR."

Note that we already return LINEAR for the case where there is an actual
depth or stencil renderbuffer attached.  In the case modified by this
patch, FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE returns FRAMEBUFFER_DEFAULT
rather than NONE.

Fixes a CTS test when run in a visual without depth / stencil buffers:
GL45-CTS.gtf30.GL3Tests.framebuffer_srgb.framebuffer_srgb_default_encoding

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-12-11 16:33:05 -08:00
Grazvydas Ignotas
b58d1eecc6 intel/aubinator: fix 32bit shift overflow warning
Doesn't look like this can work on 32bit, just rids of annoying
warning.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-12-11 20:04:15 +01:00
Grazvydas Ignotas
3a1b15c392 anv: fix release build unused variable warnings
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-12-11 20:03:14 +01:00
Grazvydas Ignotas
90c29784c6 radv/ac: some fix maybe-uninitialized warnings
Mark some paths unreachable so that compiler knows variables are
initialized in all valid paths.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-12-10 21:46:56 +01:00
Grazvydas Ignotas
ec08666a28 radv/meta: use VK_NULL_HANDLE for handles
Otherwise we get 32bit warnings because handle is plain uint64_t there
and NULL is not suited to initialize that.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-12-10 21:46:56 +01:00
Grazvydas Ignotas
9bff2c9884 radv: fix release build unused variable warnings
Just mark with MAYBE_UNUSED.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-12-10 21:46:56 +01:00
Grazvydas Ignotas
15e12ab8fc softpipe: fix release build unused variable warning
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-12-10 21:25:45 +01:00
Grazvydas Ignotas
c81a89f662 radeonsi: fix release build unused variable warnings
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-12-10 21:19:59 +01:00
Chad Versace
42011be1e2 i965/mt: Disable HiZ when sharing depth buffer externally (v2)
intel_miptree_make_shareable() discarded and disabled CCS. Fix it so
that it discards and disables HiZ too.

Fixes dEQP-EGL.functional.image.render_multiple_contexts.gles2_renderbuffer_depth16_depth_buffer
on Skylake.

v2: Actually do what the commit message says. Discard the HiZ buffer.

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=98329
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: Nanley Chery <nanley.g.chery@intel.com
Cc: Haixia Shi <hshi@chromium.org>
Cc: mesa-stable@lists.freedesktop.org
2016-12-10 08:05:11 -08:00
Chad Versace
1c8be049be i965/mt: Disable aux surfaces after making miptree shareable
The entire goal of intel_miptree_make_shareable() is to permanently
disable the miptree's aux surfaces. So set
intel_mipmap_tree:disable_aux_buffers after the function's done with
discarding down the aux surfaces.

References: https://bugs.freedesktop.org/show_bug.cgi?id=98329
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: Nanley Chery <nanley.g.chery@intel.com
Cc: Haixia Shi <hshi@chromium.org>
Cc: mesa-stable@lists.freedesktop.org
2016-12-10 08:05:11 -08:00
Jason Ekstrand
da1c49171d spirv: Use a simpler and more correct implementaiton of tanh()
The new implementation is more correct because it clamps the incoming value
to 10 to avoid floating-point overflow.  It also uses a much reduced
version of the formula which only requires 1 exp() rather than 2.  This
fixes all of the dEQP-VK.glsl.builtin.precision.tanh.* tests.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0" <mesa-dev@lists.freedesktop.org>
2016-12-09 18:38:21 -08:00
Jason Ekstrand
9807f502eb glsl: Use a simpler formula for tanh
The formula we have used in the past is a trivial reduction from the
definition by simply multiplying both the numerator and denominator of the
formula by 2.  However, multiplying by e^x, you can further reduce it.
This allows us to get rid of one side of the clamp and two of exponential
functions which should make it faster.  The new formula still passes the
dEQP precision tests for tanh so it should be fine.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-09 18:38:21 -08:00
Edward O'Callaghan
efe9d1cde3 anv: Clean up some unused variables
Following on from the spirit of commit 011e5570f.

Signed-off-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-10 11:59:59 +11:00
Tim Rowley
2a127b780b swr: [rasterizer common/core/jitter] fetch support for GL_FIXED
v2: use fmul(1/65536) instead of fdiv(65535)

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-12-09 16:20:13 -06:00
Emil Velikov
d0d21532f9 configure: cleanup GLX_USE_TLS handling
Mesa requires ax_pthread_ok = yes, thus we can fold/rewrite the
conditional to follow the more common "if test" pattern.

No functional change intended.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-12-09 19:22:03 +00:00
Emil Velikov
b83153e77b configure: enable glx-tls by default
In the (not too) distant future we'd want to remove this option and
effectively drop the other codepath(s) we have in our dispatch.

Linux distributions have been using --enable-glx-tls for a number of
years. Some/most BSD platforms still don't support this, yet this should
serve as an encouragement to move things forwards.

Note: we had many bug reports were opened due to the wrong default
option. See the list below for details.

v2:
 - Correct default option in help string (Andreas)
 - Add bugzilla references.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70623
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72902
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73778
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89043

Cc: Jean-Sébastien Pédron <dumbbell@FreeBSD.org>
Cc: Jonathan Gray <jsg@jsg.id.au>
Cc: mesa-maintainers@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
2016-12-09 19:21:41 +00:00
Emil Velikov
0715ba4be6 docs: document how to (self-) reject stable patches
Document what has been the unofficial way to self-reject stable patches.

Namely: drop the mesa-stable tag and push the commit.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-12-09 17:37:30 +00:00
Emil Velikov
26541a1fcc egl: add and enable EGL_KHR_config_attribs
Extension is already implemented in the main code.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-12-09 17:36:28 +00:00
Emil Velikov
bf384a2d85 egl/surfaceless: remove duplicate KHR_image_base enablement
Already set by the core code - dri2_create_screen/dri2_setup_screen

Cc: Chad Versace <chadversary@chromium.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-12-09 17:36:26 +00:00
Eric Engestrom
9e1d35ca75 egl: unexport _eglConvertIntsToAttribs
Nobody else makes use of this function.
We can always re-export it if someone ever needs it.

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-12-09 17:33:43 +00:00
Eric Engestrom
4729e1b511 egl: rename static functions to match convention
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-12-09 17:33:36 +00:00
Haixia Shi
d4983390a8 compiler/glsl: fix precision problem of tanh
Clamp input scalar value to range [-10, +10] to avoid precision problems
when the absolute value of input is too large.

Fixes dEQP-GLES3.functional.shaders.builtin_functions.precision.tanh.* test
failures.

v2: added more explanation in the comment.
v3: fixed a typo in the comment.

Signed-off-by: Haixia Shi <hshi@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0" <mesa-dev@lists.freedesktop.org>
2016-12-09 09:14:20 -08:00
Tim Rowley
7aea08667c swr: [rasterizer core/memory] Finish R24_UNORM_X8_TYPELESS for AVX512
This one-off specialization was missed.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-12-09 10:41:31 -06:00
Bas Nieuwenhuizen
53e1c970ef radv: Use enum for memory types.
Inspired by patches from Eric Engestrom.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Cc: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-12-09 08:53:05 +01:00
Bas Nieuwenhuizen
4ae84efbc5 radv: Use enum for memory heaps.
Inspired by patches from Eric Engestrom.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Cc: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-12-09 08:53:05 +01:00
Bas Nieuwenhuizen
011e5570f8 radv: Clean up some unused variables.
Leftovers from anv?

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-12-09 08:53:05 +01:00
Timothy Arceri
8977cd4fdd i965: delay adding built-in uniforms to Parameters list
This is a step towards using NIR optimisations over GLSL IR
optimisations. Delaying adding built-in uniforms until after
we convert to NIR gives it a chance to optimise them away.

V2: move the new code back to brw_link_shader()

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-09 16:29:10 +11:00
Ilia Mirkin
429e2ec324 swr: [rasterizer core] supply proper clip distances to point sprites
Large points become pairs of triangles when rasterized, so we must feed
it three clip distances, one for each vertex.

The clip distance is not subject to sprite coord replacement, so there's
no interpolation of it. We just take its value and put it in the "z"
component of the barycentric-ready plane equation.

(We could also just cull it at an earlier point in time, but that would
require larger changes.)

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-12-08 22:47:39 -05:00
Ilia Mirkin
192317dfeb swr: [rasterizer core] perform perspective division on clip distances
Clip distances need to be perspective-divided. This fixes all the
interpolation-*-{distance,vertex} piglits.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-12-08 22:47:27 -05:00
Dave Airlie
bd56de88df radv/ac: no need to pass nir to the post outputs handling
We don't use the nir shader in here at all.

Reviewed by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-12-08 23:10:34 +00:00
Dave Airlie
d38eece4e6 radv: fix warnings in ubo load code.
Reviewed by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-12-08 23:06:30 +00:00
Dave Airlie
0fafe94a39 radv/ac: pass a mask of array params not a number.
This makes it easier to add new params before the array ones.

Reviewed by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-12-08 23:06:18 +00:00
Dave Airlie
257866ae46 radv: split out a chunk of variant filling code.
This code will have use for copy shaders etc.

Reviewed by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-12-08 23:06:14 +00:00
Dave Airlie
6cde094bf7 radv/meta: don't pass rect into blit2d src function.
Reviewed by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-12-08 23:06:10 +00:00
Dave Airlie
71a9574ffa radv/meta: cleanup image info setup.
This just passes the subresource info in and uses it.

Reviewed by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-12-08 23:06:06 +00:00
Dave Airlie
6f08dcd398 radv/meta: split copyimage api into api and meta function
This make it easier to add multiple queues later.

Reviewed by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-12-08 23:06:02 +00:00
Dave Airlie
0689b8f485 radv/meta: clean up buffer->image code.
Removes some unnecessary functions and pull
some stuff out of the loop.

Reviewed by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-12-08 23:05:58 +00:00
Dave Airlie
c46c376977 radv/ac: don't pass nir to create_function
This isn't needed for later things like geom shader copy shaders,
we won't have NIR.

Reviewed by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-12-08 23:05:52 +00:00
Dave Airlie
2a33049c70 radv: add missing license file to radv_meta_bufimage.
Just noticed this file was missing license and any
explaination of what is in it.

(stable just for license header reasons)
Reviewed by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-12-08 23:05:26 +00:00
Dave Airlie
e54af02567 radv/ac: use build_gep0 instead of opencoding it.
Reviewed by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-12-08 23:03:39 +00:00
Marek Olšák
31f988a9d6 radeonsi: disable the constant engine (CE) on Carrizo and Stoney
It must be disabled until the kernel bug is fixed, and then we'll enable CE
based on the DRM version.

Cc: 12.0 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-08 15:54:38 +01:00
Michel Dänzer
26ba8c920d radeonsi: Fix typo: "llvm.fs.interp" => "llvm.SI.fs.interp"
Fixes lots of pixel shaders failing to compile with LLVM 3.9 or older.

Trivial.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99013#c4
2016-12-08 10:51:03 +09:00
Dave Airlie
c7dc1b010a radv: make push constants optional
We don't set the push constants slot up unless
something will cause us to need it.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-12-07 23:26:19 +00:00
Dave Airlie
dfef9c7c1f radv: only emit descriptor sgprs when needed
This only emits enough descriptor sgprs for the number
of sets in the layout, and only emits the descriptors
necessary for the current stage.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-12-07 23:25:54 +00:00
Dave Airlie
ae61ddabe8 radv: move userdata sgpr ownership to compiler side.
This isn't fully what we want yet, but is a good step on the way.

This allows the compiler to create the information structures
for the state setting side, however the state setting still expects
things to be pretty much in 2 sgpr wide register sets, and can't
handle the indirect setting yet.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-12-07 23:25:49 +00:00
Dave Airlie
221ab77956 radv: refactor out the constant setting user sgpr code.
This just refactors out some common code to make future changes
easier to understand.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-12-07 23:25:45 +00:00
Dave Airlie
11208f0049 radv: refactor out the descriptor user sgpr setting.
This just splits some common code into a utility function.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-12-07 23:25:42 +00:00
Dave Airlie
a74a4edc90 radv: only bind descriptor sets to stages that need them
This copies the push constant code and only binds descriptor
sets to the stages that need them. It also now has to dirty
descriptors on pipeline binds.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-12-07 23:25:38 +00:00
Dave Airlie
85118a1e4d radv: move descriptor set userdata emission to draw flush time.
This is another step towards having the compiler decide the
user sgpr layout.

This still emits the descriptors sets for all shader types, but
we will fix this later.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-12-07 23:25:28 +00:00
Dave Airlie
a5d10844ee radv: refactor descriptor set userdata emission out.
This just moves this into a separate function.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-12-07 23:25:18 +00:00
Dave Airlie
f847676990 radv: pass pipeline to constant flush function
I'll need this later rather than just the layout.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-12-07 23:25:15 +00:00
Dave Airlie
eb2ba5c8df radv: consolidate compute pipeline flushing (v1.1)
This just moves some common code into a utility function
to avoid having to change multiple places later.

v1.1: rename function to better reflect what it does. (Bas)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-12-07 23:24:53 +00:00
Marek Olšák
13c34cf8ca radeonsi: wait for outstanding LDS instructions in memory barriers if needed
Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-07 19:40:29 +01:00
Marek Olšák
16ba04d6de tgsi: fix the src type of TGSI_OPCODE_MEMBAR
It's a literal integer. The next commit will need this.

Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-07 19:40:29 +01:00
Marek Olšák
16f49c16c7 radeonsi: wait for outstanding memory instructions in TCS barriers
Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-07 19:40:29 +01:00
Marek Olšák
15e96c70b0 radeonsi: allow specifying simm16 of emit_waitcnt at call sites
The next commit will use this.

Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-07 19:40:29 +01:00
Marek Olšák
57b9d75af5 radeonsi: write shader descriptors into hang reports
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-07 19:40:29 +01:00
Marek Olšák
6caa558ca6 radeonsi: check for sampler state CSO corruption
It really happens.

v2: declare "magic" in debug builds only

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
2016-12-07 19:40:03 +01:00
Marek Olšák
f2b0c66c3c radeonsi: properly declare context sampler states
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-07 18:46:54 +01:00
Marek Olšák
38d4859b94 radeonsi: fix incorrect FMASK checking in bind_sampler_states
Cc: 12.0 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-07 18:46:54 +01:00
Marek Olšák
b3a2aa9cba radeonsi: always restore sampler states when unbinding sampler views
Cc: 12.0 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-07 18:46:54 +01:00
Marek Olšák
d205faeb6c radeonsi: take LDS into account for compute shader occupancy stats
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-07 18:46:54 +01:00
Marek Olšák
132b69c4ed st/mesa: round lod_bias to a multiple of 1/256
This reduces the number of sampler states 3.6x in Batman Arkham: Origins.
(from ~7200 to ~2000)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-07 18:46:54 +01:00
Marek Olšák
4b0d8b2da0 gallium: decrease the size of pipe_sampler_state fields
We've had unused bits.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-12-07 18:46:54 +01:00
Marek Olšák
6dc96de303 cso: don't release sampler states that are bound
This fixes random radeonsi GPU hangs in Batman Arkham: Origins (Wine) and
probably many other games too.

cso_cache deletes sampler states when the cache size is too big and doesn't
check which sampler states are bound, causing use-after-free in drivers.
Because of that, radeonsi uploaded garbage sampler states and the hardware
went bananas. Other drivers may have experienced similar issues.

Cc: 12.0 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-12-07 18:46:54 +01:00
Jordan Justen
e9133dd90e i965: Increase max texture to 16k for gen7+
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98297
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-07 09:00:49 -08:00
Jordan Justen
d6526d7247 intel/blorp_blit: Add split_blorp_blit_debug switch
Enabling this debug switch causes surface shrinking to happen by
default, and lowers the surface size limit which causes blorp blits to
be split.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-07 09:00:49 -08:00
Jordan Justen
da381ae647 intel/blorp_blit: Enable splitting large blorp blits
Detect when the surface sizes are too large for a blorp blit. When it
is too large, the blorp blit will be split into a smaller operation
and attempted again.

For gen7, this fixes the cts test:

ES3-CTS.gtf.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_multisampled_to_singlesampled_blit

It will also enable us to increase our renderable size from 8k x 8k to
16k x 16k.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-07 09:00:49 -08:00
Jordan Justen
efea8e7244 intel/blorp_blit: Move RGB=>R conversion to follow blit splitting
In blorp_copy, when RGB surfaces are copied, we convert the
destination surface to a Red only surface, but 3 times as wide. This
introduces an implicit restriction of "mod 3" for the destination
width.

It is easier to handle the blorp split buffer offsetting with the
original RGB surface, and do the RGB=>R after this.

Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-07 09:00:49 -08:00
Jordan Justen
edf3113aed intel/blorp_blit: Adjust blorp surface parameters for split blits
If try_blorp_blit() previously returned that a blit was too large,
shrink_surface_params() will be used to update the surface parameters
for the smaller blit so the blit operation can proceed.

v2:
 * Use double instead of float. (Jason)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-07 09:00:49 -08:00
Jordan Justen
12e0a6e259 intel/blorp_blit: Split blorp blits if they are too large
We rename do_blorp_blit() to try_blorp_blit(), and add a return error
if the surface size for the blit is too large. Now, do_blorp_blit() is
rewritten to try to split the blit into smaller operations if
try_blorp_blit() fails.

Note: In this commit, try_blorp_blit() will always attempt to blit and
never return an error, which matches the previous behavior. We will
enable the size checking and splitting in a future commit.

The motivation for this splitting is that in some cases when we
flatten an image, it's dimensions grow, and this can then exceed the
programmable hardware limits. An example is w-tiled+MSAA blits.

v2:
 * Use double instead of float. (Jason)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-07 09:00:49 -08:00
Jordan Justen
b74d4f6ca0 intel/blorp_blit: Create structure for src & dst coordinates
This will be useful for splitting blits into smaller sizes.

We also make the coordinates of type double rather than float. Since
we will be splitting and scaling the coordinates, we might require
extra precision in the calculations.

v2:
 * Use double instead of float. (Jason)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-07 09:00:49 -08:00
Edward O'Callaghan
a77426fd92 vulkan: use STATIC_ASSERT instead of static_assert
Following the spirit of commit 23d1799f, fixes compilation
warnings on Android build due to lack of C11 features.

Signed-off-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-07 22:32:38 +11:00
Lionel Landwerlin
e9f17e9fb0 i965: enable INTEL_conservative_rasterization on Gen9+
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
2016-12-07 11:02:19 +00:00
Lionel Landwerlin
039d836d6e mesa: add support for GL_INTEL_conservative_rasterization
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
2016-12-07 11:02:16 +00:00
Plamena Manolova
0ff74a8990 i965: Add i965 plumbing for ARB_post_depth_coverage for i965 (gen9+).
This extension allows the fragment shader to control whether values in
gl_SampleMaskIn[] reflect the coverage after application of the early
depth and stencil tests.

Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-07 11:01:50 +00:00
Plamena Manolova
8481386892 mesa: Add GL and GLSL plumbing for ARB_post_depth_coverage for i965 (gen9+).
This extension allows the fragment shader to control whether values in
gl_SampleMaskIn[] reflect the coverage after application of the early
depth and stencil tests.

Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
2016-12-07 11:01:50 +00:00
Nicolai Hähnle
d3931a355f radeonsi: fix isolines tess factor writes to control ring
Fixes piglit arb_tessellation_shader/execution/isoline{_no_tcs}.shader_test.

Cc: mesa-stable@lists.freedesktop.org
2016-12-07 11:21:32 +01:00
Kenneth Graunke
9871bde351 i965: Drop redundant key->outputs_written initialization.
This was already set to the same value earlier.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-12-06 22:14:58 -08:00
Kenneth Graunke
09ffc5c84f i965: Initialize "separate" flag in VUE maps.
This was uninitialized, which resulted in weird looking printouts where
it appeared that the TCS output and TES input patch URB entries differed
in SSO/non-SSO layout.  There is no "separable" layout for both, as
they're tied together.

It should have no other actual effect.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-12-06 22:14:58 -08:00
Ian Romanick
b87039499b nir: In split_var_copies_block, uint, int, and bool types cannot be matrices
Noticed while adding support for 64-bit integer types.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-06 17:30:38 -08:00
Tom Stellard
4c8c13b356 radeonsi: Use amdgcn intrinsics for fs interpolation
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-07 00:42:40 +00:00
Rob Clark
a9383ae6d6 freedreno/a5xx: fix draw packet size with index buffer
gpuaddr of idx buffer is now two dwords (64b).

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-06 18:01:31 -05:00
Rob Clark
ec24f009ca freedreno/a5xx: gmem bypass mode
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-06 18:01:31 -05:00
Rob Clark
85a3057f65 freedreno/a5xx: fix emit_string_marker()
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-06 18:01:31 -05:00
Rob Clark
c1e9cca696 freedreno: pitch alignment should match gmem alignment
Deal w/ differing gmem tile size alignment between generations, and make
sure texture pitch matches.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-06 18:01:31 -05:00
Rob Clark
8f4da2ff63 freedreno/a5xx: more formats
Bunch of stuff we can at least turn on for vbo formats.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-06 18:01:31 -05:00
Rob Clark
b337099849 freedreno/a5xx: fix fragface
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-06 18:01:31 -05:00
Rob Clark
f143eeaffa freedreno/a5xx: fix fragcoord
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-06 18:01:31 -05:00
Rob Clark
f5c5f76255 freedreno: update generated headers
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-06 18:01:31 -05:00
Rob Clark
3ec4d1f809 freedreno/a5xx: fix alpha test
GRAS_SU_DEPTH_PLANE_CNTL doesn't in fact seem to be anything to do with
alpha test.  This fixes xonotic and (other than some iommu faults) gets
gnome-shell working.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-06 18:01:31 -05:00
Rob Clark
2b305725e2 freedreno/a5xx: fix VPC_VAR[n].DISABLE bits
We don't need varying interpolators enabled for pos/psize out of the VS
(despite the fact that they show up in VS_OUT map), so emit these before
we append pos/psize to the linkage.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-06 18:01:31 -05:00
Nanley Chery
72db1570b4 anv/TODO: Document sampling from HiZ
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-06 14:51:30 -08:00
Kenneth Graunke
05a4e3a009 i965: Don't force SSO layout for VS->TCS.
This was a hack which worked around the VS and TCS disagreeing on their
shared interface due to the lack of varying packing.  In particular, it
was needed by Piglit's tcs-input-read-array-interface test.

However, that was just one case where things could go awry, so the
previous commit forcibly made interfaces match.  This hack is no longer
necessary.

It also seems to be broken, though I'm not sure why.  It fixes Piglit
regressions in spec/arb_shader_image_load_store/semantics from commit
ec1f159ac8.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98893
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-12-06 12:36:21 -08:00
Kenneth Graunke
44fd85d8eb i965: Unify shader interfaces explicitly.
A while ago, I made i965 start compiling shaders independently.  The VUE
map layouts were based entirely on each shader's input/output bitfields.
Assuming the interfaces match, this works out well - both sides will
compute the same layout, and outputs are correctly routed to inputs.

At the time, I had assumed that the linker would guarantee that the
interfaces match.  While it usually succeeds, it unfortunately seems
to fail in some cases.

For example, Piglit's tcs-input-read-array-interface test has a VS
output array with two elements, but the TCS only reads one.  The linker
isn't able to eliminate the unused element from the VS, which makes the
interfaces not match.

Another case is where a shader other than the last writes clip/cull
distances.  These should be demoted to ordinary varyings, but they
currently aren't - so we think they still have some special meaning,
and prevent them from being eliminated.

Fixing the linker to guarantee this in all cases is complicated.  It
needs to be able to optimize out dead code.  It's tied into varying
packing and other messiness.  While we can certainly improve it---and
should---I'd rather not rely on it being correct in all cases.

This patch ORs adjacent stages' input/output bitfields together,
ensuring that their interface (and hence VUE map layout) will be
compatible.  This should safeguard us against linker insufficiencies.

Fixes line rendering in Dolphin, and the Piglit test based on it:
spec/glsl-1.50/execution/geometry/clip-distance-vs-gs-out.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97232
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-12-06 12:34:23 -08:00
Jason Ekstrand
eb7b51d62a genxml/gen9: Change the default of MI_SEMAPHORE_WAIT::RegisterPoleMode
We would really like it to be false as that's what you get on hardware that
doesn't have RegisterPoleMode (Sky Lake for example).  While we're at it,
we change it to a boolean.  This fixes dEQP-VK.synchronization.smoke.events
on Broxton.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-12-06 11:35:13 -08:00
Roland Scheidegger
8ac3c1bf1a gallivm: optimize 16bit->32bit gather path a bit
LLVM can't really optimize anything which crosses scalar/vector boundaries,
so help a bit with some particular gather operations when the width is
expanded (only do it for 16->32bit expansion for now), by doing expansion
after fetch. That is probably a better solution anyway even if llvm would
recognize it, makes for cleaner IR...

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-12-06 20:06:06 +01:00
Roland Scheidegger
fd5f420fbb gallivm: handle 16bit float fetches in lp_build_fetch_rgba_soa
Note that we really want to _never_ reach the bottom of the function, which
resorts to AoS fetch.
Half floats can be handled just like other formats which fit into 32bit
vectors (so, only 1x16 and 2x16 formats, albeit with more channels things
are not THAT bad), with minimal plumbing. I've seen code size go down nearly
by a factor of 3 for a complete texture sampling function (including bilinear
filtering) using R16F.
(What we should do for everything not special cased is to do AoS gather,
shuffle/shift things into SoA vectors, and then do the conversion there.
Otherwise it's particularly bad with 1 or 2 channel formats - that r16f
format with either 4 or 8-wide vectors was still doing one element at a
time, essentially doing exactly the same work as for rgba16f. Also replacing
the channels with SWIZZLE0/1 (particularly the latter) adds even more
work, as it has to be done per aos vector, and not just straightforward
at the end with the SoA vector.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-12-06 20:06:06 +01:00
Roland Scheidegger
775a244645 util: (trivial) ETC1 meets the criteria for fitting into unorm8
Just like other similar compressed formats.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-12-06 20:06:06 +01:00
Matt Turner
43cdbb3e6a i965: Emit proper NOPs.
The PRMs for HSW and newer say that other than the opcode and DebugCtrl
bits of the instruction word, the rest must be zero.

By zeroing the instruction word manually, we avoid using any of the
state inherited through brw_codegen.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96959
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-12-06 10:42:46 -08:00
Roland Scheidegger
9c95ad24cc glsl: (trivial) fix type typo
Accidentally changed the type of a constant in
df33f11b39 causing assertion failures.
2016-12-06 17:44:21 +01:00
Kenneth Graunke
a41f5dcb14 i965: Allocate at least some URB space even when max_vertices = 0.
Allocating zero URB space is a really bad idea.  The hardware has to
give threads a handle to their URB space, and threads have to use that
to terminate the thread.  Having it be an empty region just breaks a
lot of assumptions.  Hence, why we asserted that it isn't possible.

Unfortunately, it /is/ possible prior to Gen8, if max_vertices = 0.
In theory a geometry shader could do SSBO/image access and maybe
still accomplish something.  In reality, this is tripped up by
conformance tests.

Gen8+ already avoids this problem by placing the vertex count DWord
in the URB entry header.  This fixes things on earlier generations.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
2016-12-05 20:47:03 -08:00
Roland Scheidegger
cd9bb4b918 main: allow NEAREST_MIPMAP_NEAREST for stencil texturing
As per GL 4.5 rules, which fixed a spec mistake in GL_ARB_stencil_texturing.
The extension spec wasn't updated, but just allow it with older GL versions
as well, hoping there aren't any crazy tests which want to see an error
there... (Compile tested only.)

Reported by Józef Kucia <joseph.kucia@gmail.com>

Acked-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-06 04:10:43 +01:00
Roland Scheidegger
df33f11b39 glsl: fix ldexp lowering if bitfield insert lowering is also requested
Trivial, this just resurrects the code which was there once upon a time
(the code can't lower instructions generated in the lowering pass there,
and even if it could it would probably be suboptimal).
This fixes piglit mesa_shader_integer_functions fs-ldexp.shader_test and
vs-ldexp.shader_test with llvmpipe.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-12-06 04:10:43 +01:00
Nayan Deshmukh
3015a23fe0 radv: fix resource leak in radv_amdgpu_ctx_create
CovID: 1396387

V2. Fixup bad whitespace.

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folkore1984.net>
2016-12-06 11:49:01 +11:00
Andy Furniss
5338fb34d6 st/omx/enc Raise default encode level
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=91281

Signed-off-by: Andy Furniss <adf.lists@gmail.com>

Reviewed-by: Christian König <christian.koenig@amd.com>
2016-12-05 19:39:47 -05:00
Andy Furniss
2a38a5b2b2 radeon/vce Handle H.264 level 5.2
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=91281
v2: explicitly add case 52

Signed-off-by: Andy Furniss <adf.lists@gmail.com>

Reviewed-by: Christian König <christian.koenig@amd.com>
2016-12-05 19:39:47 -05:00
Jason Ekstrand
7db009b59e nir: Remove some unused fields from nir_variable
All of these are happily set from glsl_to_nir or spirv_to_nir but their
values are never used for anything.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-12-05 15:40:10 -08:00
Jason Ekstrand
50e0b0bee3 nir: Delete most of the constant_initializer support
Constant initializers have been a constant (ha!) pain for quite some time.
While they're useful from a language perspective, people writing passes or
backends really don't want deal with them most of the time.  This commit
removes most of the constant initializer support from NIR.  It is expected
that you call nir_lower_constant_initializers VERY EARLY to ensure that
they're gone before you do anything interesting.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-12-05 15:40:09 -08:00
Jason Ekstrand
2f19c19b5d nir: Simplify nir_lower_gs_intrinsics
It's only ever called on single-function shaders.  At this point, there are
a lot of helpers that can make it all much simpler.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-12-05 15:40:09 -08:00
Jason Ekstrand
257aa5a1c4 nir/lower_returns: Stop using constant initializers
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-12-05 15:40:09 -08:00
Jason Ekstrand
507626304c glsl/nir: Call nir_lower_constant_initializers
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-12-05 15:40:09 -08:00
Jason Ekstrand
c5d664f9dc anv/pipeline: Call nir_lower_constant_initializers
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-12-05 15:40:09 -08:00
Jason Ekstrand
f5232db9e5 nir: Add a pass for lowering away constant initializers
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-12-05 15:40:09 -08:00
Jason Ekstrand
0291bf4db2 Revert "i965: use nir_lower_indirect_derefs() for GLSL"
This reverts commit 9404439a75.  I didn't
intend to push it and it breaks clip and cull distance.
2016-12-05 15:21:20 -08:00
Jason Ekstrand
5f0e4c7c79 i965: Delete the meta-base CopyImageSubData implementation
When I originally implemented the ARB_copy_image extension, the fast-path
was written in meta using texture views.  This path only worked if both
images were uncompressed color images.  All of the other cases fell back to
the blitter or, in the worst case, mapping and memcpy on the CPU.  Now that
we have the blorp path, it handles all copies ever and the old meta,
blitter, and CPU paths are only used on gen5 and below.  The primary reason
why we needed the meta path (apart from having a slow blitter on later
hardware) was to handle multisampling which gen5 and earlier don't support
anyway.  Since the blitter is reasonably fast on gen5, we can just delete
the meta path and get rid of all that terrible code.

If we decide that we're ok with just disabling ARB_copy_image on gen5 and
earlier (I personally am), then we could get rid of another 300 lines or so
of semi-hairy code.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-12-05 14:00:35 -08:00
Jason Ekstrand
06d864921e i965/copy_image: Re-implement the blitter path with emit_miptree_blit
By using emit_miptree_blit which does chunking, this fixes the blitter path
for the case where the image is too tall to blit normally.  We also pull it
into intel_blit as intel_miptree_copy.  This matches the naming of the
blorp blit and copy functions brw_blorp_blit and brw_blorp_copy.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: "13.0" <mesa-dev@lists.freedesktop.org>
2016-12-05 14:00:35 -08:00
Jason Ekstrand
6c74e7f492 i965/blit: Break the guts of intel_miptree_blit into a helper
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: "13.0" <mesa-dev@lists.freedesktop.org>
2016-12-05 14:00:35 -08:00
Timothy Arceri
9404439a75 i965: use nir_lower_indirect_derefs() for GLSL
This moves the nir_lower_indirect_derefs() call into
brw_preprocess_nir() so thats is called by both OpenGL and Vulkan
and removes that call to the old GLSL IR pass
lower_variable_index_to_cond_assign()

We want to do this pass in nir to be able to move loop unrolling
to nir.

There is a increase of 1-3 instructions in a small number of shaders,
and 2 Kerbal Space program shaders that increase by 32 instructions.

Shader-db results BDW:

total instructions in shared programs: 8705873 -> 8706194 (0.00%)
instructions in affected programs: 32515 -> 32836 (0.99%)
helped: 3
HURT: 79

total cycles in shared programs: 74618120 -> 74583476 (-0.05%)
cycles in affected programs: 528104 -> 493460 (-6.56%)
helped: 47
HURT: 37

LOST:   2
GAINED: 0
2016-12-05 14:00:35 -08:00
Tim Rowley
0c70b26a2d swr: mark PIPE_CAP_NATIVE_FENCE_FD unsupported
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-12-05 13:42:39 -06:00
Tim Rowley
efc3ca64ba swr: include llvm version and vector width in renderer string
Uses llvmpipe's string formating.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-12-05 13:42:39 -06:00
Tim Rowley
b035d9cab5 gallivm: use getHostCPUFeatures on x86/llvm-4.0+.
Use llvm provided API based on cpuid rather than our own
manually mantained list of mattr enabling/disabling.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-12-05 13:42:39 -06:00
Juan A. Suarez Romero
48416b6f4d st/va: declare vlVaBuffer before vlVaContext
And declare coded_buf in vlVaContext as "vlVaBuffer *" instead of
"struct vlVaBuffer *".

This fixes several warnings later about assignment from incompatible
pointer type.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-12-05 17:03:57 +00:00
Juan A. Suarez Romero
5a585d019e st/va: remove unused variable pbuff
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Elie Tournier <tournier.elie@gmail.com>
2016-12-05 17:03:56 +00:00
Emil Velikov
510722d146 st/va: automake: cleanup C{PP,}FLAGS
Remove some transitional left overs from the gallium pipe-loader rework
and kill off unneeded AM_CPPFLAGS.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-12-05 17:03:56 +00:00
Rob Clark
8ca14b04e1 add EGL_TEXTURE_EXTERNAL_WL to WL_bind_wayland_display spec
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2016-12-05 16:01:21 +00:00
Emil Velikov
d09da32cfa docs: add news item and link release notes for 12.0.5
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-12-05 15:42:58 +00:00
Emil Velikov
d7747cccaf docs: add sha256 checksums for 12.0.5
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 6b1c3c3aa0)
2016-12-05 15:38:56 +00:00
Emil Velikov
ef0417e8d1 docs: add release notes for 12.0.5
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 01579a9d00)
2016-12-05 15:38:55 +00:00
Tobias Droste
0e9a5be7e7 configure.ac: Create correct LLVM_VERSION_INT with minor >= 10
This makes sure that we handle LLVM minor version >= 10 correctly.

Signed-off-by: Tobias Droste <tdroste@gmx.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-12-05 14:43:47 +00:00
Tobias Droste
a66cd76b16 configure.ac: Get complete LLVM version from header
Major and minor version are included in the header file since LLVM
version 3.1.0. Since the minimal required version is 3.3.0 we can
remove the workaround if no values for major/minor were found in the
header.

Since LLVM 3.6.0 the patch version is inside the header file of LLVM.
Only radeon drivers need the patch version and they depend on
LLVM >= 3.6.0, so this is safe too.

Signed-off-by: Tobias Droste <tdroste@gmx.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-12-05 14:43:47 +00:00
Tobias Droste
5db89531bc configure.ac: Add required LLVM versions to the top
Consolidate the required LLVM versions at the top where the other
versions for dependencies are listed.

v5:
Splitted out separate changes (see patch 19 and 20)

Signed-off-by: Tobias Droste <tdroste@gmx.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-12-05 14:43:47 +00:00
Tobias Droste
45c8a4ea0a configure.ac: Only add default LLVM components if needed
LLVM components are only added when LLVM is needed.
This means gallium adds this as soon as "--enable-gallium-llvm"
is "yes" and radv + opencl add it explicitly.

v5:
Removed hunk that disabled LLVM for gallium if it was not found.

Signed-off-by: Tobias Droste <tdroste@gmx.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-12-05 14:43:47 +00:00
Tobias Droste
a8ae340f7e configure.ac: Reorder arguments in radeon_llvm_check
Use the same order as llvm_check_version_for.

Signed-off-by: Tobias Droste <tdroste@gmx.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-12-05 14:43:47 +00:00
Tobias Droste
3f42859367 configure.ac: Move radv check to the Vulkan section
This moves the LLVM check for radv to the corresponding driver section.
No functional change.

Signed-off-by: Tobias Droste <tdroste@gmx.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-12-05 14:43:47 +00:00
Tobias Droste
c702369bf5 configure.ac: Move LLVM ac_subst closer to usage
This moves llvm_set_environment_variables to its final destination
and moves all the LLVM AC_SUBST() below the function call.
No functional change.

Signed-off-by: Tobias Droste <tdroste@gmx.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-12-05 14:43:47 +00:00
Tobias Droste
62f4e6f272 configure.ac: Move oCL LLVM checks to the oCL section
The LLVM checks can be anywhere below line 1161 now.
Move the openCL LLVM checks to the section with the other openCL checks.
No functional change.

Signed-off-by: Tobias Droste <tdroste@gmx.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
[Emil Velikov: s/ipos/ipo/, drop "yes" argument from llvm_add_component]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-12-05 14:43:47 +00:00
Tobias Droste
9d14a25bee configure.ac: Move llvm_set_environment_variables higher.
This moves the function to get the LLVM environment variables higher
in the file. It still needs to be below the "--enable-opencl" because
it uses $enable_opencl.
It can be called without condition now as it only throws errors if
openCL is enabled.

v5:
HAVE_MESA_LLVM is only used for gallium. Rename it to HAVE_GALLIUM_LLVM.
In order to only link LLVM when it is needed, HAVE_GALLIUM_LLVM is only
set if "$enable-gallium-llvm" is yes.

Signed-off-by: Tobias Droste <tdroste@gmx.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-12-05 14:43:46 +00:00
Tobias Droste
19ff3975de configure.ac: Remove swr_llvm_check()
No need for an additional function here.
Use the same style for LLVM checks as the other drivers
(e.g. r300, llvmpipe) that don't need a load of other checks.
Instead of open conding the LLVM version check, use the
function used by other drivers.

"enable_gallium_llvm" is checked by gallium_require_llvm().

Signed-off-by: Tobias Droste <tdroste@gmx.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-12-05 14:43:46 +00:00
Tobias Droste
b3119a3360 configure.ac: Check gallium LLVM version in gallium_require_llvm
This moves the LLVM version check to the helper function
gallium_require_llvm() and uses the llvm_check_version_for() helper
instead of open conding the LLVM version check.

gallium_require_llvm is functionally the same as before, because
"enable_gallium_llvm" is only set to "yes" if the host cpu is x86:

if test "x$enable_gallium_llvm" = xauto; then
    case "$host_cpu" in
    i*86|x86_64|amd64) enable_gallium_llvm=yes;;
    esac
fi

This function is also only called now when needed.
Before this patch llvmpipe would call this as soon as LLVM is
installed. Now it only gets called by llvmpipe if gallium
LLVM is actually enabled (i.e. only on x86).

Both reasons mentioned above remove the need to check host cpu
in the gallium_require_llvm function.

Signed-off-by: Tobias Droste <tdroste@gmx.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-12-05 14:43:46 +00:00
Tobias Droste
44a672ef0e configure.ac: Use short names for r600 und r300
There are no non gallium r300 and r600 drivers anymore.
No need to explicilty mention gallium here.
Just cosmetics, no functional change.

Signed-off-by: Tobias Droste <tdroste@gmx.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-12-05 14:43:46 +00:00
Tobias Droste
1ca0486147 configure.ac: Remove useless oCL LLVM check
This is handled by "llvm_check_version_for" for openCL.

Signed-off-by: Tobias Droste <tdroste@gmx.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-12-05 14:43:46 +00:00
Tobias Droste
8c98e27074 configure.ac: Move llvm-config searching outside the function
There's no harm in always searching llvm-config.
This way it's available as soon as possible for all functions.

Signed-off-by: Tobias Droste <tdroste@gmx.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-12-05 14:43:46 +00:00
Tobias Droste
0cc4ffd67b configure.ac: Move LLVM functions to the top
This just moves code around so that all LLVM related stuff is at the
top of the file in the correct order.
No functional change.

Signed-off-by: Tobias Droste <tdroste@gmx.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-12-05 14:43:46 +00:00
Tobias Droste
bf4b0fc33b configure.ac: Move LLVM version check to the top
A function with the LLVM version checked is moved to the top.
The function is called where the old code was.
No functional change.

Signed-off-by: Tobias Droste <tdroste@gmx.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
[Emil Velikov: s/ipos/ipo/, drop "yes" argument from llvm_add_component]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-12-05 14:43:46 +00:00
Tobias Droste
9a3bccc75e configure.ac: Use new helper function for LLVM
Use the new helper function to add LLVM targets and components.
The components are added one by one to later find out which component
is missing in case there is one.

Signed-off-by: Tobias Droste <tdroste@gmx.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
[Emil Velikov: s/ipos/ipo/, drop "yes" argument from llvm_add_component]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-12-05 14:43:46 +00:00
Tobias Droste
d434633b76 configure.ac: Use new llvm_add_default_components
Signed-off-by: Tobias Droste <tdroste@gmx.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-12-05 14:43:46 +00:00
Tobias Droste
352831c5d9 configure.ac: Add helper function for targets/components
Add functions to add and check targets/components.
Not used in this patch.

Signed-off-by: Tobias Droste <tdroste@gmx.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-12-05 14:43:46 +00:00
Tobias Droste
2350387d24 configure.ac: Don't search llvm-config if it's known
This way LLVM_CONFIG can bet set from an env variable if it's outside
the $llvm_prefix.

This is not a must, but it helps testing.

Signed-off-by: Tobias Droste <tdroste@gmx.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-12-05 14:43:46 +00:00
Boyuan Zhang
3949d7c6ea st/va: fix gop size for rate control
The gop_size in rate control is the budget window for internal rate
control calculation, and shouldn't always equal to idr period. Define
a coefficient to let budget window contains a number of idr period for
proper rate control calculation. Adjust the number of i/p frame remaining
accordingly.

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=98005

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2016-12-05 09:23:38 -05:00
Boyuan Zhang
8206882392 st/va: force to submit two consecutive single jobs
The gop_size in rate control is the budget window for internal rate
control calculation, and shouldn't always equal to idr period. Define
a coefficient to let budget window contains a number of idr period for
proper rate control calculation. Adjust the number of i/p frame remaining
accordingly.

v2: fixed regression issues introduced by previous version

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=98005

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2016-12-05 09:23:38 -05:00
Nayan Deshmukh
7b811c362a st/vdpau: fix compiler warning in vlVdpVideoMixerRender
Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-12-05 11:20:55 +01:00
Topi Pohjolainen
5b27405eff i965: Release aux buffer when disabling ccs
Otherwise subsequent render cycles keep on using compression
and/or fast clear.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-05 09:20:05 +02:00
Bas Nieuwenhuizen
92d7563fba ac/nir: Only use the first component for SSBO atomics.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-12-05 01:40:54 +01:00
Dave Airlie
8033f78f94 radv: fix another regression since shadow fixes.
This fixes:
dEQP-VK.glsl.texture_gather.basic.2d.depth32f.*

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-12-05 10:14:37 +10:00
Iago Toral Quiroga
66e7effc85 spirv: Builtin Layer is an input for fragment shaders
This change makes it so we emit a load_input intrinsic when Layer
is read in a fragment shader.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-03 20:50:57 +01:00
Bruce Cherniak
a7b510f656 swr: Fix active_queries count
The active_query count was incorrect for query types that don't require
a begin_query.  Removed the unnecessary assert.

Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-12-02 14:36:28 -06:00
George Kyriazis
2085088033 swr: Fix type to match parameters of std::max()
Include propagation of comparisons further down.

Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-12-02 14:36:28 -06:00
Tim Rowley
f1ca377ab1 swr: [rasterizer jitter] include cstdarg in builder_misc.cpp
Fixes build problem with llvm-svn.

v2: use cstdarg instead of stdarg.h

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-12-02 14:36:28 -06:00
Jason Ekstrand
19a541f496 nir: Get rid of nir_constant_data
This has bothered me for about as long as NIR has been around.  Why do we
have two different unions for constants?  No good reason other than one of
them is a direct port from GLSL IR.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-12-02 10:53:32 -08:00
Timothy Arceri
c45d84ad83 Revert "st/mesa: get Version from gl_program rather than gl_shader_program"
This reverts commit 6bf63b0119.

A patch that adds a reference to gl_shader_program_data to gl_program
needs to land befor this one.
2016-12-02 16:44:44 +11:00
Timothy Arceri
6bf63b0119 st/mesa: get Version from gl_program rather than gl_shader_program
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-02 13:54:54 +11:00
Timothy Arceri
ab8c01386a st/mesa/glsl: move Version to gl_shader_program_data
This is mostly just used during linking however the st uses it
when updating textures.

In order to store gl_program in the CurrentProgram array
rather than gl_shader_program we need to move this field to
the shared gl_shader_program_data struct.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-02 13:54:47 +11:00
Rob Clark
534917495d freedreno: no-op render when we need a fence
If app tries to create a fence but there is no rendering to submit, we
need a dummy/no-op submit.  Use a string-marker for the purpose.. mostly
since it avoids needing to realize that the packet format changes in
later gen's (so one less place to fixup for a5xx).

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-01 20:24:59 -05:00
Rob Clark
0b98e84e9b freedreno: native fence fd support
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-01 20:24:46 -05:00
Rob Clark
16f6ceaca9 freedreno: some fence cleanup
Prep-work for next patch, mostly move to tracking last_fence as a
pipe_fence_handle (created now only in fd_gmem_render_tiles()), and a
bit of superficial renaming.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-01 20:16:31 -05:00
Rob Clark
026a7223a6 gallium: support for native fence fd's
This enables gallium support for EGL_ANDROID_native_fence_sync, for
drivers which support PIPE_CAP_NATIVE_FENCE_FD.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-12-01 20:16:31 -05:00
Rob Clark
72cc1ca58d gallium: wire up server_wait_sync
This will be needed for explicit synchronization with devices outside
the gpu, ie. EGL_ANDROID_native_fence_sync.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-01 20:16:31 -05:00
Rob Clark
0201f01dc4 egl: add EGL_ANDROID_native_fence_sync
With fixes from Chad squashed in, plus fixes for issues that Rafael
found while writing piglit tests.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Tested-by: Chad Versace <chadversary@chromium.org>
2016-12-01 10:57:35 -08:00
Rob Clark
21b1acfcfe dri: extend fence extension to support native fd fences
Required to implement EGL_ANDROID_native_fence_sync.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Tested-by: Chad Versace <chadversary@chromium.org>
2016-12-01 10:57:35 -08:00
Rob Clark
2ba4c7e154 egl: un-fallthrough sync attr parsing
Doesn't work so well when you start having more than one possible
attrib.  Prep-work for next patch.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Tested-by: Chad Versace <chadversary@chromium.org>
2016-12-01 10:57:24 -08:00
Rob Clark
cce04a4630 egl: initialize SyncCondition after attr parsing
Reduce the noise in the next patch.  For EGL_SYNC_NATIVE_FENCE_ANDROID
the sync condition is conditional on EGL_SYNC_NATIVE_FENCE_FD_ANDROID
attribute.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Tested-by: Chad Versace <chadversary@chromium.org>
2016-12-01 10:52:55 -08:00
Tim Rowley
05f35a868c tgsi: store writes_primid when scanning tgsi
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-01 11:33:01 -06:00
Ilia Mirkin
7c16552f8d mesa: only verify that enabled arrays have backing buffers
We were previously also verifying that no backing buffers were available
when an array wasn't enabled. This is has no basis in the spec, and it
causes GLupeN64 to fail as a result.

Fixes: c2e146f487 ("mesa: error out in indirect draw when vertex bindings mismatch")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2016-12-01 06:35:13 -05:00
Eric Anholt
51244859e3 vc4: Avoid false scheduling dependencies for LOAD_IMMs.
Noticed in shaders with branching, where we ended up scheduling delay
slots near the start of a block for the uniforms reset setup.

total instructions in shared programs: 93970 -> 93951 (-0.02%)
instructions in affected programs:     3117 -> 3098 (-0.61%)

3DMMES performance +0.423087% +/- 0.133521% (n=9,10)
2016-11-30 19:58:09 -08:00
Eric Anholt
6c34084d8e vc4: Try to schedule QIR instructions between writing to and reading math.
This helps us get the delay slots between SFU writes and reads filled.

total instructions in shared programs: 94494 -> 93970 (-0.55%)
instructions in affected programs:     59206 -> 58682 (-0.89%)

3DMMES performance +1.89967% +/- 0.157611% (n=10,9)
2016-11-30 19:58:09 -08:00
Eric Anholt
d182740ac8 vc4: Improve interleaving of texture coordinates vs results.
The latency_between was trying to handle the delay between the coordinate
write ("before") and the corresponding sample read ("after"), but we were
handing in the two instructions swapped.

This meant that we tried to fit things between a tex_s and its *preceding*
tex_result.  This made us only interleave normal texture coordinates by
accident, and pessimized UBO reads by pushing the tex_result collection
earlier until there was nothing but it (and then its preceding coordinate
setup) left.

In addition to latency reduction, things end up packing better (probably
due to reduced live ranges of the texture results):

total instructions in shared programs: 98121 -> 94775 (-3.41%)
instructions in affected programs:     91196 -> 87850 (-3.67%)

3DMMES performance +1.15569% +/- 0.124714% (n=8,10)
2016-11-30 19:58:09 -08:00
Eric Anholt
1f9daf7cd1 vc4: Fix stray "." on no-op MUL packs.
This happened when the PM bit was set for R4 unpacks, where the MUL pack
was NOP.
2016-11-30 19:58:09 -08:00
Eric Anholt
98d7e87488 vc4: Allow merging instructions with SF set where the other writes NOP.
I'm not sure how I managed to write the SF merge code
(7d8b79f398) without allowing merges with
NOPs.  *Everything* we try to merge with will have a NOP on one or the
other side of the instruction, and that's why that commit showed no
benefit.

total instructions in shared programs: 99347 -> 95128 (-4.25%)
instructions in affected programs:     91906 -> 87687 (-4.59%)

3DMMES performance +2.57105% +/- 0.135276% (n=6,8)
2016-11-30 19:58:09 -08:00
Eric Anholt
8e5ec33f11 vc4: In a loop break/continue, jump if everyone has taken the path.
This should be a win for most loops, which tend to have uniform control
flow.

More importantly, it exposes important information to live variables: that
the break/continue here means that our jump target may have access to
values that were live on our input.  Previously, we were just setting the
exec mask and letting control flow fall through, so an intervening def
between the break and the end of the loop would appear to live variables
as if it screened off the variable, when it didn't actually.

Fixes a regression in glsl-vs-loop-redundant-condition.shader_test when a
perturbing of register allocation caused a live variable to get stomped.

Cc: 13.0 <mesa-stable@lists.freedesktop.org>
2016-11-30 19:58:09 -08:00
Ilia Mirkin
fda1d0187d anv: expose support for VK_KHR_sampler_mirror_clamp_to_edge
This is already supported in genX_state.c, expose the extension string.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-11-30 20:49:04 -05:00
Jason Ekstrand
27433b26b1 anv/cmd_buffer: Actually use the stencil dimension
In an attempt to fix 3DSTATE_DEPTH_BUFFER for stencil-only cases, I
accidentally kept setting the SurfaceType to 2D in the stencil-only case
thanks to a copy+paste error.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-11-30 17:42:42 -08:00
Ilia Mirkin
ef59cb0820 swr: add streamout buffer offset into pBuffer pointer
The buffer_size does not take the offset into account. Just add the
offset into the pointer which lines up the structures much better.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-30 20:36:03 -05:00
Ilia Mirkin
3d837a8871 swr: fix assertion for max number of so targets
The number has to be less than or equal to the max, not just less than.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-30 20:36:00 -05:00
Ilia Mirkin
02b2efa5eb swr: properly report max number of SO components
The components count the number of individual values, not the number of
slots.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-30 20:35:56 -05:00
Ilia Mirkin
ab3bbe06ed swr: turn off queries around blits
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-30 20:35:53 -05:00
Ilia Mirkin
d8ce8acdfa swr: don't advertise stream pause/resume
There is no support for resuming streamout. Furthermore, this also
controls glDrawTransformFeedback functionality which requires the same
ability to query how many primitives were sent out of TF.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-30 20:35:43 -05:00
Ilia Mirkin
632c11e857 swr: fix range computation for instanced client-side arrays
We need to take the instance divisor and number of instances into
account for instanced client-side arrays, rather than the vertex
parameters.

Loosely based on the comparable nvc0 logic.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-30 20:35:33 -05:00
Ilia Mirkin
3b736acf1b swr: [rasterizer memory] assert when trying to convert an unknown format
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-30 20:35:16 -05:00
Ilia Mirkin
763c015ce5 swr: remove warning about multi-layer surfaces
We now support clearing these, and actually rendering to multiple layers
would require GS support, which will fail in much more spectacular ways
for now. Once that is hooked up, there won't be anything else to do
here.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-30 20:35:06 -05:00
Ilia Mirkin
a9d292f5bd swr: [rasterizer core] don't attempt to load another RTAI when storing
Since we don't pass a renderTargetArrayIndex in, and the current hot
tile may be for a different index, we may end up loading the RTAI=0 into
the hot tile for no reason.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-30 20:34:55 -05:00
Marek Olšák
77014a0ad3 radeonsi: document a CP DMA bug that doesn't need a workaround yet
This one is easy to miss, because it's not documented in any internal doc.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-01 02:16:51 +01:00
Marek Olšák
bacf9b4e73 radeonsi: apply the double EVENT_WRITE_EOP workaround to VI as well
Internal docs don't mention it, but they also don't mention that the bug
has been fixed (like other CI bugs fixed in VI).

Vulkan does this too.

v2: also update r600_gfx_write_fence_dwords

Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
2016-12-01 02:16:51 +01:00
Marek Olšák
a816c7fe07 radeonsi: add a tess+GS hang workaround for VI dGPUs
ported from Vulkan

Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-01 02:16:51 +01:00
Marek Olšák
da7453666a radeonsi: don't apply the Z export bug workaround to Hainan
not needed

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-01 02:16:51 +01:00
Marek Olšák
78c4528ae7 radeonsi: apply a tessellation bug workaround for SI
Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-01 02:16:51 +01:00
Marek Olšák
72e46c9889 radeonsi: apply a TC L1 write corruption workaround for SI
Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-01 02:16:51 +01:00
Marek Olšák
72d48fcd8e radeonsi: apply a multi-wave workgroup SPI bug workaround to affected CIK chips
All codepaths are handled except for clover.

Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-01 02:16:51 +01:00
Marek Olšák
ec36c63b4f radeonsi: consolidate max-work-group-size computation
The next commit will need this.

Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-01 02:16:51 +01:00
Timothy Arceri
966567aa12 mesa: reset linked_stages bitmask when re-linking
34953f8907 added this bitmask but it wasn't being reset when
a program was relinked. If a stage was removed from the new
program then it could case a crash as we expect the linked shader
for that stage to not be null.

Fixes crashes in:
ESEXT-CTS.tessellation_shader.single.xfb_captures_data_from_correct_stage
ES31-CTS.core.tessellation_shader.single.xfb_captures_data_from_correct_stage

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98917
2016-12-01 10:24:16 +11:00
Rob Clark
45eef9af03 freedreno/a5xx: fix negative branches
Looks like immed branch offset size increased again.. making what we
think is a small negative number look to hw like a huge positive number.
And things go badly when shader tries to jump to hyperspace.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-30 17:32:54 -05:00
Rob Clark
ef30e91fe6 freedreno: fix android build with a5xx
Android doesn't build all the files that normal linux/autotools build
does (mainly standalond ir3_compiler).. but possibly we should pull
C_SOURCES + aNxx_SOURCES into a single variable picked up by both
Android.mk and Makefile.am?  (Suggested by Rob H.)

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-30 17:32:54 -05:00
Rob Clark
8b6f8f2576 freedreno/a5xx: fix discard
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-30 17:32:54 -05:00
Ville Syrjälä
676c0cf287 anv: Prefer in-tree headers to out-of-tree headers
Set the include paths to consider in-tree headers before out-of-tree
headers.

Avoids the build failing due to stale headers being present in
$prefix. Previosuly 'make -ki install' or something similar was required
to update the out-of-tree headers to allow the build to succeed.

Also avoids having to rebuild the entire thing after every 'make
install'.

Cc: Rob Clark <robdclark@gmail.com>
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2016-11-30 20:01:00 +02:00
Rob Clark
946cf4eb68 freedreno/a5xx: initial support
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-30 12:35:49 -05:00
Rob Clark
fcba3046e1 freedreno: update generated headers
Pull in a5xx

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-30 12:25:48 -05:00
Rob Clark
8c56789f60 freedreno: make gmem tile size alignment configurable
a5xx seems to prefer 64 pixel alignment, in at least some cases.  Make
this configurable per generation.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-30 12:25:48 -05:00
Rob Clark
728e2c4d38 freedreno/ir3: don't offset inloc by 8
On a3xx/a4xx, the SP_VS_VPC_DST_REG.OUTLOCn is offset by 8, so we used
to add this offset into fs->inputs[n].inloc.  But a5xx drops this extra
offset-by-8.  So instead make inloc zero based and add the offset when
we emit OUTLOCn values (for the gen's that need the offset).

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-30 12:25:48 -05:00
Rob Clark
7a59157287 freedreno/a3xx: use new shader linkage helper
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-30 12:25:48 -05:00
Rob Clark
98c83b5d1c freedreno/a4xx: use new shader linkage helper
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-30 12:25:48 -05:00
Rob Clark
1be5670c8d freedreno/ir3: add new helper for shader linkage
Helps simplify things on a5xx, where pos/psize get added to the vs-out
map.  And anyways, simplifies a3xx and a4xx.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-30 12:25:48 -05:00
Nicolai Hähnle
f60374aa68 st/mesa: skip lower_output_reads when possible
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-30 09:10:02 +01:00
Nicolai Hähnle
0a58b258ca st/glsl_to_tgsi: swizzle PROGRAM_OUTPUTs correctly in src_register translation
This is required for reading directly from fragment shader stencil and depth
outputs.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-30 09:09:59 +01:00
Nicolai Hähnle
611166b8ed gallium: add PIPE_CAP_TGSI_CAN_READ_OUTPUTS
Drivers that support this benefit by saving one lowering pass in the
GLSL-to-TGSI conversion.

radeonsi already supports this because all outputs are stored in temporary
variables before the export (except for TCS outputs, which have always
been readable in TGSI anyway due to their special semantics).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-30 09:09:50 +01:00
Bas Nieuwenhuizen
abc887faa1 ac/nir: Fix out of bounds array access.
With nir_intrinsic_ssbo_atomic_comp_swap we run out of params.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-11-30 07:09:38 +01:00
Kristian H. Kristensen
d3d7cab812 aubinator: Add support for enum types
Signed-off-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-29 22:02:49 -08:00
Kristian H. Kristensen
7fc659d8d5 intel/genxml: Fix ksp for INTERFACE_DESCRIPTOR_DATA
This one was split across two dwords as "Kernel Start Pointer" and
"Kernel Start Pointer High", which looks like it works when the driver
only accesses "Kernel Start Pointer". This breaks, of course, with BO
offsets > 4G.

Signed-off-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-29 22:02:49 -08:00
Kristian H. Kristensen
99e573b4e0 intel/genxml: Use enum 3D_Logic_Op_Function where applicable
Signed-off-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-29 22:02:49 -08:00
Kristian H. Kristensen
374d19ac00 intel/genxml: Use blend function and factor enums where applicable
Signed-off-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-29 22:02:49 -08:00
Kristian H. Kristensen
09fe8ad010 intel/genxml: Use enum 3D_Vertex_Component_Control where applicable
Signed-off-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-29 22:02:49 -08:00
Kristian H. Kristensen
54e71e5851 intel/genxml: Use enum 3D_Stencil_Operation where applicable
Signed-off-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-29 22:02:49 -08:00
Kristian H. Kristensen
193c1b72e0 intel/genxml: Use enum SURFACE_FORMAT where applicable
Signed-off-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-29 22:02:49 -08:00
Kristian H. Kristensen
0799022bf9 intel/genxml: Use enum 3D_Prim_Topo_Type where applicable
Signed-off-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-29 22:02:49 -08:00
Kristian H. Kristensen
993babc014 intel/genxml: Use 3D_Compare_Function for gen8+ test functions
When the state fields where shuffled around for gen8, the compare
function enums were downgraded to just uints. Change them to enum
3D_Compare_Function.

Signed-off-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-29 22:02:49 -08:00
Kristian H. Kristensen
fc2225b1af intel/genxml: Emit genxml enums as C enums
The previous commits got rid of any clashes between #defines and enum
values and we can now emit the genxml enums as debugger friendly C
enums.

Signed-off-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-29 22:02:49 -08:00
Kristian H. Kristensen
8fc74b879e intel/genxml: Remove duplicate COMPAREFUNCTION values
These values were defined both as an enum and as inline values. Remove
the inline values and reference the 3D_Compare_Function enum instead.

Signed-off-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-29 22:02:49 -08:00
Kristian H. Kristensen
5814fc1bb7 intel/genxml: Allow referencing enums in type attributes
This lets us reference enums in the type attribute of a field.

Signed-off-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-29 22:02:49 -08:00
Kristian H. Kristensen
3b6b6f6463 anv: Emit cherryview SF state without including gen9_pack.h
Cleaner this way and we avoid including gen9_pack.h when we compile with
gen8_pack.h. We also avoid the if (cherryview) condition for non-gen8
gens that don't need it.

Signed-off-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-29 22:02:49 -08:00
Kristian H. Kristensen
908febcf21 anv: Don't include two different pack headers
The batch chain logic only needs the pre-gen8 size of
MI_BATCH_BUFFER_START, which seems like something we can make a special
case for. The other two gen7 references, MI_BATCH_BUFFER_END and
MI_NOOP, are the same on all gens.

Signed-off-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-29 22:02:49 -08:00
Kristian H. Kristensen
be9c2ab23b intel/genxml: Move enums above structs
We'll need to define them before we can reference them in structs and
instructions. Enums have no dependencies, so move them first in the
file.

Signed-off-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-29 22:02:49 -08:00
Kristian H. Kristensen
ce26486115 genxml: Add values for Barycentric Interpolation Mode
Signed-off-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-29 22:02:49 -08:00
Ilia Mirkin
ed0b3cbd09 anv: remove per-sample shading from TODO
This was done some time ago.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-11-30 00:17:56 -05:00
Ilia Mirkin
be92b3f49d anv: clean up VkPhysicalDeviceFeatures list
Remove duplicate .alphaToOne, add missing .shaderResourceMinLod, and
reorder a few entries to match their vulkan.h order. All the sparse
features are still left out entirely.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-11-30 00:17:56 -05:00
Michel Dänzer
550cd272b4 vulkan/wsi/x11: Destroy Present event context when destroying swapchain
Without this, the X server may accumulate stale Present event contexts
if a client creates and destroys multiple swapchains using the same
window.

v2: Based on Chris Wilson's review:
* Use xcb_present_select_input_checked so that protocol errors
  generated by old X servers can be handled gracefully
* Use xcb_discard_reply() instead of free(xcb_request_check())
v3: Rebased on top of this code having been refactored out of anv

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-11-30 12:31:25 +09:00
Timothy Arceri
2ea021a1eb glsl: use linked_shaders bitmask to iterate stages for subroutine fields
This should be faster than looping over every stage and null checking, but
will also make the code a bit cleaner when we switch to getting more fields
from gl_program rather than from gl_linked_shader as we can just copy the
pointer and not need to worry about null checking then copying.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-11-30 14:13:52 +11:00
Timothy Arceri
6d3458cbfb mesa: optimise interleaved sso validation
Now that we have a linked_stages bitfield we can use this
to check if the program is used at a later stage.

This change is also required to be able to use gl_program
rather than gl_shader_program in the CurrentProgram array.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-11-30 14:13:52 +11:00
Timothy Arceri
34953f8907 mesa/glsl: add bitmask to track stages a program was linked against
This will be used to enable us to store the current gl_program
rather than gl_shader_program in the gl_pipline_object allowing
us to simplify handing of validation.

Also we should not be depending on _LinkedShader for this information
as it may contain shaders from a failed linking attempt rather than
the current program still in use.

We could also use this mask to iterate over the stages during linking
with _mesa_bit_scan() rather then the current method of NULL checking
each stage.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-11-30 14:13:52 +11:00
Ilia Mirkin
ddf0f097e7 swr: [rasterizer jit] use signed integer representation for logic op
Instead of (incorrectly) biasing the snorm value to make it look like a
unorm, just use signed integer math.

This fixes arb_color_buffer_float-render GL_RGBA8_SNORM

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-29 20:55:00 -05:00
Ilia Mirkin
8ed703cfa6 swr: add missing rgbx8_srgb variant
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-29 20:54:57 -05:00
Ilia Mirkin
d6a06228a6 swr: reorder renderable formats, add grouping comments
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-29 20:54:54 -05:00
Ilia Mirkin
53ca06be8f swr: use util_copy_framebuffer_state helper
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-29 20:54:50 -05:00
Ilia Mirkin
86f7932b1e swr: enable cubemap arrays
Everything is in place for these.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-29 20:54:46 -05:00
Ilia Mirkin
8dd9853516 swr: rearrange caps into limits/supported/unsupported groups
I find this a lot more readable and compact - much easier to scan
through the list and see what's on and what's off.

No functional change intended.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-29 20:54:43 -05:00
Ilia Mirkin
9f568e5db1 swr: only store up to the LOD size
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-29 20:54:36 -05:00
Tim Rowley
f7ab0e4b7e swr: [rasterizer common] add SwrTrace() and macros
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-29 19:36:46 -06:00
Marek Olšák
662b9c24d0 radeonsi: don't fetch 8 dwords for samplerBuffer and imageBuffer
The compiler doesn't shrink s_load_dwordx8, so we always wasted 4 SGPRs.
Also, the extraction of the descriptor created some really ugly asm code
with lots of VALU bitwise ops and v_readfirstlane.

Totals from *affected* shaders:
SGPRS: 13880 -> 13253 (-4.52 %)
VGPRS: 15200 -> 15088 (-0.74 %)
Code Size: 499864 -> 459816 (-8.01 %) bytes
Max Waves: 1554 -> 1564 (0.64 %)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-29 23:52:31 +01:00
Marek Olšák
dbbdc6bb5a radeonsi: disable XNACK to free 2 SGPRs on APUs
My LLVM commit disables it for dGPUs, but not APUs.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-29 23:52:31 +01:00
Marek Olšák
274fb601c2 radeonsi: count and report temp arrays in scratch separately
v2: only do this if debug output of shader dumping is enabled

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
2016-11-29 23:52:31 +01:00
Marek Olšák
a91add9369 radeonsi: don't try to eliminate trivial VS outputs for PS and CS
PS and CS don't have any param exports, so it's a no-op.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-29 23:52:31 +01:00
Marek Olšák
5e5573b1bf radeonsi: disable RB+ blend optimizations for dual source blending
This fixes dual source blending on Stoney. The fix was copied from Vulkan.
The problem was discovered during internal testing.

Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-29 23:52:31 +01:00
Marek Olšák
ff50c44a5f radeonsi: set CB_BLEND1_CONTROL.ENABLE for dual source blending
copied from Vulkan

Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-29 23:52:31 +01:00
Marek Olšák
87b208a54e radeonsi: always set all blend registers
better safe than sorry

Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-29 23:52:31 +01:00
Marek Olšák
fc9f7fc9d0 radeonsi: set the smallest possible CB_TARGET_MASK
better safe than sorry; set_framebuffer_state always makes this dirty

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-29 23:52:31 +01:00
Marek Olšák
ea43d0b5e8 radeonsi: don't print bodies of header-only packets
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-29 23:52:31 +01:00
Marek Olšák
7abd94c9b0 radeonsi: print unknown registers with correct formatting
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-29 23:52:31 +01:00
Marek Olšák
9e1dc10432 ddebug: fix hang detection with deferred flushes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-29 23:52:31 +01:00
Dave Airlie
048143b9d9 radv: set spi_baryc_cntl.pos_float_location to 0
This fixes:
dEQP-VK.pipeline.multisample_interpolation.offset_interpolate_at_sample_position.*

This should probably be 2 when sample shading is enabled, but I'm
not sure.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-29 22:48:23 +00:00
Dave Airlie
f3a3fea973 radv: force persample shading when required.
We need to force persample shading when
a) shader uses sample_id
b) shader uses sample_position
c) shader uses sample qualifier.

Also since ps_iter_samples can now change independently of the
rasterizer samples we need to move setting the regs more often.

This fixes:
dEQP-VK.pipeline.multisample_interpolation.centroid_interpolate_at_consistency.*
dEQP-VK.pipeline.multisample_interpolation.centroid_qualifier_inside_primitive.137_191_1.*
dEQP-VK.pipeline.multisample_interpolation.sample_interpolate_at_distinct_values.*
dEQP-VK.pipeline.multisample_interpolation.sample_qualifier_distinct_values.128_128_1.*

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-29 22:48:03 +00:00
Dave Airlie
6a62026dd4 nir: print var binding in dumps.
This only useful for spir-v shaders, but I keep finding myself
having to add it.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-29 22:07:13 +00:00
Eric Engestrom
fae5e1dc74 docs: fix small typo
Fixes: ba28f2136f ("docs: add note about r-b/other tags when resending")
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-29 22:02:57 +00:00
Matt Turner
218fec66cc i965/sched: Schedule trivial blocks.
In commit 45cd76e342 schedule_instructions(bblock_t *) began
setting bblock_t::cycle_count, but that function was not called on
trivial blocks.

Remove the code to skip trivial blocks so that cycle_count is set.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-11-29 11:53:36 -08:00
Matt Turner
cab0952d4b i965/sched: Make 'time' a local variable.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-11-29 11:53:36 -08:00
Matt Turner
b0156702fa i965/cfg: Initialize bblock_t::cycle_count.
schedule_instructions(bblock_t *) isn't called on blocks with a single
instruction, and since it is the only thing that set cycle_count,
cycle_count would be uninitialized.

A non-empty block with bblock_t::cycle_count == 0 is arguably a bug.
That'll be fixed in the next commit.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-11-29 11:53:36 -08:00
Matt Turner
ca9e30e002 i965/cfg: Initialize cfg_t::cycle_count.
This reverts commit b4001af174.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-11-29 11:53:36 -08:00
Bas Nieuwenhuizen
b8c9ce4459 ac/nir: Fix accessing an unitialized value.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-11-29 20:13:28 +01:00
Bas Nieuwenhuizen
029e8ff81c radv: Initialize the shader_stats_dump flag.
Meta was using it before it was set. I suspect we typically don't
want to dump meta shaders, so just set it to false in the beginning.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-11-29 20:13:28 +01:00
Eric Anholt
d40a3212ae vc4: Add a note for the future about texture latency calculation.
Debugging a shader-db reported cycle count regression from the tex
coalescing, I eventually figured out that the texture latencies were
totally bogus.  Really fixing it will probably involve mirroring
vc4_qir_schedule.c's texture fifo management here.
2016-11-29 09:01:23 -08:00
Eric Anholt
4690a93b12 vc4: Add support for coalescing ALU ops into tex_[srtb] MOVs.
This isn't as complete as I would like (can't merge interpolation because
of the implicit r5 dependency, doesn't work with control flow), but this
was cheap and easy.

Improves 3DMMES Taiji performance by 1.15353% +/- 0.299896% (n=29, 16)

total instructions in shared programs: 99810 -> 99059 (-0.75%)
instructions in affected programs:     10705 -> 9954 (-7.02%)
2016-11-29 08:52:50 -08:00
Eric Anholt
f4baf80993 vc4: Restructure VPM write optimization into two passes.
For texturing, there won't be a fixed limit on how many writes there are,
so we need to compute uses up front.
2016-11-29 08:38:59 -08:00
Eric Anholt
a025983dd9 vc4: Make qir_for_each_inst_inorder() safe against removal.
The dead code elimination wants it to be safe, and I actually got
segfaults due to it being unsafe with the new coalescing pass.
2016-11-29 08:38:59 -08:00
Eric Anholt
27544ea8d3 vc4: Split optimizing VPM writes from VPM reads.
The VPM write logic will be basically the same as the texture coordinate
write logic we need, and it's not really related to the VPM read logic
other than the reuse of the use_count array.
2016-11-29 08:38:59 -08:00
Eric Anholt
d4c20e82ae vc4: Restructure texture insts as ALU ops with tex_[strb] as the dst.
For now we're still just generating MOVs, but this will let us fold into
other ops in the future.  No difference on shader-db.
2016-11-29 08:38:59 -08:00
Eric Anholt
314f0c57e4 vc4: Refactor qir_get_op_nsrc(enum qop) to qir_get_nsrc(struct qinst *).
Every caller was dereffing the qinst, and this will let us make the number
of sources vary depending on the destination of the qinst so that we can
have general ALU ops that store to tex_[strb] and get an implicit uniform.
2016-11-29 08:38:59 -08:00
Eric Anholt
51087327f2 vc4: Replace the qinst src[] with a fixed-size array.
This may have made a tiny bit of sense when we had one 4-arg inst per
shader, but if we only ever put 2 things in, having a pointer to 2 things
almost every instruction is pointless indirection.
2016-11-29 08:38:59 -08:00
Eric Anholt
a220f1b5a9 vc4: Remove qir_inst4().
This was used originally for unorm4x8 packs, but we now represent those as
a series of packed movs.
2016-11-29 08:38:59 -08:00
Ilia Mirkin
7a8def8c18 anv: bump the texture gather offset limits
This matches what NVIDIA and AMD hardware expose, as well as what Intel
hardware supports.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-29 07:44:01 -08:00
Ilia Mirkin
62b8dbf35e i965/gen7: expose larger gather offsets
This matches the capabilities of the hardware.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-29 07:44:01 -08:00
Ilia Mirkin
4f2d1d6ea7 i965: support constant gather offsets larger than 4 bits
Offsets that don't fit into 4 bits need to force gather_po to be
selected. Adjust the logic so that this happens.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-29 07:44:01 -08:00
Jason Ekstrand
faf20df143 i965/fs: Refactor handling of constant tg4 offsets
Previously, we had an OFFSET_VALUE source for logical texture instructions
that was intended to mean exactly what it says, "offset".  In reality, we
only fully used it for tg4 offsets.  We used offset_value.file == IMM to
mean, "you have a constant offset, go look in instr->offset" and didn't
actually use the contents of the register at all in that case except for
in nir_emit_texture where we used it as a temporary before we copy it into
instr->offset.

This commit renames OFFSET_VALUE to TG4_OFFSET and restricts its usage to
indirect tg4 offsets only.  The nir_emit_texture code is refactored so that
we explicitly build a header_bits value which is placed in instr->offset
and the constant offset values (both for tg4 and regular texture
operations) are used to construct header_bits and don't go through the
offset source at all.  Finally, we stop passing offset_value in to
lower_sampler_logical_send_gen5 because we can't do indirect offsets until
gen7 anyway.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-11-29 07:44:01 -08:00
Bas Nieuwenhuizen
05533ce418 radv: Use different intrinsic for ubo loads.
Not sure about the deprecation path, but this intrinsic
can be lowered to SMEM loads. This results in a significant
Talos performance improvement.

v2: Fix for LLVM attribute changes.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-11-29 08:36:16 +01:00
Timothy Arceri
0303201dfb mesa: fix active subroutine uniforms properly
07fe2d565b introduced a big hack in order to return
NumSubroutineUniforms when querying ACTIVE_RESOURCES for
<shader>_SUBROUTINE_UNIFORM interfaces. However this is the
wrong fix we are meant to be returning the number of active
resources i.e. the count of subroutine uniforms in the
resource list which is what the code was previously doing,
anything else will cause trouble when trying to retrieve
the resource properties based on the ACTIVE_RESOURCES count.

The real problem is that NumSubroutineUniforms was counting
array elements as separate uniforms but the innermost array
is always considered a single uniform so we fix that count
instead which was counted incorrectly in 7fa0250f9.

Idealy we could probably completely remove
NumSubroutineUniforms and just compute its value when needed
from the resource list but this works for now.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: 13.0 <mesa-stable@lists.freedesktop.org>
2016-11-29 15:29:51 +11:00
Jason Ekstrand
f469235a6e anv/cmd_buffer: Remove the 1-D case from the HiZ QPitch calculation
The 1-D special case doesn't actually apply to depth or HiZ.  I discovered
this while converting BLORP over to genxml and ISL.  The reason is that the
1-D special case only applies to the new Sky Lake 1-D layout which is only
used for LINEAR 1-D images.  For tiled 1-D images, such as depth buffers,
the old gen4 2-D layout is used and the QPitch should be in rows.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-28 20:17:29 -08:00
Jason Ekstrand
d4ef87c1bb anv/cmd_buffer: Set the correct surface type for depth/stencil
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-11-28 20:17:16 -08:00
Ilia Mirkin
e6847f24f0 anv: enable drawIndirectFirstInstance
This was already piped through in the CmdDraw(Indexed)Indirect handling.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-28 19:32:14 -08:00
Ilia Mirkin
d2280a007a anv: expose depthBiasClamp, it is already set
The gen7/8_cmd_buffer logic already sets the clamp, and it's piped
through via the dynamic state.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-11-28 19:32:14 -08:00
Ilia Mirkin
e2c669a56b anv: bump maxFramebufferLayers to 2048
This matches maxImageArrayLayers, as well as the same setting in the GL
frontend.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-11-28 19:32:14 -08:00
Ilia Mirkin
76b97d544e anv: enable storage image extended formats
These are all regularly available in desktop GL, so the backend fully
supports them.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-28 19:32:14 -08:00
Ilia Mirkin
a34f89c5e6 anv: expose imageCubeArray functionality
This appears to be fully supported already.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-28 19:32:13 -08:00
Dave Airlie
eaf0768b8f radv: set maxFragmentDualSrcAttachments to 1
Reported-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-29 13:27:26 +10:00
Dave Airlie
f9ab60202d anv: set maxFragmentDualSrcAttachments to 1
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reported-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-29 13:26:53 +10:00
Ilia Mirkin
e0fc18a435 swr: [rasterizer memory] only clear up to the LOD size
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-28 20:14:48 -05:00
Ilia Mirkin
2fca08e550 swr: [rasterizer memory] hook up stencil clears for ClearTile
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-28 20:14:48 -05:00
Ilia Mirkin
5582610ea1 swr: [rasterizer memory] add support for clearing Z32F_X32 and Z16
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-28 20:14:48 -05:00
Jason Ekstrand
6bc8bef1a1 intel/aubinator: Pull useful information from the AUB header
This commit does two things.  One is to pull useful and/or interesting
information from the AUB file header and display it as a header above your
decoded batches.  Second, it is now capable of pulling the PCI ID from the
AUB file comment left by intel_aubdump.  This removes the need to use the
--gen flag all the time.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-28 16:45:09 -08:00
Jason Ekstrand
da5ebeffdf intel/aubinator: Wait to setup decoders until we parse the aub header
This requires that a few more state bits become global.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-28 16:45:09 -08:00
Jason Ekstrand
e6c01fb17d intel/aubinator: Rework handling of the --gen flag
This makes it just store the pci_id instead of a struct pointer

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-28 16:45:09 -08:00
Jason Ekstrand
12f2eae7e7 intel/aubinator: Trust the packet size in the header for SUBOPCODE_HEADER
We were reading from the "comment size" dword and incrementing by that
amount.  This never caused a problem because that field was always zero.
However, experimenting with actual aub file comments indicates, the
simulator seems to include the comment size in the packet size provided in
the header.  We should do the same.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-28 16:45:09 -08:00
Jason Ekstrand
89bb515e91 intel/aubinator: Add a get_offset helper
The helper automatically handles masking for us so we don't have to worry
about whether or not something is in the bottom bits.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2016-11-28 16:45:09 -08:00
Jason Ekstrand
318cf3ffa4 intel/aubinator: Fix the kernel start pointer for 3DSTATE_HS
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2016-11-28 16:45:09 -08:00
Jason Ekstrand
294daaa36f intel/aubinator: Add a get_address helper
This new helper is automatically handles 32 vs. 48-bit GTT issues.  It also
handles 48-bit canonical addresses on Broadwell and above.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2016-11-28 16:45:09 -08:00
Jason Ekstrand
d6cef32047 intel/aubinator: Properly handle batch buffer chaining
The original aubinator that Kristian wrote had a bug in the handling of
MI_BATCH_BUFFER_START that propagated into the version in upstream mesa.
In particular, it ignored the "2nd level" bit which tells you whether this
MI_BATCH_BUFFER_START is a subroutine call (2nd level) or a goto.  Since
the Vulkan driver uses batch chaining, this can lead to a very confusing
interpretation of the batches.  In some cases, depending on how things are
laid out in the virtual GTT, you can even end up with infinite loops in
batch processing.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2016-11-28 16:45:08 -08:00
Ilia Mirkin
0a5e1b02cf swr: don't clear all dirty bits when changing so targets
Among other things, blits would clear existing SO targets which would
cause a bunch of updates from u_blitter to be missed.

Fixes fbo-scissor-blit fbo, probably among many others.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-28 19:41:23 -05:00
Ilia Mirkin
8a70a4d984 swr: [rasterizer core] fix typo in scissor tile-alignment logic
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-28 19:41:13 -05:00
Kenneth Graunke
15d3fc167a anv: Fix cache UUID generation.
I asked Emil to switch from 0 (success) vs. -1 (fail) to use a boolean
in my review comments.  The "not" went missing.  Easy mistake, but the
result is that nothing runs at all :)

Fix whitespace while we're here too.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2016-11-28 13:40:04 -08:00
Gwan-gyeong Mun
65ea559465 vulkan/wsi: Fix resource leak in success path of wsi_queue_init()
It fixes leakage of pthread_condattr resource on wsi_queue_init()

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-11-28 21:11:25 +00:00
Gwan-gyeong Mun
b178652b41 anv: Update the teardown in reverse order of the anv_CreateDevice
This updates releasing of resource in reverse order of the anv_CreateDevice
to anv_DestroyDevice.
And it fixes resource leak in pthread_mutex, pthread_cond, anv_gem_context.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-28 21:11:25 +00:00
Gwan-gyeong Mun
ca4706960c anv: drop the return type for anv_queue_init()
anv_queue_init() always returns VK_SUCCESS, so caller does not need
to check return value of anv_queue_init().

Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-28 21:11:25 +00:00
Gwan-gyeong Mun
ecc618b0d8 anv: Add missing error-checking to anv_block_pool_init (v2)
When the memfd_create() and u_vector_init() fail on anv_block_pool_init(),
this patch makes to return VK_ERROR_INITIALIZATION_FAILED.
All of initialization success on anv_block_pool_init(), it makes to return
VK_SUCCESS.

CID 1394319

v2: Fixes from Emil's review:
  a) Add the return type for propagating the return value to caller.
  b) Changed anv_block_pool_init() to return VK_ERROR_INITIALIZATION_FAILED
     on failure of initialization.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-28 21:11:25 +00:00
Chandu Babu Namburu
02bf1bbe6e st/omx/dec/h264: consider POC as signed instead of unsigned
picture order count can be a negative value

Reviewed-by: Christian König <christian.koenig@amd.com>
2016-11-28 15:31:51 -05:00
Emil Velikov
7c277eae98 radv: don't return VK_SUCCESS if radv_device_get_cache_uuid() fails
If radv_device_get_cache_uuid() fails result will be VK_SUCCESS as set
by the radv_init_wsi() call above.

Fixes: d943839 (radv: Use library mtime for cache UUID.)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-11-28 19:51:31 +00:00
Emil Velikov
78707a15f2 radv: don't leak the fd if radv_physical_device_init() succeeds
radv_amdgpu_winsys_create() does not take ownership of the fd, thus we
end up leaking it as we return with VK_SUCCESS.

Cc: Dave Airlie <airlied@redhat.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-11-28 19:51:22 +00:00
Emil Velikov
a1cf494f77 anv: don't leak memory if anv_init_wsi() fails
brw_compiler_create() rzalloc-ates memory which we forgot to free.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-28 19:47:34 +00:00
Emil Velikov
3af8171547 anv: don't double-close the same fd
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-28 19:47:28 +00:00
Emil Velikov
2f1a1f589e configure.ac: remove no longer used TIMESTAMP_CMD
Good bye, you shall not be missed.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-28 19:47:24 +00:00
Emil Velikov
2d42a34566 anv: automake: don't generate anv_timestamp.h
No longer used as of last commit.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-28 19:47:17 +00:00
Emil Velikov
83548e1292 anv: Use library mtime for cache UUID.
Inspired by a similar commit for radv.

Rather than recomputing the timestamp on each make invocation, just
fetch it at runtime.

Thus we no longer get the constant rebuild of anv_device.c and the
follow-up libvulkan_intel.so link, when nothing has changed.

I.e. using make && make install is a little bit faster.

v2: Use bool return type (Ken).

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-11-28 19:46:45 +00:00
Emil Velikov
de138e9ced anv: Store UUID in physical device.
Port of an equivalent commit for radv.

v2: Move the call just after MMAP_VERSION (Ken).

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-28 19:46:05 +00:00
Emil Velikov
3f9397753b isl: Make isl_finishme only warn once per call-site
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-28 19:12:49 +00:00
Emil Velikov
f3a1c17b96 radv: Make radv_finishme only warn once per call-site
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-11-28 19:12:48 +00:00
Emil Velikov
7feac8bdb9 anv: use do { } while (0) in the anv_finishme macro
Use the generic construct instead of the currect GCC specific one.

Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-11-28 19:12:38 +00:00
Emil Velikov
6dae5be806 docs: add git tips how to do commit fixups and squash them
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-28 17:49:04 +00:00
George Kyriazis
ba28f2136f docs: add note about r-b/other tags when resending
[Emil Velikov: split from the typos fixes]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-28 17:48:50 +00:00
Andres Gomez
28158c3e54 docs: fix small typos in the submit patches page
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-28 17:46:12 +00:00
Emil Velikov
028d29b8b3 docs/releasing: use correct page title
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-28 17:46:12 +00:00
Emil Velikov
f9959ca92e docs/releasing: correctly document touch-testing
I've used an ancient version of the script which did not cover:
 - version expansion (cd mesa-* does not work)
 - --enable-glx-tls
 - EGL and es2* testing
 - Vulkan and DOTA2

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-28 17:46:12 +00:00
Emil Velikov
a7a416f347 docs/release: drop references to patchwork
The changes to release.sh have landed, so all we need is a recent
checkout of xorg-utils.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-28 17:45:11 +00:00
Emil Velikov
5ce7a32068 docs: add news item and link release notes for 13.0.2
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-28 15:30:13 +00:00
Emil Velikov
ad7879bbb4 docs: add sha256 checksums for 13.0.2
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 2722144bed)
2016-11-28 15:29:06 +00:00
Emil Velikov
bc5c299b4f docs: add release notes for 13.0.2
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit c9e993ba13)
2016-11-28 15:29:05 +00:00
Dave Airlie
09c0c17bc3 radv: fix 3D clears with baseMiplevel
This fixes:
dEQP-VK.api.image_clearing.clear_color_image.3d*

These were hitting an assert as the code wasn't taking the
baseMipLevel into account when minify the image depth.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-28 07:10:12 +00:00
Dave Airlie
020978af12 radv: brown-paper bag for a forgotten else.
This fixes the fix:
radv/ac/llvm: fix regression with shadow samplers fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-28 16:23:10 +10:00
Dave Airlie
b2e217369e radv/ac/llvm: fix regression with shadow samplers fix
This fixes b56b54cbf1:
radv/ac/llvm: shadow samplers only return one value

It makes sure we only do that for shadow sampling, as
opposed to sizing requests.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-28 15:43:59 +10:00
Dave Airlie
b56b54cbf1 radv/ac/llvm: shadow samplers only return one value.
The intrinsic engine asserts in llvm due to this.

Reported-by: Christoph Haag <haagch+mesadev@frickel.club>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-27 23:05:01 +00:00
Dave Airlie
9838db8f64 radv/si: fix optimal micro tile selection
The same fix was posted for radeonsi, so port it here.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-27 23:03:20 +00:00
Emil Velikov
a025c5b2c7 radv: honour the number of properties available
Cap up-to the number of properties available while copying the data.
Otherwise we might crash and/or leak data.

Cc: Dave Airlie <airlied@redhat.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-27 23:03:01 +00:00
Mun Gwan-gyeong
0a27dd458b radv: drop the return type for radv_queue_init()
radv_queue_init() always returns VK_SUCCESS, so caller does not need
to check return value of radv_queue_init().

Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-27 23:00:57 +00:00
Rob Clark
8cb965b112 freedreno: fix slice size for imported buffers
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-27 17:26:05 -05:00
Rob Clark
f4ffe2786b freedreno/a3xx: make _emit_const() static
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-27 17:26:05 -05:00
Rob Clark
b8b800d18a freedreno/a4xx: make _emit_const() static
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-27 17:26:05 -05:00
Jason Ekstrand
af98c6c31d anv/pipeline: Make is_dual_src_blend_factor inline
It's not used on gen8+ so it causes unused function warnings.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-11-26 11:58:59 -08:00
Jason Ekstrand
e41f7c3063 anv/pipeline: Make the temp blend attachment state pointer const
This fixes a "discards const" warning since blend is const.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-11-26 11:55:09 -08:00
Samuel Pitoiset
8fdb800bda gm107/ir: optimize 32-bit CONST load to mov
This is not allowed for indirect accesses because the source
GPR might be erased by a subsequent instruction (WaR hazard)
if we don't emit a read dep bar.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-11-26 19:05:11 +01:00
Samuel Pitoiset
948cce0196 gm107/ir: do not combine CONST loads
This will allow to use MOV instead of LD. The main advantage is
that MOV doesn't require a read dependency barrier while LD does,
and so this will both reduce barriers pressure and the number of
stall counts needed to read data from constant memory.

This is currently only for user uniform accesses. I should do
something similar when loading from the driver constant buffer
but it seems like a bit tricky to handle for now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-11-26 19:05:08 +01:00
Jason Ekstrand
fa6bbb5c00 anv/device: Remove a bogus finishme comment
We've been properly detecting bit6 swizzling for a long time now.
2016-11-25 21:46:11 -08:00
Ben Widawsky
2a7db18890 i965: Enable fast clears for multi-lod
On SKL (also fast clear is used for level 0, layer 0):

Manhattan 3.0:          3.88434% +/- 0.814659%
Manhattan 3.0 off:      3.25542% +/- 0.101149%
Trex:                   3.43501% +/- 0.31223%
Trex off:               4.13781% +/- 0.0993569%

ON BDW:

Manhattan 3.0:          1.37079% +/- 0.571208%
Manhattan 3.0 off:      1.74029% +/- 0.267499%

v2 (Ben, Matt): Fix rebase error by removing the perf warning
v3 (Topi): Rebased on top of revised eligibility logic

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-25 16:57:07 +02:00
Topi Pohjolainen
3aec6bce5b i965: Allow single-sampled miptree to be resolved and shared
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-25 16:57:07 +02:00
Topi Pohjolainen
17d7c5a037 i965/gen8: Relax asserts prohibiting arrayed/mipmapped fast clears
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-25 16:57:07 +02:00
Topi Pohjolainen
544ed74315 i965: Use ISL for CCS layouts
One can now also delete intel_get_non_msrt_mcs_alignment().

v2 (Jason): Do not leak aux buf but allocate only after getting
            ISL surfaces.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-25 16:57:07 +02:00
Topi Pohjolainen
96dbe765e1 i965: Resolve non-compressed fast clears prior layered rendering
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-25 16:57:07 +02:00
Topi Pohjolainen
dea8e7fb07 i965: Restrict fast color clear on first slice only
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-25 16:57:07 +02:00
Topi Pohjolainen
d41fc8dc9f i965: Track fast color clear state in level/layer granularity
Note that RESOLVED is not tracked in the map explicitly. Absence
of item implicitly means RESOLVED state.

v2: Added intel_resolve_map_clear() into intel_miptree_release()
v3 (Jason): Properly handle the assumption of resolve map not
            containing any items with state RESOLVED. Removed
            unnecessary intel_miptree_set_fast_clear_state() call
            in brw_blorp_resolve_color() preventing
            intel_miptree_set_fast_clear_state() from asserting
            against RESOLVED.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-25 16:57:07 +02:00
Topi Pohjolainen
28dc3f6199 i965: Move fast clear state enumeration into resolve map
Status is still tracked per miptree. Next patch will switch to
resolve map per slice/level.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-25 16:57:07 +02:00
Topi Pohjolainen
6859d2ba2e i965: Refactor check if color resolve is needed
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-25 16:57:07 +02:00
Topi Pohjolainen
ea2c419600 i965: Add plumbing for fast clear layer/level details
Until now fast clear has been supported only for non-layered and
non-mipmapped buffers. However, from gen8 onwards there is hardware
support also for layered/mipmapped. Once this is enabled, fast clear
operations target specific layer/level and call for the state to be
tracked in the same granularity. This is the first step providing
the details from callers to the state tracking.

Patch introduces new interface for reading and writing the state
hiding the upcoming bookkeeping changes in the call sites. There is
bunch of sanity checks added that will be relaxed per hardware
generation later on when the actual functionality is enabled.

v2: Rebased on top current master setting the state in
    blorp_surf_for_miptree().
v3: Replace open-coded resolved check in surface state emission
    with intel_miptree_has_color_unresolved().

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-25 16:57:07 +02:00
Topi Pohjolainen
d07cf68a97 i965: Add interface for checking multiple slices if any is unresolved
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-25 16:57:07 +02:00
Topi Pohjolainen
17e6a214fd i965: Provide slice details to renderbuffer fast clear state tracker
This patch also introduces getter and setter for fast clear state
preparing for tracking the state per slice.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-25 16:57:06 +02:00
Topi Pohjolainen
cec30a6669 i965: Split per miptree and per slice/level fast clear bits
Currently the status bits for fast clear include the flag telling
if non-multisampled mcs buffer should be used at all. Once the
state tracking is changed to follow individual levels/layers one
still needs to have the mcs enabling information in the miptree.
Therefore simply split it out to its own boolean.

Possible follow-up work is to combine disable_aux_buffers and
no_ccs into single enum.

v2 (Jason): Changed no_msrt_mcs to no_ccs and updated comment

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-25 16:57:06 +02:00
Topi Pohjolainen
9c7717c066 i965: Provide slice details to color resolver
v2: Make intel_miptree_resolve_color() take start layer and
    layer count.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-25 16:57:06 +02:00
Topi Pohjolainen
12010b9226 i965: Add new interface for full color resolves
Upcoming patches will introduce fast clear in level/layer
granularity like the driver does already for depth/hiz. This patch
introduces equivalent full resolve option.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-25 16:57:06 +02:00
Topi Pohjolainen
71d48d6f42 i965: Refactor lossless compression state tracking
Essentially this moves fast clear state update away from surface
state setup into brw_postdraw_set_buffers_need_resolve() that gets
called just after draw submission.
Calling intel_miptree_used_for_rendering() can be drop for gen6
and earlier as it is no-op.

v2: Rebased on top current master setting the state in
    blorp_surf_for_miptree().

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-25 16:57:06 +02:00
Andres Gomez
b27be186cb Revert "glsl: allow layout qualifier overrides with ARB_shading_language_420pack"
This reverts commit aaa69c79cd.

The commit was erroneous because the ast_layout_expression class is
meant to hold a list used for an after check that all the declared
values for a layout-qualifier-name are consistent.

Therefore, the check for the possibility of duplicated values was
previously fixed to happen much sooner, in the GLSL parser and the
merge of layout qualifiers, and the process_qualifier_constant method
only needs to check that the values are consistent.

By now, those layout-qualifier-name represented as a
ast_layout_expression are "max_vertices", "invocations", "vertices",
"local_size_[x|y|z]" and "xfb_stride".

From page 40 (page 46 of the PDF) of the GLSL 1.50 spec:

  " All geometry shader output layout declarations in a program must
    declare the same layout and same value for max_vertices."

From page 44 (page 50 of the PDF) of the GLSL 4.00 spec:

  " If an invocation count is declared, all such declarations must
    specify the same count."

From page 47 (page 53 of the PDF) of the GLSL 4.00 spec:

  " All tessellation control shader layout declarations in a program
    must specify the same output patch vertex count."

From page 60 (page 66 of the PDF) of the GLSL 4.30 spec:

  " Also, if such a layout qualifier is declared more than once in the
    same shader, all those declarations must set the same set of local
    work-group sizes and set them to the same values; otherwise a
    compile-time error results. If multiple compute shaders attached
    to a single program object declare local work-group size, the
    declarations must be identical; otherwise a link-time error
    results."

From page 73 (page 79 of the PDF) of the GLSL 4.40 spec:

  " While xfb_stride can be declared multiple times for the same
    buffer, it is a compile-time or link-time error to have different
    values specified for the stride for the same buffer."

Fixes GL44-CTS.enhanced_layouts.xfb_duplicated_stride

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
2016-11-25 13:18:31 +02:00
Andres Gomez
2a47c83d7e Revert "glsl: geom shader max_vertices layout must match."
This reverts commit 4c86399378.

The commit was erroneous because the ast_layout_expression class was
created to hold a list of values for a layout-qualifier-name which is
allowed to appear in more than one expression in the same
shader/program but not to hold different values.

In other words, the list is used for an after check that all the
declared values for a layout-qualifier-name are consistent.

Therefore, the values stored must match always, not just for
"max_vertices" or any other eventual layout-qualifier-name.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
2016-11-25 13:18:31 +02:00
Andres Gomez
e5041c6409 glsl: push layout-qualifier-name values from variable declarations to global
After the previous modifications in the merging of the
layout-qualifier-name values, we no longer push the final value in a
declaration to the global values.

This regression happens because we don't call for merging on the
right-most layout qualifier of a declaration which is also the
overriding one in case of multiple appearances.

Now, we add a new method to push these values to the global ones and
we call for this just after all the layout-qualifier collapsing has
happened in a declaration.

This simplifies how this was working in two ways; we make a clear
differentiation of when we are pushing this to the global values since
before it was mixed in the merging call and we only run this once all
the processing for layout-qualifiers in a declaration has happened.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
2016-11-25 13:18:30 +02:00
Andres Gomez
5132d0c7b6 glsl: simplified error checking for duplicated layout-qualifiers
The GLSL parser has been simplified to check for the needed
GL_ARB_shading_language_420pack extension just when merging the
qualifiers in the proper cases.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
2016-11-25 13:18:30 +02:00
Andres Gomez
93f90d7795 glsl: simplified ast_type_qualifier::merge_into_[in|out]_qualifier API
Since we modified the way in which multiple repetitions of the same
layout-qualifier-name in a single declaration collapse into the
ast_type_qualifier class, we can simplify the
merge_into_[in|out]_qualifier APIs through removing the create_node
parameter.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
2016-11-25 13:18:30 +02:00
Andres Gomez
be54a58da3 glsl: ignore all but the rightmost layout qualifier name from the rightmost layout qualifier
From page 46 (page 52 of the PDF) of the GLSL 4.20 spec:

  " More than one layout qualifier may appear in a single
    declaration. If the same layout-qualifier-name occurs in multiple
    layout qualifiers for the same declaration, the last one overrides
    the former ones."

Consider this example:

  " #version 150
    #extension GL_ARB_shading_language_420pack: enable

    layout(max_vertices=2) layout(max_vertices=3) out;
    layout(max_vertices=3) out;"

Although different values for "max_vertices" results in a compilation
error. The above code is valid because max_vertices=2 is ignored.

Hence, when merging qualifiers in an ast_type_qualifier, we now ignore
new appearances of a same layout-qualifier-name if the new
"is_multiple_layouts_merge" parameter is on, since the GLSL parser
works in this case from right to left.

In addition, any special treatment for the buffer, uniform, in or out
layout defaults has been moved in the GLSL parser to the rule
triggered just after any previous processing/merging on the
layout-qualifiers has happened in a single declaration since it was
run too soon previously.

Fixes GL44-CTS.shading_language_420pack.qualifier_override_layout

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
2016-11-25 13:18:30 +02:00
Andres Gomez
b95793b9a7 glsl: refactor duplicated validations between 2 layout-qualifiers
Several layout-qualifier validations are duplicated in the
merge_qualifier and validate_in_qualifier methods.

We would rather have them refactored into single calls.

Suggested by Timothy.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
2016-11-25 13:18:30 +02:00
Andres Gomez
ae1ce8ecd3 glsl: assert on incoherent point mode layout-id-qualifier validation
The point mode value in an ast_type_qualifier can only be true if the
flag is already set since this layout-id-qualifier can only be or not
be present in a shader.

Hence, it is useless to check for its value if the flag is already
set. Just replaced with an assert.

V2: assert instead of checking for coherence and raising a compilation
    error. Suggested by Timothy.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
2016-11-25 13:18:30 +02:00
Andres Gomez
a5d6ae2f51 glsl: remove unneeded check for incompatible primitive types in GS
The validation of the default in layout qualifier already assures that
we won't have 2 ast_gs_input_layout objects with different primitive
type values. In fact, the validation already assures that we won't
have 2 ast_gs_input_layout objects in the AST tree at all.

The check for an error in the shader has been replaced by an assert.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
2016-11-25 13:18:30 +02:00
Andres Gomez
0ecfff0d08 glsl: simplifies the merge of the default in layout qualifier
The merge into the default in layout qualifier duplicates a lot of
code that can be reused from the generic merge method.

Now, we use the generic merge method inside the specific merge for the
default in layout qualifier. The generic merge method has been
completed with some bits that were only present in the merge for the
default in layout qualifier and the specific validation bits have been
moved to the validation method for the default in layout qualifier.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
2016-11-25 13:18:30 +02:00
Andres Gomez
65df02c002 glsl: split default in layout qualifier merge
Currently, the default in layout qualifier merge performs specific
validation and merge.

We want to split out the validation from the merge so they can be done
independently.

Additionally, for simplification, the direction of the validation and
merge is changed so the ast_type_qualifier calling the method is the
one validated and merged against the default in qualifier.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
2016-11-25 13:18:30 +02:00
Andres Gomez
fe5c522edd glsl: split default out layout qualifier merge
Currently, the default out layout qualifier merge performs specific
validation and merge.

We want to split out the validation from the merge so they can be done
independently.

Additionally, for simplification, the direction of the validation and
merge is changed so the ast_type_qualifier calling the method is the
one validated and merged against the default out qualifier.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
2016-11-25 13:18:30 +02:00
Andres Gomez
70456aca8d glsl: merge layouts into the default one as the last step in interface blocks
Consider this example:

    " #version 150 core
      #extension GL_ARB_shading_language_420pack: require
      #extension GL_ARB_explicit_attrib_location: require

      layout(location=0) out vec4 o;
      layout(binding=2) layout(binding=3, std140) uniform U {
          vec4 a;
      } u[2];"

As there is 2 layout-qualifiers for the uniform U and the binding
layout-qualifier-id is duplicated, the rules set by the
ARB_shading_language_420pack spec state that the rightmost should
prevail.

Our ast_type_qualifier merges with others in a way that if the value
for a layout-qualifier-id is set in both, the object being merged
overwrites the value of the object invoking the merge. Hence, the
merge has to happen from the left layout towards the right one and
this was not happening for interface blocks because we were merging
into the default layout qualifier.

Now, the merge is done from left to right and, as a last step, we
merge into the default layout qualifier if needed, so the values of
the explicit layouts prevail over it.

V2: added a default_layout variable instead of a layout_helper and
    make the merge directly over the layout one. Suggested by Timothy.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
2016-11-25 13:18:30 +02:00
Andres Gomez
9f13d0c64b glsl: ignore all but the rightmost layout-qualifier-name
When a layout contains a duplicated layout-qualifier-name in a single
declaration, only the last occurrence should be taken into account.

From page 59 (page 65 of the PDF) of the GLSL 4.40 spec:

  " More than one layout qualifier may appear in a single
    declaration. Additionally, the same layout-qualifier-name can
    occur multiple times within a layout qualifier or across multiple
    layout qualifiers in the same declaration. When the same
    layout-qualifier-name occurs multiple times, in a single
    declaration, the last occurrence overrides the former
    occurrence(s)."

Consider this example:

  " #version 150
    #extension GL_ARB_enhanced_layouts: enable

    layout(max_vertices=2, max_vertices=3) out;
    layout(max_vertices=3) out;"

Although different values for "max_vertices" results in a compilation
error. The above code is valid because max_vertices=2 is ignored.

When merging qualifiers in an ast_type_qualifier, we now simply ignore
new appearances of a same layout-qualifier-name if the
"is_single_layout_merge" parameter is true, this works because the GLSL
parser processes qualifiers from right to left.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
2016-11-25 13:18:30 +02:00
Iago Toral Quiroga
b3fca51617 anv/state: if enabled, use anisotropic filtering also with VK_FILTER_NEAREST
Fixes multiple Vulkan CTS tests that combine anisotropy and VK_FILTER_NEAREST
in dEQP-VK.texture.filtering_anisotropy.*

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-25 08:20:28 +01:00
Vedran Miletić
95ddb37708 clover: Restore support for LLVM <= 3.9.
The commit 8e430ff8b0 broke support for
LLVM 3.9 and older versions in Clover. This patch restores it and
refactors the support using Clover compatibility layer for LLVM.

v2: merged #ifdef blocks
v3: added support for LLVM 3.6-3.8
v4: add missing #ifdef around <memory>
v5: simplify using templates and lambda

Signed-off-by: Vedran Miletić <vedran@miletic.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98740
Tested-by[v4]: Pierre Moreau <pierre.morrow@free.fr>
Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
2016-11-24 16:40:29 -08:00
Vinson Lee
f07da5aa5e scons: Recognize LLVM_CONFIG environment variable.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-11-24 13:37:33 -08:00
Bas Nieuwenhuizen
a794f09017 radv: Don't generate radv_timestamp.h
Not needed anymore.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-24 19:25:03 +01:00
Dave Airlie
bb8ac18340 radv: fix texel fetch offset with 2d arrays.
The code didn't limit the offsets to the number supplied, so
if we expected 3 but only got 2 we were accessing undefined memory.

This fixes random failures in:
dEQP-VK.glsl.texture_functions.texelfetchoffset.sampler2darray_*

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-24 18:06:05 +10:00
Eduardo Lima Mitev
116fed80ff mesa/getteximage: Add validation of target to glGetTextureImage
There is an specific list of texture targets that can be used with
glGetTextureImage. From OpenGL 4.5 spec, section '8.11 Texture Queries',
page 234 of the PDF:

   "An INVALID_ENUM error is generated if the effective target is
    not one of TEXTURE_1D , TEXTURE_2D , TEXTURE_3D , TEXTURE_1D_-
    ARRAY , TEXTURE_2D_ARRAY , TEXTURE_CUBE_MAP_ARRAY , TEXTURE_-
    RECTANGLE , one of the targets from table 8.19 (for GetTexImage
    and GetnTexImage only), or TEXTURE_CUBE_MAP (for GetTextureImage
    only)."

We are currently not validating the target for glGetTextureImage. As
an example, calling this function on a texture with target
GL_TEXTURE_2D_MULTISAMPLE should return INVALID_ENUM, but instead it
hits an assertion down the road in the i965 driver.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-24 08:24:07 +01:00
Eduardo Lima Mitev
89cbe0d21f main/texobj: Check that texture id > 0 before looking it up in hash-table
_mesa_lookup_texture_err() is not currently checking that the
texture-id can be zero, but _mesa_HashLookup() doesn't expect the key
to be zero, and will fail an assertion.

Considering that _mesa_lookup_texture_err() is called from
_mesa_GetTextureImage and _mesa_GetTextureSubImage with user provided
arguments, we must validate the texture-id before looking it up in the
hash-table.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-24 08:23:45 +01:00
Charmaine Lee
3233a9fe0b util: fix memory leak from the fragment shaders for SINT<->UINT blits
This patch deletes those fragment shaders in util_blitter_destroy().

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-23 22:53:08 -08:00
Kenneth Graunke
ec1f159ac8 i965: Always reserve clip distance VUE slots in SSO mode.
This fixes rendering in Dolphin on Vulkan since we enabled clip
distances.  (Dolphin on GL has a similar bug because the linker
fails to eliminate unused clip distance built-in arrays, but it
isn't using SSO...so that needs more fixing.)

Also fixes a Piglit test:
spec/glsl-1.50/execution/geometry.clip-distance-vs-gs-out-sso

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-23 21:23:38 -08:00
Ilia Mirkin
8cdf73c324 anv/gen7: only enable dual-source blending when there are dual-source factors
Apparently the hw wedges otherwise, as mentioned in i965 comments.

Reported-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-23 19:40:00 -08:00
Ilia Mirkin
a783b67e17 swr: clear every layer of the attached surfaces
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-23 20:34:02 -05:00
Ilia Mirkin
1a80ec0cd1 swr: [rasterizer core] pipe renderTargetArrayIndex through to clears
Currently clears only operate on the 0th array index (ignoring surface
layout parameters). Instead normalize to take a RTAI like all the
load/store tile logic does, and use ComputeSurfaceAddress to properly
take the surface state's lod/array index into account.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-23 20:33:50 -05:00
Ilia Mirkin
cec515999c swr: [rasterizer core] clear data now comes in as float
The non-fast-clear path was never updated after clear colors were passed
in as floats. Remove the now-harmful conversion from unorm8.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-23 20:33:36 -05:00
Ilia Mirkin
74943db82c swr: [rasterizer core] actually perform clear before store in GetHotTile
When switching render target array indexes (as might happen in a GS, or
in a future change, with layered clears), if the previous state is
HOTTILE_CLEAR, we should actually clear the tile before saving it off.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-23 20:33:32 -05:00
Kenneth Graunke
5da84a7e12 i965: Fix a mistake from porting the URB allocation code to arrays.
Commit 6d416bcd84 (i965: Use arrays in
Gen7+ URB code.) introduced a regression which caused us to fail to
allocate all of our URB space.

   -         total_wants -= ds_wants;
   +         total_wants -= additional;

The new line should have been total_wants -= wants[i].

Fixes a large performance regression in TessMark.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98815
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-11-23 16:57:29 -08:00
Kenneth Graunke
903056e016 i965: Use 3DSTATE_CLIP's User Clip Distance Enable bitmask on Gen8+.
Gen6-7.5 specify the user clip distance enable bitmask in 3DSTATE_CLIP.
Gen8+ normally uses the new internal signalling mechanism to select the
one specified in the last enabled shader stage (3DSTATE_VS, DS, or GS).

This is a pretty good fit for Vulkan, or even newer GL, where the
bitmask comes entirely from the shader.  But with glClipPlane(),
this is dynamic state, and we have to listen to _NEW_TRASNFORM.

Clip plane enables are the only reason the VS/DS/GS atoms need to
listen to _NEW_TRANSFORM.  3DSTATE_CLIP already has to listen to it
in order to support ARB_clip_control settings.

Setting the "Use the 3DSTATE_CLIP bitmask" force enable bit allows
us to drop _NEW_TRANSFORM from all the shader stage atoms, so we can
re-emit them less often.

Improves performance of OglBatch7 (version 6) by 2.70773% +/- 0.491257%
(n = 38) at 1024x768 on Cherryview.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-11-23 16:57:29 -08:00
Dave Airlie
3b6893b678 radv: fix flipped blits
This fixes:
dEQP-VK.api.copy_and_blit.blit_image.simple_tests.mirror*

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-23 23:49:32 +00:00
Dave Airlie
b06568873d radv/meta: just local vars for src/dst subresources.
This is just a cleanup before I rework this code to fix mirrored
blits.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-23 23:49:23 +00:00
Fredrik Höglund
28c781b574 radv: add support for VK_AMD_draw_indirect_count
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-11-24 08:19:27 +10:00
Fredrik Höglund
eff7bbc47e radv: add support for VK_AMD_negative_viewport_height
The driver already supports this extension in practice.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-11-24 08:19:24 +10:00
Fredrik Höglund
2c748c5c8a radv: add support for VK_KHR_sampler_mirror_clamp_to_edge
radv_tex_wrap() already supports VK_SAMPLER_ADDRESS_MODE_MIRROR_CLAMP_TO_EDGE,
so all that's needed is to advertise support for the extension.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-11-24 08:19:20 +10:00
Fredrik Höglund
5cbcbc75f4 radv: add support for anisotropic filtering on SI-CI
Ported from radeonsi.

Note that si_make_texture_descriptor() already sets img7 to the mask
value referred to in the comment.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-11-24 08:19:06 +10:00
Jordan Justen
72c00e7c47 i965/gen7: Only advertise 4 samples for RGBA32F on GLES
We can't render to 8x MSAA if the width is greater than 64 bits. (see
brw_render_target_supported)

Fixes ES31-CTS.sample_variables.mask.rgba32f.samples_8.mask_*

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-11-23 11:15:31 -08:00
Marek Olšák
76e953788a radeonsi: print new opt flags in si_dump_shader_key
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-23 18:49:10 +01:00
Marek Olšák
e5302ad936 radeonsi: add a debug flag that disables optimized shader variants
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-23 18:49:10 +01:00
Aaron Watry
ac458d2ae8 compiler/glsl/tests: Fix print format when building 32-bit binaries on 64-bit host
Avoids two warnings.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-11-23 10:15:00 -06:00
Aaron Watry
60c3a0a67c compiler/glsl/tests: Fix print format when building 32-bit binaries on 64-bit host
Avoids three warnings.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-11-23 10:15:00 -06:00
Emil Velikov
5cc07d854c anv: fix enumeration of properties
Driver should enumerate only up-to min2(num_available, num_requested)
properties and return VK_INCOMPLETE if the # of requested props is
smaller than the ones available.

Presently we assert out in such cases.

Inspired by a similar fix for RADV.

v2: Use MIN2 + typed_memcpy (Jason).

Should fix: dEQP-VK.api.info.device.extensions

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-23 14:13:47 +00:00
Ben Widawsky
0a0ce884ea i965: Restructure fast clear eligibility decision
v2 (Jason):
   - Use PRM citation for SKL now that it is available
   - Also return false for gen < 8 mipmapped/arrayed

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-23 11:06:53 +02:00
Topi Pohjolainen
f4c7989408 i965: Set initial msaa fast clear status explicitly
instead of in intel_miptree_init_mcs(). For lossless compression
the status is immediately overwritten in
intel_miptree_alloc_non_msrt_mcs() while the status for
non-compressed non-msaa miptrees is explicitly set in
do_blorp_clear().

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-23 11:06:53 +02:00
Topi Pohjolainen
dfd6088b3a i965: Declare read-only input to level/layer check const
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-23 11:06:53 +02:00
Topi Pohjolainen
07d070f324 i965/fbo: Prepare layer multiplier for render buffer compression
This path is not yet taken for fast cleared or compressed buffers
but later patches will enable it.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-23 11:06:53 +02:00
Topi Pohjolainen
a2d029dc5f i965: Add multi-slice getter for resolve maps
This is useful when checking if any slice is in unresolved state.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-23 11:06:53 +02:00
Topi Pohjolainen
7c75fd9a59 i965/meta: Split conversion of color and setting it
And fix a mangled comment while at it.

v2 (Ben): Return the converted color.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-23 11:06:53 +02:00
Topi Pohjolainen
f19e0967c9 intel/blorp: Fix rectangle size for level-not-zero resolves
Needed to prevent gpu hangs when mip-mapped compression gets
enabled.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-23 11:06:52 +02:00
Topi Pohjolainen
ca84e190a4 i965/miptree: Don't shrink textures when augmenting for more levels
This was detected when examining CCS_E failures with piglit test:
"fbo-generatemipmap-formats". Test creates a 2D texture with
dimensions 293x277. It manually loops over all levels and calls
glTexImage2D(). Level one triggers creation of full miptree:
intel_alloc_texture_image_buffer() realizes that there is only one
level in the miptree and calls intel_miptree_create_for_teximage()
to re-allocate the miptree with all 9 levels. However, the end result
is a miptree with level zero dimensions of 292x276.

Related, and possibly calling for treatment of its own is mip-map
generation:
After calling glTexImage2D() against every level test continues by
replacing content for levels one to eight with data derived from level
zero by calling glGenerateMipmapEXT(). This results into the miptree
being allocated anew for every level:
Mip-map generation goes thru meta which ends up validating the texture
(brw_validate_textures()->intel_finalize_mipmap_tree()->
intel_miptree_match_image()) where one finds texture with base level
size 292:276. This results into new miptree being created for the npot
size 293:277. Only here intel_finalize_mipmap_tree() is asked for only
one level, and therefore such is created. Generation for level one in
turn finds right base level size but only one level when two is needed.
And the same goes on for all eight levels.

This patch prevents the shrink maintaining the NPOT size of 293x277.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-23 11:06:52 +02:00
Eduardo Lima Mitev
6e8f12619f main/getteximage: Use the height argument to calculate memcpy copy size
In get_tex_memcpy, when copying texture data directly from source
to destination (when row strides match for both src and dst), the
copy size is currently calculated using the full texture height
instead of the sub-region height parameter that was passed.

This can cause a read past the end of the mapped buffer when y-offset
is greater than zero, leading to a segfault.

Fixes CTS test (from crash to pass):
* GL45-CTS/get_texture_sub_image/functional_test

v2: (Jason) Use the passed 'height' instead of copying til the
end of the buffer (tex-height - yoffset).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-23 09:22:32 +01:00
Iago Toral Quiroga
e062eb6415 nir/spirv: implement ordered / unordered floating point comparisons properly
Besides the logical operation involved, these also require that we test if the
operands are ordered / unordered.

For ordered operations, both operands must be ordered (and they must pass the
conditional test) while for unordered operations it is sufficient if only one
of the operands is unordered (or they pass the logical test).

Fixes the following Vulkan CTS tests:

dEQP-VK.spirv_assembly.instruction.compute.opfunord.equal
dEQP-VK.spirv_assembly.instruction.compute.opfunord.greater
dEQP-VK.spirv_assembly.instruction.compute.opfunord.greaterequal
dEQP-VK.spirv_assembly.instruction.compute.opfunord.less
dEQP-VK.spirv_assembly.instruction.compute.opfunord.lessequal

v2: Fixed typo: s/nir_eq/nir_feq

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2016-11-23 08:07:44 +01:00
Dave Airlie
9ce5926476 anv: fix segfault in anv_BindImageMemory
Since bind image memory started memsetting surfaces, the
device node can't be NULL, since we lookup device->info.has_llc.

Not sure why it ever was NULL before.

Fixes some things on my Ivybridge.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-23 16:11:03 +10:00
Tim Rowley
9c13cc9451 swr: [rasterizer core] fix cast for stencil clear value
Bad type cast for stencil clear value was picking up structure
padding bytes.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-11-22 20:06:17 -06:00
Ilia Mirkin
f6f644ea12 swr: color interpolation is also supposed to get perspective division
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-22 20:27:20 -05:00
Ilia Mirkin
7cbfe59cf3 swr: add sprite coord enable mask to fs key
This fixes gl-coord-replace-doesnt-eliminate-frag-tex-coords

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-22 20:27:20 -05:00
Ilia Mirkin
6d6ef3fb55 swr: rework vert <-> frag shader linkage logic
Fixes a few things:
 - sprite coords only apply to generic varyings, and are a bitmask
 - back color only applies in 2-sided lighting mode
 - handle some odd situations between only some front/back colors being
   there. This is only semi-legal in GL, but we shouldn't start
   crashing.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-22 20:27:20 -05:00
Ilia Mirkin
2595aebd91 swr: flatshading makes color outputs flat, it doesn't affect others
We were previously not marking the "regular" flat outputs as flat when
flatshading was enabled.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-22 20:27:20 -05:00
Ilia Mirkin
37be598dda swr: only broadcast color0 value, not all color values
The way that dual-source blending is described for GLES2 is very odd,
and we end up with a shader that both has this property set *and* has a
color1 value to be used as the second source. While changing the state
tracker is an option, it seems more reliable to verify that the
broadcast is only done on color0.

Fixes arb_blend_func_extended-fbo-extended-blend-pattern_gles2

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-22 20:27:20 -05:00
Ilia Mirkin
2234a4330e swr: report a reasonable max lod bias
This is the same value that llvmpipe uses. Since swr uses the same
sampler logic, makes sense for this value to also be the same. Most
applications don't care.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-22 20:27:20 -05:00
Ilia Mirkin
2b7bdff83f swr: avoid using exceptions for expected condition handling
I was getting a weird segfault from GCC 4.9.3:

 0x00007ffff54f27aa in strlen () from /lib64/libc.so.6
 (gdb) bt
 #0  0x00007ffff54f27aa in strlen () from /lib64/libc.so.6
 #1  0x00007ffff4f128e5 in get_cie_encoding (cie=cie@entry=0x7ffff6e09813)
     at /gcc-4.9.3/libgcc/unwind-dw2-fde.c:272
 #2  0x00007ffff4f1318e in classify_object_over_fdes (ob=ob@entry=0xd7bb90, this_fde=0x7ffff7f11010)
     at /gcc-4.9.3/libgcc/unwind-dw2-fde.c:628
 #3  0x00007ffff4f135ba in init_object (ob=0xd7bb90)
     at /gcc-4.9.3/libgcc/unwind-dw2-fde.c:749
 #4  search_object (ob=ob@entry=0xd7bb90, pc=pc@entry=0x7ffff4f11f4d <_Unwind_RaiseException+61>)
     at /gcc-4.9.3/libgcc/unwind-dw2-fde.c:961
 #5  0x00007ffff4f13e62 in _Unwind_Find_registered_FDE (bases=0x7fffffffd358, pc=0x7ffff4f11f4d <_Unwind_RaiseException+61>)
     at /gcc-4.9.3/libgcc/unwind-dw2-fde.c:1025
 #6  _Unwind_Find_FDE (pc=0x7ffff4f11f4d <_Unwind_RaiseException+61>, bases=bases@entry=0x7fffffffd358)
     at /gcc-4.9.3/libgcc/unwind-dw2-fde-dip.c:450
 #7  0x00007ffff4f11197 in uw_frame_state_for (context=context@entry=0x7fffffffd2b0, fs=fs@entry=0x7fffffffd100)
     at /gcc-4.9.3/libgcc/unwind-dw2.c:1245
 #8  0x00007ffff4f11b15 in uw_init_context_1 (context=context@entry=0x7fffffffd2b0, outer_cfa=outer_cfa@entry=0x7fffffffd660, outer_ra=0x7ffff518d23b <__cxa_throw+91>)
     at /gcc-4.9.3/libgcc/unwind-dw2.c:1566
 #9  0x00007ffff4f11f4e in _Unwind_RaiseException (exc=0xd7c250)
     at /gcc-4.9.3/libgcc/unwind.inc:88
 #10 0x00007ffff518d23b in __cxa_throw () from /usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/libstdc++.so.6
 #11 0x00007ffff51ed556 in std::__throw_out_of_range(char const*) ()
    from /usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/libstdc++.so.6
 #12 0x00007fffea778be0 in std::map<pipe_format, SWR_FORMAT, std::less<pipe_format>, std::allocator<std::pair<pipe_format const, SWR_FORMAT> > >::at (
     this=0x7fffebeb4c40 <mesa_to_swr_format(pipe_format)::mesa2swr>,
     __k=@0x7fffffffd73c: PIPE_FORMAT_RGTC1_UNORM)
     at /usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/include/g++-v4/bits/stl_map.h:549
 #13 0x00007fffea776aee in mesa_to_swr_format (format=PIPE_FORMAT_RGTC1_UNORM) at swr_screen.cpp:597

We can just void this whole issue by not using exceptions in the
first place.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-22 20:27:20 -05:00
Ilia Mirkin
946a7abd1c swr: remove formats from mapping table that don't have StoreTile impls
This table exists for the purpose of determining renderable formats.
Without a StoreTile implementation, that can't happen.

This basically removes rendering support to all L/LA/I formats. They can
be re-added when/if StoreTile implementations are added.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-22 20:27:20 -05:00
Ilia Mirkin
2e12d2ba72 swr: remove unnecessary -1 entries in format mapping table
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-22 20:27:20 -05:00
Ilia Mirkin
7cfb364b1a swr: rework resource layout and surface setup
This is a bit of a mega-commit, but unfortunately there's no great way
to break this up since a lot of different pieces have to match up. Here
we do the following:
 - change surface layout to match swr's Load/StoreTile expectations
 - fix sampler settings to respect all sampler view parameters
 - fix stencil sampling to read from secondary resource
 - respect pipe surface format, level, and layer settings
 - fix resource map/unmap based on the new layout logic
 - fix resource map/unmap to copy proper parts of stencil values in and
   out of the matching depth texture

These fix a massive quantity of piglits, including all the
tex-miplevel-selection ones.

Note that the swr native miptree layout isn't extremely space-efficient,
and we end up using it for all textures, not just the renderable ones. A
back-of-the-envelope calculation suggests about 10%-25% increased memory
usage for miptrees, depending on the number of LODs. Single-LOD textures
should be unaffected.

There are a handful of regressions as a result of this change:
 - Some textureGrad tests, these failures match llvmpipe. (There are
   debug settings allowing improved gallivm sampling accurancy.)
 - Some layered clearing tests as swr doesn't currently support that. It
   was getting lucky before because enough other things were broken.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-22 20:27:20 -05:00
Charmaine Lee
5d2b5996e1 util: fix missing swizzle components in the SINT <-> UINT conversion string
Fixes tgsi error introduced in commit 3817a7a. The error complains missing
swizzle component in the conversion string "UMIN TEMP[0], TEMP[0], IMM[0].x".

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-11-23 01:54:57 +01:00
Eric Anholt
414dbb2d5c vc4: Don't conditionalize the src1 mov of qir_SEL().
My thought in having both arguments conditionally moved was that it should
theoretically save some power by not doing work in those channels.
However, it ends up costing us instructions because we can't
register-coalesce the first of the MOVs, and it also introduces extra
scheduling dependencies.  The instruction cost would swamp whatever power
benefit I was hoping for.

shader-db results:
total instructions in shared programs: 100548 -> 99741 (-0.80%)
instructions in affected programs:     42450 -> 41643 (-1.90%)

With obvious outliers removed (I had an X11 emacs running over the network
in the "after" case), 3DMMES Taiji showed 1.07231% +/- 0.488241% fps
improvement (n=18, 30).
2016-11-22 16:46:03 -08:00
Eric Anholt
1f0ba902f0 vc4: Re-add R4 to the "any" register class.
I screwed this up in fdad4d2402 which was
supposed to be making this code more maintainable.  What's amazing is
multithreaded FS showed the wins it did despite this bug.

shader-db results:
total instructions in shared programs: 103535 -> 100548 (-2.89%)
instructions in affected programs:     83794 -> 80807 (-3.56%)
2016-11-22 16:46:03 -08:00
Eric Anholt
9728887e7f vc4: Disable MSAA rasterization when the job binning is single-sampled.
Gallium core just changed to start setting MSAA enabled in the rasterizer
state even with samples==1 buffers.  This caused disagreements in our
driver between binning and rasterization state, which the simulator threw
assertion failures about.  Keep the single-sampled samples==1 behavior for
now.
2016-11-22 16:46:03 -08:00
Eric Anholt
ff018e0979 vc4: Make sure we don't overflow texture input/output FIFOs when threaded.
I dropped the first hunk of this change last minute when I decided it
wasn't actually needed, and apparently failed to piglit it in simulation.
The simulator threw an an assertion in gl-1.0-drawpixels-color-index,
which queued up 5 coordinates (3 before a switch, two after) before
loading the result.
2016-11-22 16:46:03 -08:00
Dave Airlie
ea417f5335 radv: move pipeline barrier image transitions after src flushing
This seems like it would conform better with the spec.

noticed while digging into fast clears.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-23 10:16:34 +10:00
Jason Ekstrand
3fd79558be anv: Enable fast clears on gen7-8
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 14:24:29 -08:00
Jason Ekstrand
5e8069a572 anv: Add support for fast clears on gen9
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 14:24:29 -08:00
Jason Ekstrand
dae8e52030 anv/blorp: Rework flushing around resolves
It turns out that the flushing required around resolves is a bit more
extensive than I first thought.  You actually need render cache flush
and a CS stall both before *and* after the resolve.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 14:24:29 -08:00
Jason Ekstrand
8d1ccd6729 anv/cmd_buffer: Apply remaining flushes in EndCommandBuffer
Otherwise, some pipe flushes may just never happen.  This is unlikely to
cause problems depending on how the kernel schedules batches, but we
shouldn't count on it.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 14:24:29 -08:00
Jason Ekstrand
878499d323 anv/blorp: Use regular blorp clears for subpass clears
At vkCmdNextSubpass time, we have the actual framebuffer so we can use
regular blorp_clear for subpass clears.  For fast clears, there is no
attachment version, so this will make fast clears a bit easier.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 14:24:29 -08:00
Jason Ekstrand
772d223c9c anv: Add a vk_to_isl_color helper
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 14:24:29 -08:00
Jason Ekstrand
d1d6b78898 anv/cmd_buffer: Make setup_attachments take a RenderPassBeginInfo
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 14:13:53 -08:00
Jason Ekstrand
1d5ac0a462 anv: Set up binding tables and surface states for input attachments
This commit adds the last remaining bits to support input attachments in
the Intel Vulkan driver.  For color and depth attachments, we allocate an
input attachment surface state during vkCmdBeginRenderPass like we do for
the render target surface states.  This is so that we can incorporate the
clear color and aux information as used in rendering.  For stencil, we just
treat it like a regular texture because we don't there is no aux.  Also,
only having to worry about at most one input attachment surface for each
attachment makes some of the vkCmdBeginRenderPass code simpler.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 13:44:55 -08:00
Jason Ekstrand
140d041fac anv/pipeline: Handle depth/stencil self-dependencies
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 13:44:55 -08:00
Jason Ekstrand
0b01262844 anv: Use pass attachment information to insert flushes
Input and resolve attachments can cause an implicit dependency in the
pipeline.  It's our job to insert the needed flushes.  Fortunately, we can
easily reuse the usage tracking that we use for CCS resolves.

This fixes 159 Vulkan CTS tests on Haswell because we're now flushing in
between drawing and MSAA resolves.  I have no idea how they were passing
before on newer hardware.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 13:44:55 -08:00
Jason Ekstrand
57174d6042 anv/cmd_buffer: Fix pipeline barriers for input attachments
We were using VK_IMAGE_ACCESS_COLOR_ATTACHMENT_READ_BIT to detect an input
attachment read.  We should use VK_IMAGE_ACCESS_INPUT_ATTACHMENT_READ_BIT
instead.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 13:44:55 -08:00
Jason Ekstrand
0acb28e0cf anv/pipeline: Add a input_attachment_index to the bindings
This allows us to go from the binding to either the descriptor or the input
attachment at will.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 13:44:55 -08:00
Jason Ekstrand
3f1eda0b42 anv/pass: Calculate the combined image usage of attachments
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 13:44:55 -08:00
Jason Ekstrand
347f43c8ec anv: Add an input attachment lowering pass
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 13:44:55 -08:00
Jason Ekstrand
2e311e4211 i965/fs: Implement load_layer_id for fragment shaders
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 13:03:31 -08:00
Jason Ekstrand
08441dae59 nir: Add a layer_id system value intrinsic
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 13:03:29 -08:00
Jason Ekstrand
2e44799f50 spirv: Stop warning about input attachments
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 13:03:23 -08:00
Jason Ekstrand
c54097cc48 spirv: Handle the InputAttachmentIndex decoration
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 13:02:35 -08:00
Jason Ekstrand
111d57e7d2 compiler: Add the rest of the subpassInput types
There are actually 6 of them according to the GL_KHR_vulkan_glsl spec.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-11-22 13:02:29 -08:00
Jason Ekstrand
7a2cfd4adb anv/cmd_buffer: Emit CS push constants after binding tables
Emitting binding tables can cause push constants to be dirtied if the
shader uses images so we need to handle push constants later.
2016-11-22 10:10:38 -08:00
Marek Olšák
a3f6bea69a gallium: fix more occurences of u_hash.h
this fixes compile failures since 86514d84e0
2016-11-22 18:28:18 +01:00
Marek Olšák
d219720d19 mesa: use special checksums for unset checksums and fixed-func shaders
for debugging

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-11-22 18:07:16 +01:00
Marek Olšák
b818df1e71 glsl: add gl_linked_shader::SourceChecksum
for debugging

v2: wrap all checksums in #ifdef DEBUG

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-11-22 18:05:51 +01:00
Marek Olšák
6dfdf52b6a mesa: use util_hash_crc32 instead of _mesa_str_checksum
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-11-22 18:05:51 +01:00
Marek Olšák
86514d84e0 util: import CRC32 implementation from gallium
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-11-22 18:05:51 +01:00
Jason Ekstrand
3ef8dff865 anv/cmd_buffer: Add an assert on emit_binding_table failure
The != VK_SUCCESS case is really only capable of handling the one error.
This assert makes things a bit safer if something else goes wrong.

Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2016-11-22 08:50:27 -08:00
Lucas Stach
d9a3ad94ca gbm: request correct version of the DRI2_FENCE extension
There is no version 2 of the DRI2_FENCE extension. So only a request
for version 1 has a chance to succeed.

Fixes: 74b1969d71 (gbm: wire up fence extension)
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-22 15:56:44 +00:00
Jason Ekstrand
f680a01ad4 anv/cmd_buffer: Emit a CS stall before setting a CS pipeline
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-22 08:06:33 -08:00
Jason Ekstrand
054e48ee0e anv/cmd_buffer: Re-emit MEDIA_CURBE_LOAD when CS push constants are dirty
This can happen even if the binding table isn't changed.  For instance, you
could have dynamic offsets with your descriptor set.  This fixes the new
stress.lots-of-surface-state.cs.dynamic cricible test.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-22 08:06:33 -08:00
Jason Ekstrand
722ab3de9f anv/cmd_buffer: Handle running out of binding tables in compute shaders
If we try to allocate a binding table and fail, we have to get a new
binding table block, re-emit STATE_BASE_ADDRESS, and then try again.  We
already handle this correctly for 3D and blorp but it never got handled for
CS.  This fixes the new stress.lots-of-surface-state.cs.static crucible test.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-22 08:06:33 -08:00
Jason Ekstrand
a8ef92b031 i965/compiler: Disable trig workarounds on KBL+
The precision of our trig instructions appears to have been fixed on Kaby
Lake.  Neither Ben nor I can find any documentation for this.  However, the
dEQP precision tests now pass with INTEL_PRECISE_TRIG=0 where they fail on
Sky Lake.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-11-22 08:06:33 -08:00
Jason Ekstrand
767b163e47 intel/common: Add an is_kabylake field to gen_device_info
Most of the 3-D engine Kaby Lake is identical to Sky Lake.  However, there
are a few small differences that we need to be able to detect.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-11-22 08:06:33 -08:00
Gwan-gyeong Mun
e074a08a6d anv: Fix unintentional integer overflow in anv_CreateDmaBufImageINTEL
Since both pCreateInfo->strideInBytes and pCreateInfo->extent.height
are of uint32_t type 32-bit arithmetic will be used.

Fix unintentional integer overflow by casting to uint64_t before
multifying.

CID 1394321

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
[Emil Velikov: cast only of the arguments]
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-22 15:15:45 +00:00
Gwan-gyeong Mun
69cc7d90f9 util/disk_cache: close a previously opened handle in disk_cache_put (v2)
We're missing the close() to the matching open().

CID 1373407

v2: Fixes from Emil Velikov's review
    Update the teardown in reverse order of the setup/init.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v1)
2016-11-22 15:13:42 +00:00
Gwan-gyeong Mun
0e8dc81c3a docs: get rid of duplicated description from sourcetree.html
Fixes: 438086efb1 (docs: sourcetree.html misc updates)
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-22 15:13:42 +00:00
Emil Velikov
a2283b50e6 docs/submitting patches: mention get_reviewers.pl
Mention the script - why/how to use alongside a useful trick to make it
work interactively (thanks Rob!).

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Elie Tournier <tournier.elie@gmail.com>
2016-11-22 15:13:41 +00:00
Timothy Arceri
e260bfec04 docs/submitting patches: add git tips
v2: [Emil Velikov]
 - Add the shorthand git send-email -vX
 - Move to submittingpatches.html
 - Add to the TOC.

v3: [Emil Velikov]
 - Use @~8 instead of HEAD~8 (Nicolai)

Cc: Timothy Arceri <t_arceri@yahoo.com.au>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
2016-11-22 15:13:41 +00:00
Emil Velikov
29c8a4a4ce auxiliary/vl/dri: call get_xcb_screen() only once
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-11-22 15:13:41 +00:00
Emil Velikov
7c6babb22c egl/x11: store xcb_screen_t *screen instead of int screen
Just fetch and store it once, rather than doing the
xcb_setup_roots_iterator + get_xcb_screen dance five times.

v2: Call xcb_disconnect() on error (Eric)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
2016-11-22 15:13:41 +00:00
Emil Velikov
b9880d2e93 egl/x11: factor out dri2_get_xcb_connection()
Identical throughout dri2, dri3 and drisw. Next patch will add more
common code, so rather than duplicating it factor out the function.

Note: this also sets eglError on failure. Something that's quite
inconsistent throughout the codebase.

v2: Call xcb_disconnect() on error (Eric)

Note: use xcb_disconnect() even in the xcb_connection_has_error() case
as per the manual:
... memory will not be freed until xcb_disconnect...

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
2016-11-22 15:13:41 +00:00
Timothy Arceri
a56a505db7 mesa/glsl: remove unused uses_builtin_functions field
This has been unused since 943b69cddd

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-11-23 00:17:13 +11:00
Kenneth Graunke
38a8507f79 i965: Use NIR-based clip/cull lowering for OpenGL as well.
The old approach works fine, and this approach isn't necessarily better.
But it at least has the advantage that Vulkan and GL use the same
approach.  I originally wrote it to gain additional testing for the
new paths.

shader-db statistics show 0 instruction count changes.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-22 00:29:24 -08:00
Kenneth Graunke
a4d7a5bd1e anv: Enable clip and cull distance support.
Everything is now in place, and we appear to pass the tests on Gen7+.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-22 00:29:24 -08:00
Kenneth Graunke
f182e5eafc i965/vec4: Handle component qualifiers on non-generic varyings.
ARB_enhanced_layouts only requires component qualifier support for
generic varyings, so this is all the vec4 backend knew how to handle.

This patch extends the backend to handle it for all varyings, so we
can use store_output intrinsics with a component set for things like
clip/cull distances.  We may want to use that for other VUE header
fields in the future as well.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-11-22 00:29:24 -08:00
Kenneth Graunke
b63f7671a3 i965/fs: Handle compact outputs.
We need to calculate the number of vec4 slots correctly.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-22 00:29:24 -08:00
Kenneth Graunke
536af43fe3 spirv: Silence unsupported capability warnings for Clip/CullDistance.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-22 00:29:24 -08:00
Kenneth Graunke
7471bb5fa4 anv: Set clip/cull distances fields in packets.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-22 00:29:23 -08:00
Kenneth Graunke
a9eabd539c anv: Combine ClipDistance and CullDistance arrays.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-22 00:29:23 -08:00
Kenneth Graunke
9a179f2db0 nir: add a pass to compact clip/cull distances.
v2: Use nir_is_per_vertex_io() rather than is_arrays_of_arrays().

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-22 00:29:23 -08:00
Kenneth Graunke
663b2e9a92 nir: Add a "compact array" flag and IO lowering code.
Certain built-in arrays, such as gl_ClipDistance[], gl_CullDistance[],
gl_TessLevelInner[], and gl_TessLevelOuter[] are specified as scalar
arrays.  Normal scalar arrays are sparse - each array element usually
occupies a whole vec4 slot.  However, most hardware assumes these
built-in arrays are tightly packed.

The new var->data.compact flag indicates that a scalar array should
be tightly packed, so a float[4] array would take up a single vec4
slot, and a float[8] array would take up two slots.

They are still arrays, not vec4s, however.  nir_lower_io will generate
intrinsics using ARB_enhanced_layouts style component qualifiers.

v2: Add nir_validate code to enforce type restrictions.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-22 00:29:23 -08:00
Dave Airlie
f395e3445d radv: add support for shader stats dump
I've started working on a shader-db alike for Vulkan,
it's based on vktrace and it records pipelines, this
adds support to dump the shader stats exactly like
radeonsi does, so I can reuse the shader-db scripts it
uses.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-22 07:20:17 +00:00
Dave Airlie
220912e214 radv: fix sample id loading
The sample id is packed into bits 8-12, so adjust
things properly.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-22 17:15:57 +10:00
Dave Airlie
3c6151ccaf radv/ac: add implementation of load_sample_pos intrinsic.
This fixes a bunch of crashes in CTS tests looking for this.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-22 17:15:54 +10:00
Dave Airlie
5697cfb7ec radv/ac: cleanup ddxy emission
This cleans up the ddxy emission along the same lines as
radeonsi. It also means we don't use LDS on VI chips we
use the dspermute interface, it also removes some duplicated
code.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-22 17:15:43 +10:00
Dave Airlie
fa57b77105 radv/meta: cleanup resolve vertex state emission
For the hw resolve there is no need to emit any sort
of texture coordinates, so drop them all in the meta path.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-22 17:15:37 +10:00
Bas Nieuwenhuizen
24427e31ef radv: Incorporate GPU family into cache UUID.
Invalidates the cache when someone switches cards.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
2016-11-22 07:58:35 +01:00
Bas Nieuwenhuizen
d94383970f radv: Use library mtime for cache UUID.
We want to also invalidate the cache when LLVM gets changed. As the
specific LLVM revision is not fixed at build time, we will need to
check at runtime. Computing a checksum for LLVM is going to be very
expensive, so just use the mtime.

Tested on my computer that the returned DSO for the LLVM symbol is
actually the LLVM DSO.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
2016-11-22 07:58:35 +01:00
Bas Nieuwenhuizen
43ee4917ca radv: Store UUID in physical device.
No sense in repeatedly determining it. Also, it might be dependent
on the device as shaders get compiled differently for SI/CIK/VI etc.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
2016-11-22 07:58:35 +01:00
Timothy Arceri
581bd1d12a glsl: fix NULL check
Fixes copy and paste error in 9d96d3803a
2016-11-22 14:40:26 +11:00
Ilia Mirkin
807bc6ea9e swr: calculate viewport width/height based on the scale
The former calculations were for min/max y. The width/height don't take
translate into account.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-21 21:11:26 -05:00
Ilia Mirkin
c3dd5b2e3f swr: don't claim to allow setting layer/viewport from VS
This may ultimately be possible to support, but for now it's not hooked
up and the swr core only supports this output from GS.

This normally wouldn't matter, but we lie about supporting GL 3.2, and
also the blitter and st/mesa will make use of this functionality if
claimed.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-21 21:11:26 -05:00
Ilia Mirkin
d48740568f swr: allocate all scratch space in one go for vertex buffers
Multiple buffers may reference client arrays. When this happens, we
might reach for scratch space multiple times, which could cause later
arrays to invalidate the pointers allocated for the earlier ones.

This fixes copyteximage 2D_ARRAY.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-21 21:11:26 -05:00
Ilia Mirkin
16d42f2f3d swr: call swr_update_derived unconditionally when drawing/clearing
Currently a sequence like draw/map/draw/map will cause the second map to
not wait for the second draw. This is because the first map will clear
the resource business bit, and the second draw won't reset it since no
state has changed.

swr_update_derived does a tiny bit of extra work, including updating the
SWR_BACKEND_STATE as well as waiting for prending fences. If that's a
problem, we could call swr_update_resource_status directly from
draw/clear handlers.

Fixes clearbuffer-stencil, clearbuffer-depth, clearbuffer-depth-stencil,
and clearbuffer-display-lists.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-21 21:11:26 -05:00
Ilia Mirkin
ee0b6597a9 swr: [rasterizer memory] minify texture width before alignment
The minification should happen before alignment, not after. See similar
logic on ComputeLODOffsetY. The current logic requires unnecessarily
large textures when there's an initial NPOT size.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-21 21:11:26 -05:00
Ilia Mirkin
c5a654786b swr: [rasterizer memory] minify original sizes for block formats
There's no guarantee that mip width/height will be a multiple of the
compressed block size. Doing a divide by the block size first yields
different results than GL expects, so we do the divide at the end.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-21 21:11:26 -05:00
Marek Olšák
bf75ef3f92 radeonsi: remove all varyings for depth-only rendering or rasterization off
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák
ef6c84b301 radeonsi: eliminate VS outputs that aren't used by PS at runtime
A past commit added the ability to compile "optimized" shader variants
asynchronously (not stalling the app).

This commit builds upon that and adds what is basically a runtime shader
linker. If a VS output isn't used by the currently-bound PS, a new VS
compilation is started without that output. The new shader variant
is used when it's ready.

All apps using separate shader objects I've seen had unused VS outputs.

Eliminating unused/useless VS outputs also eliminates the corresponding
vertex attribute loads.

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák
7e76f9a7a8 radeonsi: record information about all written and read varyings
It's just tgsi_shader_info with DEFAULT_VAL varyings removed.

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák
c7f3e5c647 radeonsi: make si_shader_io_get_unique_index stricter
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák
ed3190b3f3 radeonsi: don't export ClipVertex and ClipDistance[] if clipping is disabled
This is the first user of optimized monolithic shader variants.

Cull distances can't be disabled by states.

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák
d984a324bf radeonsi: add infrastr. for compiling optimized shader variants asynchronously
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák
d2a56985d7 radeonsi: don't set vs.epilog.export_prim_id if TES is bound
there is no VS epilog in this case

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák
fee71fec25 radeonsi: simplify checking for monolithic compilation
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák
e6aee45db4 radeonsi: print all flags in si_dump_shader_key
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák
6d5c2a8b5c radeonsi: split the shader key into 3 logical parts
key->part.*: prolog and epilog flags only
key->as_{ls,es}: special flags
key->mono.*: flags for monolithic compilation only

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák
d4e9f409e9 radeonsi: fix culling if clip & cull distances are used at the same time
Fixed piglits:
- arb_cull_distance/clip-cull-3
- arb_cull_distance/clip-cull-4

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák
9d8db805ef radeonsi: clean up si_emit_clip_regs
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák
e59389d738 radeonsi: assume that a VS without POSITION is LS
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák
7dbf83af54 tgsi/scan: record if a shader writes the position output
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák
8a2251911e tgsi/scan: use a big switch for scanning outputs
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák
bdd860e307 radeonsi: decrease the number of texture slots to 24
Company Of Heroes 2 needs only 24.

This saves 512 bytes of CE RAM per shader stage.

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák
fa476e0566 radeonsi: fast exit si_emit_derived_tess_state early
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák
79a8e674ae winsys/amdgpu: set addrlib flag opt4Space
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák
72d1669ed2 radeonsi: check for !is_linear in do_hardware_msaa_resolve
We don't want opt4Space here.

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Marek Olšák
49fa4a4e60 gallium/radeon: add RADEON_SURF_OPTIMIZE_FOR_SPACE
FORCE_TILING should disable it. It has no effect now, but that may change
soon.

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-21 21:44:35 +01:00
Mun Gwan-gyeong
44a3f2ee09 radeonsi: Add missing error-checking to si_create_compute_state (v2)
When the uploading of shader fails on si_shader_binary_upload(),
it returns -ENOMEM. We should handle si_shader_binary_upload() failure path
on si_create_compute_state().

CID 1394027

v2: Fixes from Edward O'Callaghan's review
 a) Update explicitly return value check with "si_shader_binary_upload() < 0"
 b) Update commit message.

Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-11-21 21:09:06 +01:00
Roland Scheidegger
e442db8e98 draw: drop some overflow computations
It turns out that noone actually cares if the address computations overflow,
be it the stride mul or the offset adds.
Wrap around seems to be explicitly permitted even by some other API (which
is a _very_ surprising result, as these overflow computations were added just
for that and made some tests pass at that time - I suspect some later fixes
fixed the actual root cause...). So the requirements in that other api were
actually sane there all along after all...
Still need to make sure the computed buffer size needed is valid, of course.
This ditches the shiny new widening mul from these codepaths, ah well...

And now that I really understand this, change the fishy min limiting
indices to what it really should have done. Which is simply to prevent
fetching more values than valid for the last loop iteration. (This makes
the code path in the loop minimally more complex for the non-indexed case
as we have to skip the optimization combining two adds. I think it should
be safe to skip this actually there, but I don't care much about this
especially since skipping that optimization actually makes the code easier
to read elsewhere.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-11-21 20:02:53 +01:00
Roland Scheidegger
2471aaa02f draw: simplify fetch some more
Don't keep the ofbit. This is just a minor simplification, just adjust
the buffer size so that there will always be an overflow if buffers aren't
valid to fetch from.
Also, get rid of control flow from the instanced path too. Not worried about
performance, but it's simpler and keeps the code more similar to ordinary
fetch.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-11-21 20:02:53 +01:00
Roland Scheidegger
4e1be31f01 draw: unify linear and elts draw jit functions
The code for elts and linear paths was nearly 100% identical by now - with
the elts path simply having some additional gather for the elements in the
main loop (with some additional small differences before the main loop).

Hence nuke the separate functions and decide this at jit shader execution
time (simply based on the presence of the elts pointer).

Some analysis shows that the generated vs jit functions seem to be just very
minimally more complex than the former elts functions, and almost none of the
additional complexity is in the main loop (basically just the branch logic
for the branch fetching the actual indices).
Compared to linear, the codesize of the function is of course a bit larger,
however the actual executed code in the main loop appears to be near 100%
identical (the additional code looking up indices is skipped as expected).

So, I would not expect a (meaningful) performance difference with the
generated code, neither with elts nor linear, this does however roughly
half the compilation time (the compiled shaders should also use only half
the memory of course).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-11-21 20:02:53 +01:00
Roland Scheidegger
8cf7edff7d draw: use same argument order for jit draw linear / elts functions
This is a bit simpler. Mostly to make it easier to unify the paths later...

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-11-21 20:02:53 +01:00
Roland Scheidegger
78a997f728 draw: drop unnecessary index overflow handling from vsplit code
This was kind of strange, since it replaced indices which were only
overflowing due to bias with MAX_UINT. This would cause an overflow later
in the shader, except if stride was 0, however the vertex id would be
essentially random then (-1 + eltBias). No test cared about it, though.
So, drop this and just use ordinary int arithmetic wraparound as usual.
This is much simpler to understand and the results are "more correct" or
at least more consistent (vertex id as well as actual fetch results just
correspond to wrapped around arithmetic).
There's only one catch, it is now possible to hit the cache initialization
value also with ushort and ubyte elts path (this wouldn't be an issue if
we'd simply handle the eltBias itself later in the shader). Hence, we need
to make sure the cache logic doesn't think this element has already been
emitted when it has not (I believe some seriously bad things could happen
otherwise). So, borrow the logic which handled this from the uint case, but
not before fixing it up...

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-11-21 20:02:53 +01:00
Roland Scheidegger
7a55c436c6 draw: simplify vsplit elts code a bit
vsplit_get_base_idx explicitly returned idx 0 and set the ofbit
in case of overflow. We'd then check the ofbit and use idx 0 instead of
looking it up. This was necessary because DRAW_GET_IDX used to return
DRAW_MAX_FETCH_IDX and not 0 in case of overflows.
However, this is all unnecessary, we can just let DRAW_GET_IDX return 0
in case of overflow. In fact before bbd1e60198
the code already did that, not sure why this particular bit was changed
(might have been one half of an attempt to get these indices to actual draw
shader execution - in fact I think this would make things less awkward, it
would require moving the eltBias handling to the shader as well).
Note there's other callers of DRAW_GET_IDX - those code paths however
explicitly do not handle index buffer overflows, therefore the overflow
value doesn't matter for them.

Also do some trivial simplification - for (unsigned) a + b, checking res < a
is sufficient for overflow detection, we don't need to check for res < b too
(similar for signed).

And an index buffer overflow check looked bogus - eltMax is the number of
elements in the index buffer, not the maximum element which can be fetched.
(Drop the start check against the idx buffer though, this is already covered
by end check and end < start).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-11-21 20:02:53 +01:00
George Kyriazis
9aae167e94 gallium: Add support for SWR compilation
Include swr library and include -DHAVE_SWR in the compile line.

v3: split to a separate commit

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-21 12:44:47 -06:00
George Kyriazis
5b4d1500dd gallium: swr: Added swr build for windows
v4: Add windows-specific gen_knobs.{cpp|h} changes
v5: remove aggresive squashing of gen_knobs.py to this commit; added
SConscript to EXTRA_DIST in Makefile.am

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-21 12:44:47 -06:00
George Kyriazis
9e4e1f5190 swr: Modify gen_knobs.{cpp|h} creation script
Modify gen_knobs.py so that each invocation creates a single generated
file.  This is more similar to how the other generators behave.

v5: remove Scoscript edits from this commit; moved to commit that first
adds SConscript

Acked-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-21 12:44:47 -06:00
George Kyriazis
9085f1a9cc scons: Add swr compile option
To buils The SWR driver (currently optional, not compiled by default)

v3: add option as opposed to target

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-21 12:44:47 -06:00
George Kyriazis
bc26e8d4a7 swr: Windows-related changes
- Handle dynamic library loading for windows
- Implement swap for gdi
- fix prototypes
- update include paths on configure-based build for swr_loader.cpp

v2: split to multiple patches
v3: split and reshuffle some more; renamed title
v4: move Makefile.am changes to other commit. Modify header files

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-21 12:44:46 -06:00
George Kyriazis
87bd28210f swr: renamed duplicate swr_create_screen()
There are 2 swr_create_screen() functions.  One in swr_loader.cpp, which
is used during driver init, and the other is hiding in swr_screen.cpp,
which ends up in the arch-specific .dll/.so.

Rename the second one to swr_create_screen_internal(), to avoid confusion
in header files.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-21 12:44:46 -06:00
George Kyriazis
974d280e81 swr: Handle windows.h and NOMINMAX
Reorder header files so that we have a chance to defined NOMINMAX before
mesa include files include windows.h

v3: split from bigger patch

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-21 12:44:46 -06:00
George Kyriazis
915b4b0d49 gallium: Added SWR support for gdi
Added hooks for screen creation and swap.  Still keep llvmpipe the default
software renderer.

v2: split from bigger patch
v3: reword commit message

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-21 12:44:46 -06:00
George Kyriazis
30ae2cbf82 scons: add llvm 3.9 support.
v2: reworded commit message

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-21 12:44:46 -06:00
George Kyriazis
2da28dbd11 scons: ignore .hpp files in parse_source_list()
Drivers that contain C++ .hpp files need to ignore them too, along
with .h files, when building source file lists.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-21 12:44:46 -06:00
George Kyriazis
c323180733 mesa: removed redundant #else
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-21 12:44:46 -06:00
Jordan Justen
44c5ed02d1 i965/hsw: Set integer mode in sampling state for stencil texturing
Fixes:

ES31-CTS.functional.texture.border_clamp.formats.depth24_stencil8_sample_stencil.nearest_size_pot
ES31-CTS.functional.texture.border_clamp.formats.depth24_stencil8_sample_stencil.nearest_size_npot
ES31-CTS.functional.texture.border_clamp.formats.depth32f_stencil8_sample_stencil.nearest_size_pot
ES31-CTS.functional.texture.border_clamp.formats.depth32f_stencil8_sample_stencil.nearest_size_npot
ES31-CTS.functional.texture.border_clamp.unused_channels.depth24_stencil8_sample_stencil
ES31-CTS.functional.texture.border_clamp.unused_channels.depth32f_stencil8_sample_stencil

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-21 10:10:53 -08:00
Emil Velikov
8e0e2478ba reviewers: add Rob H for the Android EGL+build parts
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-21 16:01:06 +00:00
Emil Velikov
7a39a0091d docs: recommend using --enable-mangling over the manual -DUSE...
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-21 15:08:27 +00:00
Emil Velikov
0fa854aea5 docs: rework/update install.html
Still far from perfect, but a few small steps in the right direction.

 - Split build systems, compilers, third party tools
 - Mention building mesa for Android (part of AOSP)
 - Drop explicit "other" dependencies. Reference to disto methods to
get them.
 - HTML 4.01 Traditional compliance fixes - mixed ul and br tags.
 - nuke dead links README.{CYGWIN,VMS}

v2: Squash typos, add note about buggy flex 2.6.2 (Eric), add Suse
zipper command (Tobias).

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-21 15:08:23 +00:00
Emil Velikov
438086efb1 docs: sourcetree.html misc updates
A mixed bag of updates/fixes - mostly aiming at removing no longer
applicable directories.

Add a few more state-trackers, drivers, etc. alongside "XXX more" where
applicable. Attribute for the GLSL/NIR movement and nukage of
src/egl/docs.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-21 15:08:20 +00:00
Emil Velikov
2edc29ab1e docs: flesh out releasing.html
Properly document the whole process:
 - Brief on what, when, where
 - Picking, testing, branchpoints, pre-release announcement
 - Releasing, announcement, website and bugzilla updates

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-21 15:08:18 +00:00
Emil Velikov
b571c075e9 docs/submittingpatches: fix tags mis/abuse
Fix the odd tag so that we're HTML 4.01 Traditional compliant

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-21 15:08:14 +00:00
Emil Velikov
07384468af docs/submittingpatches: flesh out "how to nominate" methods
Currently they are buried within the text, making it hard to find.
Move them to the top and be clear what is _not_ a good idea.

v2: Minor commit polish, use only "resending" as suggested by Matt.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-21 15:08:12 +00:00
Emil Velikov
019f055f32 docs/autoconf: update glx driver / enable-debug text
With earlier commit we folded all the xlib handling in --enable-glx, but
we forgot to update the documentation.

Elaborate on --enable-debug and drop mentions about depenencies.

v2: Grammar - s|haven't|hasn't| (Eric)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-21 15:08:09 +00:00
Emil Velikov
49ac732651 docs/repository: refer to Submitting patches
v2: Improve grammar - add missing "to" (Eric).

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-21 15:08:07 +00:00
Emil Velikov
259e65c03e docs: split Submitting Patches into separate document
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-21 15:08:05 +00:00
Emil Velikov
e561737c52 docs: split Codying style into separate document
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-21 15:08:04 +00:00
Emil Velikov
edbf3ebe1f docs: mention/suggest testing your patch against dEQP
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-21 15:08:02 +00:00
Emil Velikov
f2d9c7b60c docs: mention that coding style can differ between drivers
... and point people to use/honour the EditorConfig/Emacs files, where
applicable.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-21 15:07:59 +00:00
Emil Velikov
4fbeac398a revieweds: add Tomasz for the Android/EGL implementation
As mentioned/requested on the mailing list.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-21 14:46:40 +00:00
Emil Velikov
4f12fcb6d3 mesa: fold always true conditional
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-21 14:46:40 +00:00
Emil Velikov
e70d0d22a2 mesa: drop unneeded assert
As seen a couple  of lines above - there's no way for the assert to
trigger.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-21 14:46:40 +00:00
Emil Velikov
130b12f96a egl/wayland: remove non-applicable destroyDrawable from error path
If we fail to create the drawable there's not much point in attampting
to destroy it.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-11-21 14:46:40 +00:00
Emil Velikov
b421fec958 loader: automake: whitespace cleanup
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-11-21 14:46:40 +00:00
Emil Velikov
4ffa9b2847 gbm: automake: remove unused defines
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-11-21 14:46:40 +00:00
Gwan-gyeong Mun
d3780e2e4d intel: aubinator: Fix resource leak in gen_spec_load_from_path
This fixes resource leak in gen_spec_load_from_path XML_ParserCreate
failure path

CID 1373564

Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2016-11-21 14:38:11 +00:00
Tomasz Figa
51727b1cf5 egl/android: Use gralloc::lock_ycbcr for resolving YUV formats (v2)
There is an interface that can be used to query YUV buffers for their
internal format. Specifically, if gralloc:lock_ycbcr() is given no SW
usage flags, it's supposed to return plane offsets instead of pointers.
Let's use this interface to implement support for YUV formats in Android
EGL backend.

v2: Fixes from Emil's review:
 a) Added comments for parts that might be not clear,
 b) Changed get_fourcc_yuv() to return -1 on failure,
 c) Changed is_yuv() to use bool.

Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2016-11-21 13:27:47 +00:00
Tomasz Figa
859d0b0121 egl/android: Get gralloc module in dri2_initialize_android() (v2)
Currently droid_open_device() gets a reference to the gralloc module
only for its own use and does not store it anywhere. To make it possible
to call gralloc methods from code added in further patches, let's
refactor current code to get gralloc module in dri2_initialize_android()
and store it in dri2_dpy.

v2: fixes from Emil's review:
 a) remove duplicate initialization of 'err'.

Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2016-11-21 13:27:41 +00:00
Tomasz Figa
925ff0b534 egl/android: Remove handling of RGB_888 pixel format
It is currently completely broken, as it ends up using RGBX_8888 on
hardware side, due to no way of distinguishing between these two in the
DRI API, while HAL_PIXEL_FORMAT_RGB_888 is clearly defined to be the
3-byte per pixel RGB format.

Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-21 13:27:01 +00:00
Gwan-gyeong Mun
9c5b1c7990 radeonsi: Fix resource leak in gs_copy_shader allocation failure path
CID 1394028

Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-22 00:04:59 +11:00
Nicolai Hähnle
0e11290ef5 glsl/lower_output_reads: remove unused mem_ctx
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-21 08:21:52 +01:00
Nicolai Hähnle
a3b98edf6f glsl/lower_output_reads: bail early in tessellation control shaders
This whole pass is a no-op.

Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-21 08:21:41 +01:00
Nicolai Hähnle
0d383a79a8 glsl/lower_output_reads: fix geometry shader output handling with conditional emit
Consider a geometry shader that contains code like this:

   some_out = expr;

   if (cond) {
      ...
      EmitVertex();
   } else {
      ...
      EmitVertex();
   }

Both branches should see the correct value of some_out.

Since this is a rather subtle and rare case, I'm submitting a piglit test
for this as well.

GLSL says that the values of output variables are undefined after
EmitVertex(). With this change, the values will now be defined and
unmodified. This may reduce optimization opportunities in the probably
quite rare case where subsequent compiler passes cannot prove that the
value of the output variable is overwritten.

Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-21 08:21:31 +01:00
Nicolai Hähnle
42d5e91a2a radeonsi: store group_size_variable in struct si_compute
For compute shaders, we free the selector after the shader has been
compiled, so we need to save this bit somewhere else.  Also, make sure that
this type of bug cannot re-appear, by NULL-ing the selector pointer after
we're done with it.

This bug has been there since the feature was added, but was only exposed
in piglit arb_compute_variable_group_size-local-size by commit
9bfee7047b (which is totally unrelated).

Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-21 08:19:43 +01:00
Nicolai Hähnle
47db6b4600 glsl: don't flatten if-blocks with dynamic array indices
This fixes the regression of radeonsi in
glsl-1.10/execution/variable-indexing/vs-output-array-vec3-index-wr
caused by commit 74e39de932.

Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-21 08:18:47 +01:00
Iago Toral Quiroga
39c47e7698 anv/state: enable coordinate address rounding for Min/Mag filters
This patch improves pass rate of dEQP-VK.texture.explicit_lod.2d.sizes.*
from 68.0% (98/144) to 83.3% (120/144) by enabling sampler address
rounding mode when the selected filter is not nearest, which is the same
thing we do for OpenGL.

These tests check texture filtering for various texture sizes and mipmap
levels. The failures (without this patch) affect cases where the target
texture has odd dimensions (like 57x35) and either the Min or the Mag filter
is not nearest.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-21 08:01:54 +01:00
Jason Ekstrand
a8b85f1f77 anv: Implement a depth stall restriction on gen7
Fixes around 60 Vulkan CTS tests on Haswell

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-20 20:40:40 -08:00
Ilia Mirkin
9145873b15 nvc0/ir: use levelZero flag when the lod is set to 0
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-11-20 18:13:12 -05:00
Dave Airlie
b1340fd708 radv: spir-v allows texture size query with and without lod.
The translation to llvm was failing here due to required lod.

This fixes some new  SteamVR shaders.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-21 09:00:22 +10:00
Dave Airlie
6d7be52d90 radv: fix image view creation for depth and stencil only
This fixes the image view for sampling just the depth.

It removes some pointless swizzle code, and adds
a missing case for the x8_d24 format.

Fixes:
dEQP-VK.renderpass.formats.d32_sfloat_s8_uint.input.*
dEQP-VK.renderpass.formats.d24_unorm_s8_uint.input.*
dEQP-VK.renderpass.formats.x8_d24_unorm_pack32.input.*

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-21 08:58:03 +10:00
Dave Airlie
51a44c0021 radv: make sure to flush input attachments correctly.
This fixes 9 of the
dEQP-VK.renderpass.attachment_allocation.input_output.*
tests.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-21 08:57:31 +10:00
Brian Paul
e899d47bc9 tnl: remove unneeded #include "util/simple_list.h"
Reviewed-by: Vinson Lee <vlee@freedesktop.org>
2016-11-20 06:41:42 -07:00
Brian Paul
a6e849c672 radeon: remove unneeded #include "util/simple_list.h"
Compile tested only.

Reviewed-by: Vinson Lee <vlee@freedesktop.org>
2016-11-20 06:41:42 -07:00
Brian Paul
36678e97e4 r200: remove unneeded #include "util/simple_list.h"
And include "util/simple_list.h" where it is needed in r200_state.c
Compile tested only.

Reviewed-by: Vinson Lee <vlee@freedesktop.org>
2016-11-20 06:41:41 -07:00
Brian Paul
5d7b5d8627 i915: remove unneeded #include "util/simple_list.h"
Compile tested only.

Reviewed-by: Vinson Lee <vlee@freedesktop.org>
2016-11-20 06:41:41 -07:00
Brian Paul
da0bc7b646 mesa: remove unneeded #includes in errors.c
Reviewed-by: Vinson Lee <vlee@freedesktop.org>
2016-11-20 06:41:41 -07:00
Brian Paul
0d1e240a4f mesa: remove trailing whitespace in errors.c
Reviewed-by: Vinson Lee <vlee@freedesktop.org>
2016-11-20 06:41:41 -07:00
Kenneth Graunke
9c1609f0d6 nir: Add a C wrapper for glsl_type::is_array_of_arrays().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-19 12:30:26 -08:00
Kenneth Graunke
a1a292d177 i965: Store a clip_distance_mask field similar to cull_distance_mask.
This isn't useful for legacy GL, but will be used in Vulkan.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-19 12:30:25 -08:00
Kenneth Graunke
19c652b29c i965: Use shader_info for brw_vue_prog_data::cull_distance_mask.
This also allows us to move it from a GL specific location to a
part of the compiler shared by both GL and Vulkan.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-19 12:30:25 -08:00
Kenneth Graunke
c447ca64c1 compiler: Store the clip/cull distance array sizes in shader_info.
We switched from a boolean to array lengths in gl_program a while back.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-19 12:30:25 -08:00
Kenneth Graunke
c4be6e0b8d i965: Fix GS push inputs with enhanced layouts.
We weren't taking first_component into account when handling GS push
inputs.  We hardly ever push GS inputs, so this was not caught by
existing tests.  When I started using component qualifiers for the
gl_ClipDistance arrays, glsl-1.50-transform-feedback-type-and-size
started catching this.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-19 12:30:25 -08:00
Kenneth Graunke
45aee6be02 i965: Delete unused variable.
I forgot to delete this in 9ef2b9277d.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2016-11-19 12:30:25 -08:00
Kenneth Graunke
9ef2b9277d intel: Share URB configuration code between GL and Vulkan.
This code is far too complicated to cut and paste.

v2: Update the newly added genX_gpu_memcpy.c; const a few things.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-19 11:40:01 -08:00
Kenneth Graunke
6d416bcd84 i965: Use arrays in Gen7+ URB code.
So much of this code was cut and pasted per stage.  We can accomplish
much of it by looping over shader stages.

Improves performance of OglBatch7 (version 6) by 1.50783% +/- 0.287049%
(n = 71) at 1024x768 on Cherryview.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-19 11:40:00 -08:00
Kenneth Graunke
6656dd4b92 i965: Drop brw->urb.{nr_*_entries,*_start} assignments from gen7_urb.c.
The context fields are for Gen4-5; setting them has always been useless.
There's no point in spending the cost in the hottest path in the driver.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-19 11:40:00 -08:00
Kenneth Graunke
74d8612eed i965: Switch to roundf in HS/DS URB code.
Matt intentionally switched the VS calculation to be float-based in
commit c1da15709a.  Tessellation support
was written before this and rebased forward, and missed the change.

Now it's consistent.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-19 11:39:59 -08:00
Kenneth Graunke
c87b5dee11 i965: Make URB code use prog_data for GS/tessellation enable checks.
If geometry/tessellation shaders are disabled, prog_data will be NULL
(see brw_state_upload.c).  This consolidates dirty bits a little.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-19 11:39:58 -08:00
Kenneth Graunke
639af2a7c6 intel: Convert devinfo->urb.min_*_entries into an array.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-19 11:39:56 -08:00
Kenneth Graunke
58c09e72b1 intel: Convert devinfo->urb.max_*_entries into an array.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-19 11:39:45 -08:00
Brian Paul
2acfd36479 docs: document MESA_DEBUG=context
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-11-19 08:44:03 -07:00
Ilia Mirkin
ea276512a0 swr: mark streamout buffers as written
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-19 10:40:37 -05:00
Timothy Arceri
203c8794a1 st/mesa/glsl/nir/i965: make use of new gl_shader_program_data in gl_shader_program
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-19 15:45:46 +11:00
Timothy Arceri
65cd0a0d7f mesa: create new gl_shader_program_data struct
This will be used to share data between gl_program and gl_shader_program
allowing for greater code simplification as we can remove a number of
awkward uses of gl_shader_program.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-19 15:45:46 +11:00
Timothy Arceri
0c85d2fea4 glsl: add new program driver function to standalone compiler
This fixes a regression with the standalone compiler caused by
9d96d3803a

Note that we change standalone_compiler_cleanup() to no longer
explicitly free the linked shaders as the will be freed when
we free the parent ctx whole_program.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98774
2016-11-19 15:00:12 +11:00
Kenneth Graunke
ff0253a5ed i965: Disable depth writes when depth test is GL_EQUAL.
There's no point in performing depth writes when the depth test
comparison function is set to GL_EQUAL - it would just write out
the same value that's already there (if it is written at all).  While
this is harmless from a functional perspective, it hurts performance.
Obviously, writing to memory is not free, but there's another more
subtle impact as well: it can prevent early depth optimizations.

Depth writes aren't supposed to happen for pixels that are killed
by fragment shader discard statements or the alpha test.  So, with
depth writes enabled and either of those, the pixel shader must be
invoked to determine whether or not to perform the write.  This is
fairly stupid in the EQUAL case - we're running a shader to decide
whether to replace the existing depth value with itself.

By disabling these pointless writes, we allow early depth even with
discards and alpha testing, allowing the hardware to skip the pixel
shader altogether if the depth test fails.

Improves performance of Unigine Valley:

- Skylake GT2:    +17.8%
- Broadwell GT3e: +11.5%
- Cherrytrail:    +19.4%

Huge thanks to Mark Janes for building frameretrace [1], the performance
analysis tool that helped us find this issue, and to Robert Bragg for
providing us performance metrics on Linux.  Mark also spent the time to
analyze Valley performance on Windows vs. Linux and discovered a
discrepancy in early depth test metrics.  Once he had isolated a draw
call and drawn attention to the problem, fixing it was pretty simple.

[1] https://github.com/janesma/apitrace/wiki/frameretrace-branch

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-18 14:48:52 -08:00
Timothy Arceri
adb3a83c09 glsl: tidy up entries temporary
Here we just move initialisation of entries to where it is needed i.e.
outside the loop and after the continue checks.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-19 09:35:58 +11:00
Timothy Arceri
c20564ae3e glsl/i965: move per stage AtomicBuffers list to gl_program
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-19 09:35:58 +11:00
Timothy Arceri
9d96d3803a glsl: create gl_program at the start of linking rather than the end
This will allow us to directly store metadata we want to retain in
gl_program this metadata is currently stored in gl_linked_shader and
will be lost if relinking fails even though the program will remain
in use and is still valid according to the spec.

"If a program object that is active for any shader stage is re-linked
unsuccessfully, the link status will be set to FALSE, but any existing
executables and associated state will remain part of the current
rendering state until a subsequent call to UseProgram,
UseProgramStages, or BindProgramPipeline removes them from use."

This change will also help avoid the double handing that happens in
_mesa_copy_linked_program_data().

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-19 07:42:33 +11:00
Timothy Arceri
2b8f97d0ff st/mesa/i965: simplify gl_program references and stop leaking
In i965 we were calling _mesa_reference_program() after creating
gl_program and then later calling it again with NULL as a param
to get the refcount back down to 1. This changes things to not
use _mesa_reference_program() at all and just have gl_linked_shader
take ownership of gl_program since refcount starts at 1.

The st and ir_to_mesa linkers were worse as they were both getting
in a state were the refcount would never get to 0 and we would leak
the program.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-19 07:42:33 +11:00
Nanley Chery
9db5cc829f anv/cmd_buffer: Enable stencil-only HZ clears
The HZ sequence modifies less state than the blorp path and requires
less CPU time to generate the necessary packets.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-18 12:12:55 -08:00
Nanley Chery
37c07d64b4 anv/cmd_buffer: Manage Anv state around HZ op emission
Move the assignment to a less surprising location.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-18 12:12:50 -08:00
Nanley Chery
6ff4c24fdd anv/cmd_buffer: Clarify HZ rectangle behavior
This behavior differs from what's described in the PRMs and was
observed by analyzing CTS test results.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-18 12:12:34 -08:00
Nanley Chery
63318d34ac mesa/fbobject: Update CubeMapFace when reusing textures
Framebuffer attachments can be specified through FramebufferTexture*
calls. Upon specifying a depth (or stencil) framebuffer attachment that
internally reuses a texture, the cube map face of the new attachment
would not be updated (defaulting to TEXTURE_CUBE_MAP_POSITIVE_X).
Fix this issue by actually updating the CubeMapFace field.

This bug manifested itself in BindFramebuffer calls performed on
framebuffers whose stencil attachments internally reused a depth
texture.  When binding a framebuffer, we walk through the framebuffer's
attachments and update each one's corresponding gl_renderbuffer. Since
the framebuffer's depth and stencil attachments may share a
gl_renderbuffer and the walk visits the stencil attachment after
the depth attachment, the uninitialized CubeMapFace forced rendering
to TEXTURE_CUBE_MAP_POSITIVE_X.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77662
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-18 11:58:19 -08:00
Lionel Landwerlin
9a806d2d15 mesa: add NV_image_formats extension support
This extension can be enabled automatically as it is a subset of
ARB_shader_image_load_store.

v2: Replace helper function by qualifier struct field (Ilia)
    Enable NV_image_formats using ARB_shader_image_load_store (Ilia)

v3: Drop extension field from gl_extensions (Ilia)
    Release notes (Ilia)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98480
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-11-18 13:27:28 +00:00
Timothy Arceri
88fe2c308e mesa: fix old classic drivers to use ralloc for ARB asm programs
These changes were missed in 0ad69e6b5.

Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98767
2016-11-18 23:39:40 +11:00
Nicolai Hähnle
da2a51129b st/mesa: silence warnings in optimized builds
Mark variables and static functions that only occur in assert()s as
MAYBE_UNUSED.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-11-18 09:49:22 +01:00
Nicolai Hähnle
9882ed85bd radeonsi: emit sample locations also when nr_samples == 1
Since the state tracker now enables MSAA in the hardware for the case
nr_samples == 1 as well, we need to set sample locations correctly for
this case.

The Polaris override is still needed for the non-MSAA case (when
nr_samples == 0).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-11-18 09:48:46 +01:00
Nicolai Hähnle
70454f5b55 radeonsi: allow sample mask export for single-sample framebuffers
This fixes GL45-CTS.sample_variables.mask.*.samples_1.*.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-11-18 09:48:43 +01:00
Nicolai Hähnle
ceac3397fb st/mesa: remove a redundant call to _mesa_is_multisample_enabled
We called it immediately prior, so re-use the previously returned value.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-11-18 09:48:39 +01:00
Nicolai Hähnle
adba706122 mesa/main: consider multisampling enabled when number of samples == 1
There are some differences between how non-multisampled framebuffers (i.e.
samples == 0) and multisampled framebuffers with a single sample should be
treated.  For example, alpha to coverage and writing to gl_SampleMask has an
effect with single-sample multisample framebuffers, but not on
non-multisample framebuffers.

This fixes GL45-CTS.sample_variables.mask.*.samples_1.* at least for
Gallium drivers (and possibly others, though at least radeonsi needs an
additional fix).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-11-18 09:48:14 +01:00
Kenneth Graunke
14af96007f i965: Delete fs_visitor::nir_setup_single_output_varying prototype.
I deleted this function in 59864e8e02.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2016-11-18 00:29:11 -08:00
Tapani Pälli
ec4e71f75e mesa: fix empty program log length
In case we have empty log (""), we should return 0. This fixes
Khronos WebGL conformance test 'program-infolog'.

From OpenGL ES 3.1 (and OpenGL 4.5 Core) spec:
   "If pname is INFO_LOG_LENGTH , the length of the info log, including
    a null terminator, is returned. If there is no info log, zero is
    returned."

v2: apply same fix for get_shaderiv and _mesa_GetProgramPipelineiv (Ian)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v1)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97321
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-18 07:42:41 +02:00
Roland Scheidegger
5ec3a7333f draw: finally optimize bool clip mask generation
lp_build_any_true_range is just what we need, though it will only produce
optimal code with sse41 (ptest + set) - but even without it on 64bit x86
the code is still better (1 unpack, 2 movq + or + set), on 32bit x86 it's
going to be roughly the same as before.
While here also make it a "real" 8bit boolean - cuts one instruction but
more importantly similar to ordinary booleans.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-11-18 01:25:21 +01:00
Roland Scheidegger
b16f06fd05 draw: use vectorized calculations for fetch (v2)
Instead of doing all the math with scalars, use vectors. This means the
overflow math needs to be done manually, albeit that's only really
problematic for the stride/index mul, the rest has been pretty much
moved outside the shader loop (albeit the mul could actually be optimized
away too), where things are still scalar.
To eliminate control flow in the main shader loop fetch, provide fake
buffers (so index 0 is always valid to fetch).
Still uses aos fetch though in the end - mostly because some more code
would be needed to handle unaligned fetches in that path, and because for
most formats it won't make a difference anyway (we generate some truly
horrendous code for things like R16G16_something for instance).

Instanced fetch however stays roughly the same as before, except that
no longer the same element is fetched multiple times (I've seen a reduction
of ~3 times in main shader loop size due to llvm not recognizing it's all
the same fetch, since it would have been possible some of the fetches
getting replaced with zeros in case vector size exceeds remaining fetch
count - the values of such fetches don't matter at all though).

Also, for elts gathering, use vectorized code as well.

The generated shaders are smaller and faster to compile (not entirely sure
about execution speed, but generally unless there's just single vertices
to handle I would expect it to be faster - there's more opportunities
for future improvements by using soa fetch).

v3: skip the fake index buffer, not needed due to the jit code never seeing
the real index buffer in the first place.
Fix a bug with mask expansion (needs SExt, not ZExt).
Also, be really really careful to keep the behavior the same, even in cases
where it looks wrong, and add comments why the code is doing the seemingly
wrong stuff... Fortunately it's not actually more complex in the end...
Also change function order slightly just to make the diff more readable.

No piglit change. Passes some internal testing with another api too...

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-11-18 01:25:21 +01:00
Jordan Justen
0cee3fd5c7 i965/gen7: Minify blit size for stencil tree copy
Found by the piglit 'fbo-depth-array stencil-clear' test when
implementing blorp blit splitting for gen7.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-17 14:15:44 -08:00
Kenneth Graunke
9bfee7047b mesa: Drop PATH_MAX usage.
GNU/Hurd does not define PATH_MAX since it doesn't have such arbitrary
limitation, so this failed to compile.  Apparently glibc does not
enforce PATH_MAX restrictions anyway, so it's kind of a hoax:

https://www.gnu.org/software/libc/manual/html_node/Limits-for-Files.html

MSVC uses a different name (_MAX_PATH) as well, which is annoying.

We don't really need it.  We can simply asprintf() the filenames.
If the filename exceeds an OS path limit, presumably fopen() will
fail, and we already check that.  (We actually use ralloc_asprintf
because Mesa provides that everywhere, and it doesn't look like we've
provided an implementation of GNU's asprintf() for all platforms.)

Fixes the build on GNU/Hurd.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98632
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-17 14:14:37 -08:00
Kenneth Graunke
ca76e6b521 i965: Fix compute shader crash.
Fixes crashes when starting Deus Ex: Mankind Divided.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-11-17 14:14:06 -08:00
Jason Ekstrand
3da7adc755 anv/TODO: Check off render buffer compression
There's still a tiny bit of work to do for storage images but it's
otherwise pretty much done at this point.
2016-11-17 12:03:24 -08:00
Jason Ekstrand
4e91f158e6 anv: Enable "permanent" compression for immutable format images
This commit extends our support of color compression to surfaces without
the VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT set.  These images will never have
an image view created with a different format then the one set at image
creation time so it's safe to always use compression.  We still bail if the
image is used as a storage image because that sometimes ends up using a
different format.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-17 12:03:24 -08:00
Jason Ekstrand
2b5644e94d intel/blorp: Properly handle color compression in blorp_copy
Previously, blorp copy operations were CCS-unaware so you had to perform
resolves on the source and destination before performing the copy.  This
commit makes blorp_copy capable of handling CCS-compressed images without
any resolves.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-17 12:03:24 -08:00
Jason Ekstrand
89f9c46a74 intel/blorp: Always use UINT formats on SKL+
Many of these UINT formats aren't available prior to Sky Lake so we used
UNORM formats.  Using UINT formats is a bit nicer because it guarantees we
don't run into rounding issues.  Also, we will need it in the next commit
for handling copies with CCS enabled.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-17 12:03:24 -08:00
Jason Ekstrand
c8357b5d34 i965/blorp: Rework resolve handling
This commit moves the handling of resolves into blorp_surf_for_miptree().
Instead of each helper doing resolves and checks itself, it simply tells
blorp_surf_for_miptree which aux modes are supported by the given blorp
operation and blorp_surf_for_miptree will resolve as-needed.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-17 12:03:24 -08:00
Jason Ekstrand
edb7f67bd9 anv/image: Add an aux_usage field for "default" aux
Initially, the field is set to ISL_AUX_USAGE_NONE so this commit shouldn't
bring any functional changes.  Setting this field to something else will
cause all sampled and storage image views to be created with AUX and blorp
will start trying to respect it so set with care.
2016-11-17 12:03:24 -08:00
Jason Ekstrand
338cdc172a anv: Add initial support for Sky Lake color compression
This commit adds basic support for color compression.  For the moment,
color compression is only enabled within a render pass and a full resolve
is done before the render pass finishes.  All texturing operations still
happen with CCS disabled.
2016-11-17 12:03:24 -08:00
Jason Ekstrand
e2f5880839 anv/pass: Precompute some subpass usage information 2016-11-17 12:03:24 -08:00
Jason Ekstrand
9b9fb6d212 util/vk_alloc: Add a vk_zalloc2 helper
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-17 12:03:24 -08:00
Jason Ekstrand
a512565b2b anv/image: Memset all aux surfaces (not just HiZ) to 0
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-17 12:03:24 -08:00
Jason Ekstrand
c3eb58664e anv/image: Rename hiz_surface to aux_surface 2016-11-17 12:03:24 -08:00
Jason Ekstrand
ccdf9af392 anv/blorp: Ignore clears for attachments first used as resolve destinations
Otherwise, we'll try to clear it the first time it's used as a draw so if
you do some multisampled rendering, resolve to an attachment, and then draw
on top of the single-sampled attachment, we might accidentally clear it.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-17 12:03:24 -08:00
Jason Ekstrand
1ba2f05bc0 intel/blorp: Take a fast_clear_op in ccs_resolve
Eventually, we may want to just have a single blorp_ccs_op function that
does both clears and resolves.  For now we'll stick to just making the
ccs_resolve function we have now a bit more configurable.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-17 12:03:24 -08:00
Pohjolainen, Topi
7c560e8ccc intel/blorp: Add plumbing for color resolve slice details
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-17 12:03:24 -08:00
Jason Ekstrand
d7bd8c15d6 intel/isl: Allow non-2D CCS surfaces
The CCS calculations in ISL are already correct for 1-D and 3-D CCS
surfaces since they have exactly the same layout as 2-D array surfaces (at
least on Sky Lake).  The only problem was that we weren't passing in the
right dimensionality and we weren't passing in the depth.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-17 12:03:24 -08:00
Jason Ekstrand
26c8bb7bc0 intel/isl: Rework the asserts and fails in isl_surf_get_ccs
There are some invariants such as number of samples on which we should
assert.  However, most other things should silently return false since
they're much easier for isl_surf_get_ccs to check than the caller.  We also
update the checking to be a bit more complete.
2016-11-17 12:03:24 -08:00
Jason Ekstrand
818c7bfb31 anv/cmd_buffer: Refactor surface state relocation handling
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-17 12:03:24 -08:00
Jason Ekstrand
9be9f5f1c7 anv/cmd_buffer: Pull add_surface_state_reloc into genX_cmd_buffer.c
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-17 12:03:24 -08:00
Jason Ekstrand
0c403df310 anv/image: Stop force-disabling AUX
Auxiliary surfaces have to be created manually anyway so force-disabling it
does nothing whatsoever at the moment.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-17 12:03:24 -08:00
Tom Stellard
929fcee47e mesa: Add missing call to _mesa_unlock_debug_state(ctx); v2
cd724208d3 added a call to
_mesa_lock_debug_state(ctx) but wasn't unlocking the debug state.

This fixes a hang in glsl-fs-loop piglit test with MESA_DEBUG=context.

v2:
  - Remove unrelated changes.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-17 18:32:35 +00:00
Eric Engestrom
9702f91366 egl: fix helper function name
I introduced this code last month, but didn't follow the naming
convention. Fix this.

Fixes: 0a606a400f ("egl: add eglSwapBuffersWithDamageKHR")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
2016-11-17 09:33:25 +02:00
Eric Engestrom
8b780a543a egl/x11: misc style fixes
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2016-11-17 09:32:48 +02:00
Eric Engestrom
41b5d98b28 egl: fix function name in debug string
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2016-11-17 09:32:11 +02:00
Jason Ekstrand
9557147592 nir/spirv: Fix handling of gl_PrimitiveId
Before, we were always treating it as an output which bogus.  The only
stage in which this it can be an output is the geometry stage.  In all
other stages, it's an input which, in the back-end, we actually want to be
a system value.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-11-16 20:07:23 -08:00
Jason Ekstrand
1c97432ce8 anv/fence: Handle ANV_FENCE_CREATE_SIGNALED_BIT
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-11-16 20:07:23 -08:00
Jason Ekstrand
49f08ad77f anv: Handle null in all destructors
This fixes a bunch of new CTS tests which look for exactly this.  Even in
the cases where we just call vk_free to free a CPU data structure, we still
handle NULL explicitly.  This way we're less likely to forget to handle
NULL later should we actually do something less trivial.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-11-16 20:07:23 -08:00
Jason Ekstrand
d0646c8015 util/vk_alloc: Ensure NULL is handled correctly in vk_free
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-11-16 20:07:23 -08:00
Jason Ekstrand
18266247a0 anv/device: Silence a 32-bit warning 2016-11-16 20:07:20 -08:00
Eric Anholt
80786a67cf nir: Avoid an extra NIR op in integer divide lowering.
NIR bools are ~0 for true, so ((unsigned)a >> 31) != 0 -> ((int)a >> 31).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-11-16 19:45:01 -08:00
Eric Anholt
7f27ad5597 vc4: Try compiling our FSes in multithreaded mode on new kernels.
Multithreaded fragment shaders let us hide texturing latency by a
hyperthreading-style switch to another fragment shader.  This gets us up
to 20% framerate improvements on glmark2 tests.
2016-11-16 19:45:01 -08:00
Eric Anholt
45c022f2b0 vc4: Add support for ETC1 textures if the kernel is new enough.
The kernel changes for exposing the param have now been merged, so we can
expose it here.
2016-11-16 19:45:01 -08:00
Eric Anholt
7130260d12 vc4: Fix simulator mode missing-GETPARAM debug info.
The value is 0 since we didn't set it, we wanted to see the param.
2016-11-16 19:45:01 -08:00
Mun Gwan-gyeong
20c1623a11 vc4: Fix resource leak in register allocation failure path.
CID 1394322

Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
2016-11-16 19:45:01 -08:00
Timothy Arceri
686dad657f glsl: stub out _mesa_reference_program() in standalone compiler
The follow patch will call this directly from the linker, the
shader cache will also start calling these from the compiler.
2016-11-17 12:53:12 +11:00
Timothy Arceri
c3df65c123 st/mesa/r200/i915/i965: move ARB program fields into a union
It's common for games to compile 2000 programs or more so at

32bits x 2000 programs x 22 fields x 2 (at least) stages

This should give us something like 352 kilobytes in savings
once we add some more glsl only fields.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-17 12:53:12 +11:00
Timothy Arceri
d6bdb3a862 st/mesa: stop initialing Instructions and NumInstructions
Since gl_program is now created with rzalloc() they should
already be initialised.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-17 12:52:59 +11:00
Timothy Arceri
0ad69e6b51 mesa: make use of ralloc when creating ARB asm gl_program fields
This will allow us to move the ARB asm fields in gl_program into
a union as we will be able call ralloc_free() on the entire struct
when destroying the context.

In this change we switch over to using ralloc for the Instructions,
String and LocalParams fields of gl_program.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-17 12:52:24 +11:00
Timothy Arceri
9c9589f1e2 mesa: remove unused Comment field in prog_instruction
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-17 12:52:24 +11:00
Timothy Arceri
67b9c26342 i965: get num_abos from shader_info rather than gl_linked_shader
This is a step towards freeing gl_linked_shader after linking.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-17 12:52:24 +11:00
Timothy Arceri
5581f2a8f2 mesa/glsl: copy num_abos to gl_program
We should be able to free gl_linked_shader after linking in order to
do so we need to switch to getting values from gl_program instead.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-17 12:52:24 +11:00
Timothy Arceri
ba40c8b03c i965: get num_images from shader_info rather than gl_linked_shader
This is a step towards freeing gl_linked_shader after linking.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-17 12:52:24 +11:00
Timothy Arceri
9c2042f2ce mesa/glsl: copy num_images to gl_program
We should be able to free gl_linked_shader after linking in order to
do so we need to switch to getting values from gl_program instead.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-17 12:52:24 +11:00
Timothy Arceri
6b82e957be nir: add support for counting AoA uniforms in nir_shader_gather_info()
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-17 12:52:24 +11:00
Timothy Arceri
c3b8bf9bc9 i965: only try print GLSL IR once when using INTEL_DEBUG to dump ir
Since we started releasing GLSL IR after linking the only time we can
print GLSL IR is during linking. When regenerating variants only NIR
will be available.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-17 12:52:24 +11:00
Jason Ekstrand
2e2160969e anv/descriptor_set: Put the whole state in the state free list
We're not really saving much by just putting the offset in there.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-11-16 17:07:35 -08:00
Jason Ekstrand
37537b7d86 anv/descriptor_set: Write the state offset in the surface state free list.
When Kristian reworked descriptor set allocation, somehow he forgot to
actually store the offset in the free list.  Somehow, this completely
missed CTS testing until now... This fixes all 2744 of the new
'dEQP-VK.texture.filtering.* tests in the latest CTS.

Cc: "12.0 13.0" <mesa-dev@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-11-16 17:07:29 -08:00
Timothy Arceri
8af1b2a2ce compiler: remove now unused copy_shader_info() declaration
Left over from 4ac66861

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-17 11:02:25 +11:00
Timothy Arceri
29ade71af9 compiler: include shader_enums.h in shader_info.h
We make use of some enums here.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-17 11:02:19 +11:00
Tim Rowley
a456ea17fb swr: [rasterizer core] fix clear with multiple color attachments
Fixes fbo-mrt-alphatest

v2: styling fixes

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-16 14:21:04 -06:00
Ben Widawsky
0272f76741 Partial revert "i965: "Fix" aux offsets"
This partially reverts commit 0d241085f7.

HiZ buffer cannot do this properly now, and it's not required, so remove
it.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-16 11:40:12 -08:00
Ben Widawsky
0d241085f7 i965: "Fix" aux offsets
When 1 BO is used for aux data, it needs to point to the correct offset,
which will not be the BOs offset but instead an offset from the BOs
offset. Since today there are always multiple BOs for aux, this doesn't
actually change anything.

Cc: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2016-11-16 11:24:33 -08:00
Jason Ekstrand
e50bf059b0 anv/blorp: Handle VK_ATTACHMENT_UNUSED in CmdClearAttachments
From the Vulkan 1.0.29 spec for vkCmdClearAttachments:

   "If the subpass’s depth/stencil attachment is VK_ATTACHMENT_UNUSED,
   then the clear has no effect."

and

   "If colorAttachment is VK_ATTACHMENT_UNUSED then the clear has no
   effect."

I have no idea why it's spec'd this way; it seems very anti-Vulkan to me,
but that's what it says and it's really not much work to support.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-11-16 10:32:20 -08:00
Jason Ekstrand
633677194f Allocate a null state whenever there is depth/stencil
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-16 10:32:20 -08:00
Jason Ekstrand
a380f95461 anv: Set framebuffer to NULL in secondary command buffers
Nothing that is allowed to be called within a secondary now relies on the
framebuffer.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-11-16 10:32:15 -08:00
Jason Ekstrand
9fcaf4e37a anv/blorp: Use the new clear_attachments entrypoint for attachment clears
This allows us to re-use the surface states emitted from the Vulkan driver
instead of blorp creating its own.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-16 10:11:29 -08:00
Jason Ekstrand
e371850d94 anv/blorp: Break the guts of alloc_binding_table into a shared helper
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-16 10:11:29 -08:00
Jason Ekstrand
3c1ee052bd anv: Bring back anv_cmd_buffer_emit_state_base_address
This reverts most of commit 52904ba85c.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-16 10:11:29 -08:00
Jason Ekstrand
72878f9f53 intel/blorp: Add a clear_attachments entrypoint
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-16 10:11:29 -08:00
Jason Ekstrand
0aea29cc1c intel/blorp: Add capability to use pre-baked binding tables
When a pre-baked binding table is requested, no binding table is created,
instead the binding table offset (relative to surface state base address)
provided by the user is used verbatim.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-16 10:11:29 -08:00
Jason Ekstrand
f7f768d195 intel/blorp: Add support for vertex shaders
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-16 10:11:29 -08:00
Jason Ekstrand
768c8dd718 intel/blorp: Use an actual chunk of vertex buffer for the VUE header
We're about to start passing other things in as a sort of "VS header" for
vertex shaders and we need a place to put them.  Since we want the instance
id to be one of them, it makes sense to have one vec4 that's either VUE
header or VS header.  Always uploading some handy zeros makes the code a
bit simpler.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-16 10:11:29 -08:00
Jason Ekstrand
8c8095c260 blorp/exec: Use uint32_t for copying varying data
Some things may not be floats and intel CPUs are known for mangling bits
when a float type is used for copying integers.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-16 10:11:29 -08:00
Jason Ekstrand
21943c35f7 intel/blorp: Handle NIR clear inputs the same way as blit inputs
By using offsetof() we can ensure that adding fiels to wm_inputs is always
safe as long as we maintain alignment.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-16 10:11:29 -08:00
Jason Ekstrand
570a0e844b intel/blorp: Remove NIR support for uniforms
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-16 10:11:29 -08:00
Jason Ekstrand
99b436ae5c intel/blorp: Add a shader type to make keys more unique
Depending on how the driver using blorp implements its shader caching,
there is a small chance of shader collisions due to identical keys between
blit and clear programs.  Adding a small shader type at the top of the key
alleviates this problem.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-16 10:11:29 -08:00
Jason Ekstrand
1acebeb191 intel/blorp: Make the number of samples an explicit parameter
Previously, we always inferred it from params->dst which meant that
references to params->dst were scattered all throughout the state upload
code.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-16 10:11:29 -08:00
Jason Ekstrand
6614234fc9 anv/cmd_buffer: Stop relying on the framebuffer for 3DSTATE_SF on gen7
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-16 10:11:07 -08:00
Jason Ekstrand
d2b4a9da03 anv: Rework the way render target surfaces are allocated
This commit moves the allocation and filling out of surface state from
CreateImageView time to BeginRenderPass time.  Instead of allocating the
render target surface state as part of the image view, we allocate it in
the command buffer state at the same time that we set up clears.  For
secondary command buffers, we allocate memory for the surface states in
BeginCommandBuffer but don't fill them out; instead, we use our new
SOL-based memcpy function to copy the surface states from the primary
command buffer.  This allows us to handle secondary command buffers without
the user specifying the framebuffer ahead-of-time.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-16 10:11:07 -08:00
Jason Ekstrand
e283cd549c anv/cmd_buffer: Expose add_surface_state_reloc as an inline helper
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-16 10:11:07 -08:00
Jason Ekstrand
858b75563f anv/cmd_buffer: Use the surface state alloc helper in null_surface_state
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-16 10:11:07 -08:00
Jason Ekstrand
3d9747780b anv: Add a helper for doing buffer copies with nothing but VF and SOL.
This method of doing copies has the advantage of touching very little of
the GPU state.  While it does disable all the shader stages, it doesn't
have to blow away binding tables, viewports, scissors, or any other bits of
dynamic state other than VBO 32 which is already reserved.  All of the
state that it does touch is contained within a pipeline anyway so that's
the only thing that has to be dirtied.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-16 10:11:07 -08:00
Jason Ekstrand
184bbfd69b intel/genxml: Add SO_WRITE_OFFSET registers for gen7-9
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-16 10:10:26 -08:00
Jason Ekstrand
b3bc806855 intel/isl: Add some basic info about RENDER_SURFACE_STATE to isl_device
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-16 10:10:26 -08:00
Jason Ekstrand
ba349e106e anv/pipeline: Use get_scratch_space/address for compute shaders
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-11-16 10:09:18 -08:00
Jason Ekstrand
d33e2ad67c anv: Move INTERFACE_DESCRIPTOR_DATA setup to the pipeline
There are a few dynamic bits, namely binding table and sampler addresses,
but most of it is static and really belongs in the pipeline.  It certainly
doesn't belong in flush_compute_descriptor_set.  We'll use the same state
merging trick we use for gen7 DEPTH_STENCIL.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-11-16 10:09:16 -08:00
Jason Ekstrand
8db6f2e6eb anv/pipeline: Roll genX_pipeline_util.h into genX_pipeline.c
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-11-16 10:09:14 -08:00
Jason Ekstrand
68c58edcfa anv/pipeline: Unify graphics_pipeline_create
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-11-16 10:09:12 -08:00
Jason Ekstrand
9359835fcb anv/pipline: Re-order state emission and make it consistent
This commit makes both gen7 and gen8 pipeline setup emit state packets
in exactly the same order.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2016-11-16 10:09:10 -08:00
Jason Ekstrand
5706d2590f anv/pipeline: Rework the 3DSTATE_VF_TOPOLOGY helper
It gets a new name and moved to genX_pipeline_util.h.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2016-11-16 10:09:08 -08:00
Jason Ekstrand
3f480d5dd3 anv/pipeline: Move 3DSTATE_PS_EXTRA setup into a helper
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2016-11-16 10:09:07 -08:00
Jason Ekstrand
8be164d05a anv/pipeline: Unify 3DSTATE_WM emission
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-11-16 10:09:05 -08:00
Jason Ekstrand
1587ac1edc intel/genxml: Make 3DSTATE_WM more consistent across gens
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-11-16 10:09:03 -08:00
Jason Ekstrand
23ad998246 anv/pipeline: Unify 3DSTATE_PS emission
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-11-16 10:09:01 -08:00
Jason Ekstrand
f989d04f39 anv/pipeline/gen7: Properly set 3DSTATE_PS::DualSourceBlendEnable
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-11-16 10:08:59 -08:00
Jason Ekstrand
fb02d2d13b intel/genxml: Make some 3DSTATE_PS fields more consistent
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2016-11-16 10:08:58 -08:00
Jason Ekstrand
5a10ab8a15 anv/pipeline: Make emit_3dstate_sbe safe to call without a FS
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-11-16 10:08:56 -08:00
Jason Ekstrand
7fe6655aad anv/pipeline: Unify 3DSTATE_GS emission
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2016-11-16 10:08:54 -08:00
Jason Ekstrand
f3783f1249 anv/pipeline/gen8: Set 3DSTATE_GS::InstanceControl
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2016-11-16 10:08:53 -08:00
Jason Ekstrand
9da442b63a intel/genxml: Make some 3DSTATE_GS fields more consistent
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2016-11-16 10:08:51 -08:00
Jason Ekstrand
4a48d19d93 anv/pipeline: Unify 3DSTATE_VS emission
With this commit, a few fields are now specified on gen7 which weren't
before.  However, the values specified are zero which is the default so the
final hardware packet remains the same.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2016-11-16 10:08:48 -08:00
Jason Ekstrand
c3e908e9d3 anv/pipeline/gen8: Enable VS statistics
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-11-16 10:08:44 -08:00
Jason Ekstrand
23d1919fe3 anv/pipeline: Stop claiming to support running without a vertex shader
From the Vulkan spec version 1.0.32 docs for vkCreateGraphicsPipelines:

    The stage member of one element of pStages must be
    VK_SHADER_STAGE_VERTEX_BIT

Since a vertex shader is always required, this hasn't been used since we
deleted meta.  Let's get rid of the complexity.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2016-11-16 10:08:42 -08:00
Jason Ekstrand
bda247d3fd intel/genxml: Make some VS/GS fields consistent across gens
We use the names from gen8+

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2016-11-16 10:08:40 -08:00
Jason Ekstrand
623e1e06d8 anv/pipeline: Get rid of the kernel pointer fields
Now that we have anv_shader_bin, they're completely redundant with other
information we have in the pipeline.  For vertex shaders, we also go
through way too much work to put the offset in one or the other field and
then look at which one we put it in later.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2016-11-16 10:08:38 -08:00
Jason Ekstrand
0087064f26 anv/pipeline: Set correct binding table and sampler counts
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2016-11-16 10:08:36 -08:00
Brian Paul
cd724208d3 mesa: if MESA_DEBUG=context, create a debug context
A number of drivers report useful debug/perf information accessible
through GL_ARB_debug_output and with debug contexts (i.e. setting the
GLX_CONTEXT_DEBUG_BIT_ARB flag).  But few applications actually use
the GL_ARB_debug_output extension.

This change lets one set the MESA_DEBUG env var to "context" to force-set
a debug context and report debug/perf messages to stderr (or whatever
file MESA_LOG_FILE is set to).  This is a useful debugging tool.

The small change in st_api_create_context() is needed so that
st_update_debug_callback() gets called to hook up the driver debug
callbacks when ST_CONTEXT_FLAG_DEBUG was not set, but MESA_DEBUG=context.

v2: use %.*s format string instead of allocating temporary buffer.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-16 09:34:10 -07:00
Nicolai Hähnle
fb17b7f99d u_simple_shaders: try to un-break the Windows build
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-11-16 13:25:35 +01:00
Nicolai Hähnle
6403a9e074 radeonsi: fix a subtle bounds checking corner case with 3-component attributes
I'm also sending out a piglit test, gl-2.0/vertexattribpointer-size-3,
which exposes this corner case.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-11-16 10:31:42 +01:00
Nicolai Hähnle
50c95d0c54 radeonsi: reject some 3-component formats as buffer textures
Fixes parts of GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels_pbo.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-11-16 10:31:39 +01:00
Nicolai Hähnle
78314c57cb st/mesa: swap bytes in the fallback format translation path of GetTexImage
Fixes parts of GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels_pixelstore.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-11-16 10:31:36 +01:00
Nicolai Hähnle
22360406f7 st/mesa: simplify and fix st_GetTexSubImage
By using _mesa_image_address, the code becomes simpler _and_ fixes the bug
that GL_PACK_SKIP_IMAGES was applied even on non-3D textures.

Also, converting a whole slice at a time simplifies the format translation
fallback path.

Fixes parts of GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels_pixelstore.

v2: fix a silly mistake during code movement

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-11-16 10:31:34 +01:00
Nicolai Hähnle
7cdf292dc3 st/mesa: fix SINT <-> UINT conversion during PBO upload / download
This fixes use cases like glReadPixels from an RGBA8I framebuffer into
a PBO with type GL_INT by clamping values appropriately when they fall
outside the range of the destination format.

Fixes parts of GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels_pbo.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-11-16 10:31:31 +01:00
Nicolai Hähnle
5e10a3d6e5 st/mesa: change st_pbo_create_upload_fs to st_pbo_get_upload_fs
For consistency with st_pbo_get_download_fs.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-11-16 10:31:28 +01:00
Nicolai Hähnle
2fb4b5bdf6 st/mesa: fix ReadPixels into packed formats with PBO
When using the GPU download path, we bind the PBO as a buffer texture,
so call is_format_supported accordingly. On radeonsi, this means that
GPU downloads aren't used for UNSIGNED_SHORT_5_6_5 destinations, for
example.

Fixes parts of GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels_pbo.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-11-16 10:31:25 +01:00
Nicolai Hähnle
3817a7a1d7 util/blitter: add clamping during SINT <-> UINT blits
Even though glBlitFramebuffer cannot be used for SINT <-> UINT blits, we
still need to handle this type of blit here because it can happen as part
of texture uploads / downloads, e.g. uploading a GL_RGBA8I texture from
GL_UNSIGNED_INT data.

Fixes parts of GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-11-16 10:31:21 +01:00
Nicolai Hähnle
ab5fd10eaa util/blitter: index texfetch_col shaders by type
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-11-16 10:31:07 +01:00
Lionel Landwerlin
25a8e8bbd5 i965: miptree: prevent potential NULL pointer access
If the mcs buffer allocation fails we might get a NULL pointer. This
was reported by Coverity and should only happen if we run out of
memory.

v2: return failure at the point of allocation (Chris)

CID: 1394290
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2016-11-16 08:56:08 +00:00
Jordan Justen
615ccf44cf intel/blorp: Use designated initializers in surf_convert_to_single_slice
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-15 22:51:19 -08:00
Liu Zhiquan
b663753f3b EGL/android: pbuffer implementation
Android path didn't support pbuffer, so add pbuffer support to fix
most failing dEQP and CTS pbuffer test cases.

Patch adds a single buffer config to support pbuffer, and creates
image in getBuffers for pbuffer when surface type is front surface.

The EGL 1.5 spec states that pbuffers have a back buffer but no front
buffer, single-buffered surfaces with no front buffer confuse Mesa;
so we deviate from the spec, following the precedent of Mesa's EGL
X11 platform.

V3: update commit message and code review changes.

Signed-off-by: Liu Zhiquan <zhiquan.liu@intel.com>
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-16 08:15:59 +02:00
Eric Engestrom
25c60fa6a2 egl: add missing error-checking to eglReleaseTexImage()
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2016-11-16 08:02:16 +02:00
Ben Widawsky
19a01f8139 i965: Fix KBL typo in string
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-11-15 17:34:37 -08:00
Ben Widawsky
37370f6bfc i965: Consolidate GEN9 LP definition
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-11-15 17:34:37 -08:00
Ben Widawsky
2193fb0e1f i965/glk: Add basic Geminilake support
v2: s/bdw/gen; Add the 2x6 config
v3: Add min_ds_entries

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-11-15 17:34:37 -08:00
Vinson Lee
ed6694d511 util: Fix Clang trivial destructor check.
Check for Clang before GCC.

Clang defines __GNUC__ == 4 and __GNUC_MINOR__ == 2 and matches the GCC
check but not the GCC version for trivial destructor.

Fixes: 98ab905af0 ("mesa: Define introspection macro to determine
whether a type is trivially destructible.")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98526
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-11-15 17:35:56 -08:00
Ilia Mirkin
dafffd2f11 swr: mark color clamping as unsupported
There is no functionality in swr to clamp either vertex or frag colors.
This could be added in swr_shader, at which point these could be
re-enabled.

Fixes arb_color_buffer_float-render

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-15 20:26:32 -05:00
Ilia Mirkin
2b6b15ab3f swr: always enable adding start/base vertex to gl_VertexId
Fixes gl-3.2-basevertex-vertexid

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-15 20:26:29 -05:00
Ilia Mirkin
6364491a0b swr: add support for upper-left fragcoord position
Fixes glsl-arb-fragment-coord-conventions.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-15 20:26:11 -05:00
Ilia Mirkin
a2c1d58ddb swr: make sure that all rendering is finished on shader destroy
Rendering could still be ongoing (or have yet to start) when the shader
is deleted. There's no refcounting on the shader text, so insert a
pipeline stall unconditionally when this happens.

[Note, we should instead introduce a way to attach work to
fences, so that the freeing can be done in the current fence.]

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-15 20:25:48 -05:00
Ilia Mirkin
7caed50ff4 swr: disable blending for integer formats
The EXT_texture_integer test says that blending and alphatest should
all be disabled. st/mesa takes care of alphatest already.

Fixes the ext_texture_integer-fbo-blending piglit test.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-15 20:25:43 -05:00
Ilia Mirkin
2f19a974a5 swr: mark rgb9_e5 as unrenderable
The support in swr requires shaders to output the components as UINTs.
This is not how GL or Gallium work, and since this is not a
required-renderable format, just leave it out.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-15 20:25:35 -05:00
Ilia Mirkin
6fd398f48e swr: no support for shader stencil export
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-15 20:25:28 -05:00
Ilia Mirkin
96291478ea swr: mark both frag and vert textures read, don't forget about cbs
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-15 20:25:22 -05:00
Ilia Mirkin
8c0f76e961 swr: fix texture layout for compressed formats
Fixes the texsubimage piglit and lets the copyteximage one get further.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-15 20:25:15 -05:00
Ilia Mirkin
00efbbc38c swr: add archrast generated files to gitignore
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-15 20:25:08 -05:00
Ilia Mirkin
b53a33feef swr: [rasterizer jitter] don't bother quantizing unused channels
In a BGR10X2 or BGR5X1 situation, there's no need to try to quantize the
X channel - the default will have the proper quantization required.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-15 20:24:50 -05:00
Ilia Mirkin
5dd0b8d3c6 swr: [rasterizer memory] fix store tile for 128-bit ymajor tiling
Noticed by inspection.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-15 20:24:41 -05:00
Ilia Mirkin
45d9cd36fe swr: [rasterizer memory] add support for R32_FLOAT_X8X24 formats
This is the format used for the primary surface of a
PIPE_FORMAT_Z32_FLOAT_S8X24_UINT resource.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-15 20:24:18 -05:00
Dave Airlie
713522fb8d ac/nir/llvm: fix channel in texture gather lowering code.
This fixes a number of CTS tests like:
dEQP-VK.glsl.texture_gather.basic.2d.rgba8ui.size_npot.clamp_to_edge_repeat

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-16 09:18:15 +10:00
Dave Airlie
38ab625c5f radv: don't crash on null swapchain destroy.
Just return if the passed in swapchain is NULL.

Fixes: dEQP-VK.wsi.xlib.swapchain.destroy.null_handle

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-16 09:18:03 +10:00
Dave Airlie
253fa25d09 wsi: fix VK_INCOMPLETE for vkGetSwapchainImagesKHR
This fixes the x11 and wayland backends to not assert:
dEQP-VK.wsi.xcb.swapchain.get_images.incomplete

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-16 09:17:34 +10:00
Jordan Justen
0ac57afa6f isl: Fix height calculation in isl_msaa_interleaved_scale_px_to_sa
No known fixed tests, but it looks like a typo from:

commit 8ac99eabb6

    intel/isl: Add a helper for getting the size of an interleaved pixel

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-15 14:27:27 -08:00
Emil Velikov
75a39cca8d amd: automake: android: rename sources lists to foo_FILES
Autotools goes smart on us warning that foo_SOURCES variable is present
yet a target with name foo is missing. Rename things (like we do
throughout the build) to silence the warnings.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-15 20:04:37 +00:00
Mauro Rossi
95ed2d9d2c amd: flatten amd/common makefile structure
This pulls amd/common build rules into upper level makefile,
along with amd/addlib which is already there.

v2: [Emil Velikov]
 - Move NEED_RADEON_LLVM conditional, drop amd/common from SUBDIRS
 - Drop AM_ from common_libamd_common_la*

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-15 20:04:37 +00:00
Marek Olšák
74e39de932 radeonsi: set IF_THRESHOLD to 3
Piglit regressions (radeonsi or LLVM bugs, they pass on softpipe):
- glsl-1.10/execution/variable-indexing/vs-output-array-vec3-index-wr
- glsl-1.10/execution/variable-indexing/vs-output-array-vec4-index-wr
- glsl-110/execution/variable-indexing/vs-temp-array-mat2-index-col-row-wr
- glsl-110/execution/variable-indexing/vs-temp-array-mat2-index-row-wr

Totals:
SGPRS: 1132185 -> 1168801 (3.23 %)
VGPRS: 907856 -> 906204 (-0.18 %)
Spilled SGPRs: 2011 -> 2425 (20.59 %)
Spilled VGPRs: 368 -> 96 (-73.91 %)
Scratch VGPRs: 1344 -> 1060 (-21.13 %) dwords per thread
Code Size: 35916164 -> 35705372 (-0.59 %) bytes
LDS: 767 -> 767 (0.00 %) blocks
Max Waves: 194010 -> 194921 (0.47 %)
Wait states: 0 -> 0 (0.00 %)

Before:
 VGPR SPILLING APPS   Shaders SpillVGPR ScratchVGPR
 alien_isolation         2938        38        40
 bioshock-infinite       1769       245       732
 dirt-showdown            548        85        72
 f1-2015                  776         0       320
 ue4_lightroom_inter..     74         0       180

After:
 VGPR SPILLING APPS   Shaders SpillVGPR ScratchVGPR
 alien_isolation         2938        38        40
 bioshock-infinite       1769         0       480
 dirt-showdown            548        58        40
 f1-2015                  776         0       320
 ue4_lightroom_inter..     74         0       180

Bioshock and DiRT benefit.

If I set IF_THRESHOLD=4, tesseract starts spilling VGPRs

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-15 20:23:40 +01:00
Marek Olšák
537b897f51 glsl_to_tgsi: lower small branches based on the CAP
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-15 20:23:40 +01:00
Marek Olšák
72217d4335 gallium: add PIPE_SHADER_CAP_LOWER_IF_THRESHOLD
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-15 20:23:40 +01:00
Marek Olšák
e33440070a glsl/lower_if: conditionally lower if-branches based on their size
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-15 20:23:39 +01:00
Marek Olšák
83d9b8a6f6 glsl/lower_if: don't lower branches touching tess control outputs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-15 20:23:35 +01:00
Marek Olšák
654e9466b5 glsl/lower_if: check more node types in check_control_flow -> check_ir_node
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-15 20:22:52 +01:00
Marek Olšák
68f35005ed glsl/lower_if: move and rename found_control_flow
I'll want to update more variables in check_control_flow, so using
the visitor is convenient.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-15 20:22:52 +01:00
Marek Olšák
a6ff2a3378 util/disk_cache: use unambiguous naming
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-15 20:22:28 +01:00
Marek Olšák
31727300e1 util: import cache.c/h from glsl
It's not dependent on GLSL and it can be useful for shader caches that don't
deal with GLSL.

v2: address review comments
v3: keep the other 3 lines in configure.ac

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-15 20:22:28 +01:00
Marek Olšák
5b8876609e gallivm: limit use of setFastMathFlags to LLVM 3.8 and later
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-15 20:22:28 +01:00
Kenneth Graunke
341fc0073a intel: Set min_ds_entries on Broxton.
This was missing.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
2016-11-15 10:45:47 -08:00
Christian Gmeiner
0c73a3b7d0 dri: make use of loader_get_extensions_name(..) helper
Changes since v1:
 - removed not needed includes
 - use the loader version of the helper

v2 [Emil Velikov]
 - Keep the includes - they are required.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-15 18:15:16 +00:00
Emil Velikov
fb10c89877 Revert "dri: make use of dri_get_extensions_name(..) helper"
This reverts commit 1a21d21580.

Pushed the wrong version of the patch.
2016-11-15 18:15:15 +00:00
Marek Olšák
358079da2d radeonsi: set unsafe fpmath on FP instructions when allowed by R600_DEBUG
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-15 19:17:56 +01:00
Marek Olšák
41d20d4920 gallivm: add lp_create_builder with an unsafe_fpmath option
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-15 19:17:56 +01:00
Marek Olšák
171e349782 radeonsi: fold some shader context initialization to si_llvm_context_init
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-15 19:17:56 +01:00
Christian Gmeiner
e4b01c97c4 loader: fixup driver names if needed
This makes it possible to 'use' the imx-drm driver. Remeber that it
is not possible to have sysmbol names in C/C++ with a '-' in it.

Changes since v1:
 - move the fix to loader.c

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-15 15:59:23 +00:00
Christian Gmeiner
1a21d21580 dri: make use of dri_get_extensions_name(..) helper
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-15 15:56:52 +00:00
Christian Gmeiner
0890aa6f7f loader: add loader_get_extensions_name(..) helper
Changes since v1:
 - renamed function to loader_get_extensions_name
 - moved function into loader

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>

V2: [Emil Velikov]
 - Use local define.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-15 15:55:33 +00:00
Gurchetan Singh
0639e253a5 egl: Use pkg-config for Android NDK build
It's possible to build Mesa for Android using the traditional
autotools workflow [1]. ChromiumOS fetches Android prebuilts and
puts them in a sysroot. We now want to use pkg-config to specify
the location of system headers and libraries [2].

To enable this, let's add the required pkg-config checks and link
against them.

[1] https://developer.android.com/ndk/guides/standalone_toolchain.html
[2] https://chromium-review.googlesource.com/#/c/403237/

v2: Bundle pkg-config checks together (Emil)
v3: Provide further context on standalone NDK Mesa build (Emil)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-15 15:49:20 +00:00
Gurchetan Singh
e23608db1c configure.ac: Don't look for pthreads in Android platform
In Android, the pthreads libs are in bionic.  When building
Mesa for Android with the autotools workflow, we shouldn't
set -lpthread or -pthread.

[Emil Velikov]
Other platforms could use a similar fix, although that is left as
separate exercise.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-15 15:48:02 +00:00
Eduardo Lima Mitev
e73513f3c8 meta/GetTexSubImage: Account for GL_PACK_SKIP_IMAGES on compressed textures
This option was being ignored when packing compressed 3D and cube textures.

Fixes CTS test (on gen8+):
* GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels_pixelstore

v2: Drop API checks.
v3 (Ken): Just apply the existing code in more cases.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-11-15 12:26:13 +01:00
Iago Toral Quiroga
277f868e66 anv/format: handle unsupported formats earlier
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-15 08:50:48 +01:00
Samuel Iglesias Gonsálvez
308b06d471 main: return error if asking for GL_TEXTURE_BORDER_COLOR in TEXTURE_2D_MULTISAMPLE{_ARRAY} through TexParameter{i,Ii,Iui}v()
OpenGL ES 3.2 says in section 8.10. "TEXTURE PARAMETERS", at the end of
the section:

"An INVALID_ENUM error is generated if target is TEXTURE_2D_-
MULTISAMPLE or TEXTURE_2D_MULTISAMPLE_ARRAY , and pname is any
sampler state from table 21.12."

GL_TEXTURE_BORDER_COLOR is present in that table.

v2:
- Add check to _mesa_texture_parameteriv() (Kenneth)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98250

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-11-15 07:44:12 +01:00
Lionel Landwerlin
a46bc3f70a anv: fix multi level clears with VK_REMAINING_MIP_LEVELS
A commit from the CTS suite on the 1.0-dev branch started using
VK_REMAINING_MIP_LEVELS, we're not dealing with it properly for clears.

Fixes:
   dEQP-VK.api.image_clearing.clear_color_image.*

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-14 17:40:58 +00:00
Andres Gomez
5923088d75 dir-locals.el: Adds White Space support
Trailing white spaces will be now always highlighted, not just in
prog-mode.

Also, the White Space package, which is available since GNU Emacs 22,
is loaded and activated locally in prog-mode.

Additionally, using White Space variables, we set highlighting through
faces on wrong indentation and the maximum length of a coding line.

Notice that:
 - The highlighting for the characters beyond the set length of a
   coding line is not activated by default, only for wrong
   indentations.
 - If the White Space package is not available, errors on loading or
   activation are ignored.
 - If the White Space mode is not activated the set variables would
   not have any effect.

v2: Removed too long lines trail highlighting, as suggested by Ilia
    Mirkin.

Signed-off-by: Andres Gomez <agomez@igalia.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-11-14 19:17:49 +02:00
Iago Toral Quiroga
9730f2734b anv/format: support VK_FORMAT_R8G8B8_SRGB
Fixes dEQP-VK.api.image_clearing.clear_color_image.1d_r8g8b8_srgb

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2016-11-14 17:13:42 +01:00
Iago Toral Quiroga
35deeda66f anv/format: handle unsupported formats properly
According to the spec for vkGetPhysicalDeviceImageFormatProperties:

"If format is not a supported image format, or if the combination of format,
 type, tiling, usage, and flags is not supported for images, then
 vkGetPhysicalDeviceImageFormatProperties returns VK_ERROR_FORMAT_NOT_SUPPORTED."

Makes the following Vulkan CTS tests report 'Not Supported' instead of crashing:

dEQP-VK.api.image_clearing.clear_color_image.1d_b8g8r8_unorm
dEQP-VK.api.image_clearing.clear_color_image.1d_b8g8r8_snorm
dEQP-VK.api.image_clearing.clear_color_image.1d_b8g8r8_uscaled
dEQP-VK.api.image_clearing.clear_color_image.1d_b8g8r8_sscaled
dEQP-VK.api.image_clearing.clear_color_image.1d_b8g8r8_uint
dEQP-VK.api.image_clearing.clear_color_image.1d_b8g8r8_sint
dEQP-VK.api.image_clearing.clear_color_image.1d_b8g8r8_srgb
dEQP-VK.api.image_clearing.clear_color_image.1d_b8g8r8a8_unorm
dEQP-VK.api.image_clearing.clear_color_image.1d_b8g8r8a8_snorm
dEQP-VK.api.image_clearing.clear_color_image.1d_b8g8r8a8_uscaled
dEQP-VK.api.image_clearing.clear_color_image.1d_b8g8r8a8_sscaled
dEQP-VK.api.image_clearing.clear_color_image.1d_b8g8r8a8_uint
dEQP-VK.api.image_clearing.clear_color_image.1d_b8g8r8a8_sint
dEQP-VK.api.image_clearing.clear_color_image.1d_b8g8r8a8_srgb
dEQP-VK.api.image_clearing.clear_color_image.1d_r4g4_unorm_pack8
dEQP-VK.api.image_clearing.clear_color_image.1d_r8_srgb
dEQP-VK.api.image_clearing.clear_color_image.1d_r8g8_srgb
dEQP-VK.api.image_clearing.clear_color_image.1d_r8g8b8_srgb
dEQP-VK.api.image_clearing.clear_color_image.1d_b5g5r5a1_unorm_pack16

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2016-11-14 17:13:42 +01:00
Vedran Miletić
8e430ff8b0 clover: adapt to new error API since LLVM r286752
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2016-11-14 15:50:29 +00:00
Tim Rowley
c8a51fa75d swr: [rasterizer core] remove driverType
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-14 09:03:10 -06:00
Tim Rowley
ddc898aaf3 swr: [rasterizer archrast] move to pass by value
Move to pass by value since most events are very small in size.

We can look at pass by reference but will need to create multiple
versions to handle temp objects.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-14 09:03:04 -06:00
Tim Rowley
23e459b606 swr: [rasterizer core] add mode for aux buffer in the SWR_SURFACE_STATE
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-14 09:02:59 -06:00
Tim Rowley
e9a3ad164d swr: [rasterizer common] don't bleed NOMINMAX definition after <windows.h>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-14 09:02:53 -06:00
Tim Rowley
cd8d840ce1 swr: [rasterizer archrast] add events
Added events for tracking early/late Depth and stencil events,
TE patch info, GS prim info, and FrontEnd/BackEnd DrawEnd events.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-14 09:02:48 -06:00
Tim Rowley
7c3ca2e704 swr: [rasterizer core] fix culling issues
- Do proper culling of wireframe triangles (including non-culling of
  degenerates)
- Fix degenerate culling of CCW front-facing triangles in wireframe and
  conservative rast

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-14 09:02:42 -06:00
Tim Rowley
cee66dd2aa swr: [rasterizer core/jitter] fix alpha test bug
Alpha from render target 0 should always be used for alpha test for all
render targets, according to GL and DX9 specs. Previously we were using
alpha from the current render target.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-14 09:02:36 -06:00
Tim Rowley
5912552947 swr: [rasterizer core] various code style changes
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-14 09:02:31 -06:00
Tim Rowley
584b65ad44 swr: [rasterizer archrast] don't generate empty files
Don't generate files when no events have been generated outside
the header events.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-14 09:02:25 -06:00
Tim Rowley
e6f7d8a094 swr: [rasterizer archrast] fix open file handle limit issue
Buffer events ourselves and then when that's full or we're destroying
the context then write the contents to file. Previously, we're relying
ofstream to buffer for us.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-14 09:02:17 -06:00
Tim Rowley
2c697754a9 swr: [rasterizer archrast] fix double free issue
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-14 09:02:11 -06:00
Tim Rowley
dc8408920c swr: [rasterizer core] separate frontend/backend stats enables
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-14 09:02:04 -06:00
Tim Rowley
937b7d8e5a swr: [rasterizer core] 16-wide tile store nearly completed
* All format combinations coded
* Fully emulated on AVX2 and AVX
* Known issue: the MSAA sample locations need to be adjusted for 8x2

Set ENABLE_AVX512_SIMD16 and USD_8x2_TILE_BACKEND to 1 in knobs.h to enable

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-14 09:00:59 -06:00
Emil Velikov
f233bcda89 docs: add news item and link release notes for 13.0.1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-14 11:39:01 +00:00
Emil Velikov
0a2b7c16c4 docs: add sha256 checksums for 13.0.1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit b47ce6ddb8)
2016-11-14 11:37:52 +00:00
Emil Velikov
eeedb52f75 docs: add release notes for 13.0.1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit f2f487ebbb)
2016-11-14 11:37:51 +00:00
Juan A. Suarez Romero
7b9a9a0c5d i965/vec4: skip registers already marked as no_spill
Do not evaluate spill costs for registers that were already marked as
no_spill.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-11-14 10:09:30 +01:00
Kenneth Graunke
151aecabe4 glsl: Don't crash on function names with invalid identifiers.
Karol Herbst's fuzzing efforts noticed that we would segfault on:

   void bug() {
      2(0);
   }

We just need to bail if the function name isn't an identifier.

Based on a bug fix by Karol Herbst.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97422
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-11-12 22:08:15 -08:00
Kenneth Graunke
9c676a6427 glsl: Fix assert fails when assignment expressions are in array sizes.
Karol Herbst's fuzzing efforts discovered that we would hit the
following assert:

   assert(dummy_instructions.is_empty());

when processing an illegal array size expression of

   float[(1=1)?1:1] t;

In do_assignment, we realized we needed an rvalue for (1 = 1), and
generated a temporary variable and assignment from the RHS.  We've
already flagged an error (non-lvalue in assignment), and return a bogus
value as the rvalue.  But process_array_size sees the bogus value, which
happened to be a constant expression, and rightly assumes that
processing a constant expression shouldn't have generated any code.
instructions.

To handle this, make do_assignment not generate any temps or assignments
when it's already raised an error - just return an error value directly.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98694
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-11-12 22:08:15 -08:00
Jonas Pfeil
5debfeb86f vc4: Add simulator kernel validation for multithreaded fragment shaders.
This is Jonas Pfeil's code from the kernel, brought back to Mesa by
anholt.
2016-11-12 19:21:46 -08:00
Eric Anholt
96ffee2d02 vc4: Mark threaded FSes as non-singlethread in the CL. 2016-11-12 19:21:46 -08:00
Eric Anholt
ace0d810e5 vc4: Flag the last thread switch in the program as the last.
We don't allow the last thread switch to be inside control flow, to be
sure that we hit the last state exactly once.  If the last texturing was
in control flow, fall back to single threaded.
2016-11-12 19:21:46 -08:00
Eric Anholt
67f72c5f5d vc4: Add THRSW nodes after each tex sample setup in multithreaded mode.
This is a suboptimal implementation, but Jonas Pfeil found that it was
still a massive performance gain.
2016-11-12 19:21:46 -08:00
Eric Anholt
e3c620e868 vc4: Add some spec citations about texture fifo management. 2016-11-12 18:46:35 -08:00
Eric Anholt
fd2aff858b vc4: Use ra14/rb14 as the spilling registers.
This makes the raddr fixups compatible with FS threading.
2016-11-12 18:46:35 -08:00
Eric Anholt
755037173d vc4: Add support for register allocation for threaded shaders.
We have two major requirements: Make sure that only the bottom half of the
physical reg space is used, and make sure that none of our values are live
in an accumulator across a switch.
2016-11-12 18:46:35 -08:00
Eric Anholt
fdad4d2402 vc4: Split register class setup for physical files from accumulators. 2016-11-12 18:46:35 -08:00
Eric Anholt
8e704dca7f vc4: Use register allocator CLASS_BIT_R0_R3 to clean up CLASS_B.
We have had no reason to separate ability to store in an accumulator from
ability to store in B, but with FS threading, we need to be able to force
values to be stored only in the physical regfiles.
2016-11-12 18:46:35 -08:00
Eric Anholt
1ee503c74d vc4: Add support for QPU scheduling of thread switch instructions.
This is vaguely based off of Jonas Pfeil's thread switch support branch.
2016-11-12 18:46:35 -08:00
Eric Anholt
4f527f1260 vc4: Add a thread switch QIR instruction.
This will eventually be generated at the QIR level, so that
vc4_qir_schedule.c can arrange the separation of tex_strb from tex_result
correctly.  It will also be important so that register allocation set the
register classes appropriately for values that are live across the switch.
2016-11-12 18:46:35 -08:00
Eric Anholt
93cdae44de vc4: Add a bit of QPU validation for threaded shaders.
These are both bugs we've run into along the way writing multithreaded FS
support.
2016-11-12 18:46:35 -08:00
Eric Anholt
977d8b526b vc4: Fix register class handling of DDX/DDY arguments.
I had this exactly backwards, but apparently the piglit tests were all
landing in r0-r3 anyway.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-12 18:46:35 -08:00
Darren Salt
9b121512ac radv/pipeline: Don't dereference NULL dynamic state pointers
This is a port of commit a4a5917248:

   Add guards to prevent dereferencing NULL dynamic pipeline state. Asserts
   of pCreateInfo members are moved to the earliest points at which they
   should not be NULL.

This fixes a segfault, related to pColorBlendState, seen in Talos Principle
which I've observed after startup is completed and when exiting the menus,
depending on when Vulkan rendering is selected.

v2: moved the NULL check in radv_pipeline_init_blend_state to after the
declarations.
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-11-12 15:06:27 +01:00
Rob Clark
dfc001dccc freedreno/ir3: fixup ralloc fallout
Fixes fallout from acc23b04 ("ralloc: remove memset from ralloc_size").
We were still depending on zero'd allocations in a couple of places.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-12 08:57:03 -05:00
Steinar H. Gunderson
2e2562cabb Fix races during _mesa_HashWalk().
There is currently no protection against walking a hash (using
_mesa_HashWalk()) and modifying it at the same time, for instance by inserting
or deleting elements. This leads to segfaults in multithreaded code if e.g.
someone calls glTexImage2D (which may have to walk the list of FBOs) while
another thread is calling glDeleteFramebuffers on another thread with the two
contexts sharing lists.

The reason for this is that _mesa_HashWalk() doesn't actually take the mutex
that normally protects the hash; it takes an entirely different mutex.
Thus, walks are only protected against other walks, and there is also no
outer lock taking this. There is an old comment saying that this is to fix
problems with deadlock if the callback needs to take a mutex; we solve this
by changing the mutex to be recursive.

A demonstration Helgrind hit from a real application:

==13412== Possible data race during write of size 8 at 0x3498C6A8 by thread #1
==13412== Locks held: 2, at addresses 0x1AF09530 0x2B3DF400
==13412==    at 0x1F040C99: _mesa_hash_table_remove (hash_table.c:395)
==13412==    by 0x1EE98174: _mesa_HashRemove_unlocked (hash.c:350)
==13412==    by 0x1EE98174: _mesa_HashRemove (hash.c:365)
==13412==    by 0x1EE2372D: _mesa_DeleteFramebuffers (fbobject.c:2669)
==13412==    by 0x6105AA4: movit::ResourcePool::cleanup_unlinked_fbos(void*) (resource_pool.cpp:473)
==13412==    by 0x610615B: movit::ResourcePool::release_fbo(unsigned int) (resource_pool.cpp:442)
[...]
==13412== This conflicts with a previous read of size 8 by thread #20
==13412== Locks held: 2, at addresses 0x1AF09558 0x1AF73318
==13412==    at 0x1F040CD9: _mesa_hash_table_next_entry (hash_table.c:415)
==13412==    by 0x1EE982A8: _mesa_HashWalk (hash.c:426)
==13412==    by 0x1EED6DFD: _mesa_update_fbo_texture.part.33 (teximage.c:2683)
==13412==    by 0x1EED9410: _mesa_update_fbo_texture (teximage.c:3043)
==13412==    by 0x1EED9410: teximage (teximage.c:3073)
==13412==    by 0x1EEDA28F: _mesa_TexImage2D (teximage.c:3105)
==13412==    by 0x166A68: operator() (mixer.cpp:454)

There are many more interactions than just these two possible.

Cc: 11.2 12.0 13.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Steinar H. Gunderson <steinar+mesa@gunderson.no>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-11-12 12:42:07 +11:00
Kenneth Graunke
07566ad4b6 i965: Drop tabs in brw_state.h. 2016-11-11 17:12:46 -08:00
Daniel Scharrer
0b98e885e7 ac/nir/llvm: Fix setting function attributes for intrinsics
This fixes a NULL pointer dereference for intrinsics with more than
one function attribute introduced in commit 2fdaf38.
The fix is ported from the lp_build_intrinsic changes in commit 8bdd52c.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-11-11 22:40:32 +01:00
Kenneth Graunke
74d5d393df i965: Update a comment: s/brw_state_cache/brw_program_cache/g
Tim renamed this recently - stop referring to it by the old name.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2016-11-11 13:19:18 -08:00
Laurent Carlier
3ff9f8c532 clover: fix building since llvm r286566
pretty trivial fix
2016-11-11 19:45:22 +00:00
Emil Velikov
6ff948ece1 egl/wayland: fix return value in dri2_wl_swrast_commit_backbuffer
The function returns "void" rather than int. We could rework that, yet
again there will be no benefit since all the callers have no use of it.

Fixes: 9ca6711faa ("Revert "wayland: Block for the frame callback in
get_back_bo not dri2_swap_buffers"")
Reviewed-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-11-11 17:33:37 +00:00
Brian Paul
92ec47a6ba glsl: define __STDC_FORMAT_MACROS to get PRIx64 macro
Otherwise, inttypes.h may not define the macro for C++ on MinGW.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98681
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-11 09:43:29 -07:00
Brian Paul
f9052536c9 mesa: fix comment indentation in bind_buffers_check_offset_and_size()
Trivial.
2016-11-11 09:43:29 -07:00
Emil Velikov
db45f1eaab glsl: automake: add opt_add_neg_to_sub.h to the sources list
Otherwise it'll be missing in the release tarball.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-11 14:46:12 +00:00
Timothy Arceri
e36f0878cf i965: update gl_PrimitiveIDIn to be a system variable
Now that we have switched to using nir_shader_gather_info() we
can remove the hacks and just use the system variable.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-11-11 20:39:09 +11:00
Timothy Arceri
00620782c9 i965: use nir_shader_gather_info() over do_set_program_inouts()
This takes us one step closer to being able to drop the GLSL IR
optimisation passes during linking in favour of the NIR passes.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-11-11 20:39:09 +11:00
Timothy Arceri
e86fc2c285 i965: remove remaining tabs in brw_program_cache.c
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2016-11-11 20:39:09 +11:00
Timothy Arceri
663fc64965 i965: rename brw_state_cache_check_size() to brw_program_cache_check_size()
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2016-11-11 20:39:09 +11:00
Timothy Arceri
0d897be973 i965: rename brw_state_cache.c -> brw_program_cache.c
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2016-11-11 20:39:09 +11:00
Jason Ekstrand
a5e88e66e6 i965/gs: Allow primitive id to be a system value
This allows for gl_PrimitiveId to come in as a system value rather than as
an input.  This is the way it will come in from SPIR-V. We keeps the input
path working for now so we don't break GL.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-10 22:43:59 -08:00
Jason Ekstrand
e73d136a02 vulkan/wsi/x11: Implement FIFO mode.
This implements VK_PRESENT_MODE_FIFO_KHR for X11.  Unfortunately, due to
the way the present extension works, we have to manage the queue of
presented images in a separate thread.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-10 22:40:59 -08:00
Jason Ekstrand
4fa0ca80ee vulkan/wsi: Report the correct min/maxImageCount
From the Vulkan spec 1.0.32 section 29.6 docs for vkAcquireNextImageKHR:

   "Let n be the total number of images in the swapchain, m be the value of
   VkSurfaceCapabilitiesKHR::minImageCount, and a be the number of
   presentable images that the application has currently acquired (i.e.
   images acquired with vkAcquireNextImageKHR, but not yet presented with
   vkQueuePresentKHR).  vkAcquireNextImageKHR can always succeed if a ≤ n -
   m at the time vkAcquireNextImageKHR is called. vkAcquireNextImageKHR
   should not be called if a > n - m with a timeout of UINT64_MAX; in such
   a case, vkAcquireNextImageKHR may block indefinitely."

With minImageCount == 2 (as it was previously, the client is allowed to
acquire all but one image withoutblocking.  If we really need 4 images for
mailbox mode + pageflipping, then we need to request a minimum of 4 images
up-front.  This is a bit unfortunate because it means we will always
consume 4 images.  In the future, we may be able to optimize this a bit by
waiting until the server starts to flip and returning OUT_OF_DATE to get
the client to re-allocate with more images or something like that.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-10 22:40:51 -08:00
Kevin Strasser
932bb3f0dd vulkan/wsi: Add a thread-safe queue implementation
In order to support FIFO mode without blocking the application on calls
to vkQueuePresentKHR it is necessary to enqueue the request and defer
calling the server until the next vblank period. The xcb present api
doesn't offer a way to register a callback, so we will have to spawn a
worker thread that will wait for a request to be added to the queue, call
to the server, and then make the image available for reuse.  This commit
introduces the queue data structure needed to implement this.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-10 22:40:44 -08:00
Tapani Pälli
3ca600fe71 android/i965: add libmesa_i965_compiler static library
this will be shared between OpenGL and Vulkan drivers

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-11 07:59:29 +02:00
Tapani Pälli
1c2de8977b android: add SPIRV_FILES to libmesa_nir
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-11 07:59:29 +02:00
Tapani Pälli
23d1799f7d anv: use STATIC_ASSERT instead of static_assert
fixes following compilation warnings on Android build:

"warning: implicit declaration of function 'static_assert' is invalid in
C99 [-Wimplicit-function-declaration]"

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-11 07:59:29 +02:00
Tapani Pälli
ec725dc140 util: use STATIC_ASSERT instead of static_assert
fixes following compilation warnings on Android build:

"warning: implicit declaration of function 'static_assert' is invalid in
C99 [-Wimplicit-function-declaration]"

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-11 07:59:29 +02:00
Dave Airlie
98969808ff vulkan: import latest public vulkan headers + and fix drivers.
I just noticed the new vulkan headers changed a prototype,
so I've decided to import them and fix the drivers to use the
new API.

Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-11 12:33:07 +10:00
Emil Velikov
e4c465e230 docs: add news item and link release notes for 12.0.4
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-11 01:57:50 +00:00
Emil Velikov
33c2958930 docs: add sha256 checksums for 12.0.4
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 7b9d7257b2)
2016-11-11 01:56:19 +00:00
Emil Velikov
d82bbf34df docs: add release notes for 12.0.4
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 3776e97f9d)
2016-11-11 01:56:18 +00:00
Brian Paul
d881e1c024 glsl: include inttypes.h for PRIx64 macro
To fix MinGW build.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-11-10 17:59:18 -07:00
Dave Airlie
2de85eb97a radv: fix texturesamples to handle single sample case
We can only read the valid samples if this is an MSAA
texture, which means the type field must be 0x14 or 0x15.

This fixes:
dEQP-VK.glsl.texture_functions.query.texturesamples.*

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-11 09:35:43 +10:00
Jason Ekstrand
a6c3d0f92b anv/cmd_buffer: Enable a CS stall workaround for Sky Lake gt4
This fixes hangs in Dota2

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org>
2016-11-10 15:21:18 -08:00
Jason Ekstrand
1e3e347fd5 anv/cmd_buffer: Take a command buffer instead of a batch in two helpers
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org>
2016-11-10 15:21:18 -08:00
Ian Romanick
e9acae8486 glsl/standalone: Add the ability to generate ir_builder code
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-11-10 14:30:49 -08:00
Ian Romanick
191d9a5195 glsl: Add a C++ code generator that uses ir_builder to rebuild a program
This is only in libstandalone currently because it will only be used in
the stand-alone compiler.

v2: Change the signature of the generated function.  The ir_factory is
created in the generator, and an availability predicate is taken as a
parameter.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-10 14:30:49 -08:00
Ian Romanick
984f16bbd7 glsl: Generate strings that are the enum names without the ir_*op_ prefix
For many expressions, this is different from the printable name.  The
printable name for ir_binop_add is "+", but we want "add".  This is
needed for ir_builder_print_visitor.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-10 14:30:49 -08:00
Ian Romanick
d0028b2e1c glsl/standalone: Enable par-linking
If the user did not request full linking, link the shader with the
built-in functions, inline them, and eliminate them.  Previous to this
you'd see all these calls to "dot" and "max" in the output.  This
prevented a lot of expected optimizations and cluttered the output.
This gives it some chance of being useful.

v2: Rebase on top of Ken's "built-ins now" work.

v3: Don't do_common_optimizations if par-linking fails.  Update expected
output of warnings tests to prevent 'make check' regressions.

v4: Optimize harder.  Most important, do function inlining.  Otherwise
it's quite impractical for one function in a file to call another
function in the same file.

v5: Add some code simplifications and an assertion suggested by Iago.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-11-10 14:30:49 -08:00
Ian Romanick
4dc759c8c2 glsl/standalone: Optimize dead variable declarations
We didn't bother with this in the regular compiler because it doesn't
change the generated code.  In the stand-alone compiler, this can
clutter the output with useless variables.  It's especially bad after
functions are inlined but the foo_retval declarations remain.

v2: Use set_foreach.  Suggested by Tapani.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-11-10 14:30:49 -08:00
Ian Romanick
f45a2a93ae glsl/standalone: Optimize add-of-neg to subtract
This just makes the output of the standalone compiler a little more
compact.

v2: Fix indexing typo noticed by Iago.  Move the add_neg_to_sub_visitor
to it's own header file.  Add a unit test that exercises the visitor.
Both the neg_a_plus_b and neg_a_plus_neg_b tests reproduced the bug that
Iago discovered.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-10 14:30:49 -08:00
Ian Romanick
9788b3b6f3 glsl/linker: Allow link_intrastage_shaders when there is no main()
This enables a sort of par-linking.  The primary use for this feature is
resolving built-in functions in the stand-alone compiler.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-11-10 14:30:49 -08:00
Timothy Arceri
7372d2153a nir: update nir_gather_info to only mark used array/matrix elements
This is based on the code from the GLSL IR pass however unlike the GLSL IR
pass it also supports arrays of arrays.

As well as implementing the logic from the GLSL IR pass we add some
additional intrinsic cases to catch more system values.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-11-11 09:17:07 +11:00
Kenneth Graunke
53d1f4251f mesa/compiler: move MAX_VARYING to shader_enums.h
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-11-11 09:17:07 +11:00
Timothy Arceri
cd52b4fb16 nir: add more helpers to nir_types.cpp
These new helpers will be used in nir_gather_info.c in a following patch.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-11-11 09:17:07 +11:00
Kenneth Graunke
ad9d4a4f8d nir: Generalize the "is per-vertex variable?" helpers and export them.
I want this function for nir_gather_info(), and realized it's basically
the same as the ones in nir_lower_io().

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-11-11 09:17:07 +11:00
Samuel Pitoiset
561f2208bd nvc0: support MP performance counters on Maxwell
This adds some performance counters/metrics for SM50/SM52.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
2016-11-10 22:13:49 +01:00
Tim Rowley
b9578b683d gallium: detect avx512 cpu features
v3: fix check for xmm/ymm test
v2: style code, add avx512 to cpu dump

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-11-10 15:03:21 -06:00
Ian Romanick
c8c46641af glsl: Parse 0 as a preprocessor INTCONSTANT
This allows a more reasonable error message for '#version 0' of

    0:1(10): error: GLSL 0.00 is not supported. Supported versions are: 1.10, 1.20, 1.30, 1.00 ES, 3.00 ES, 3.10 ES, and 3.20 ES

instead of

    0:1(10): error: syntax error, unexpected $undefined, expecting INTCONSTANT

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97420
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: mesa-stable@lists.freedesktop.org
Cc: Juan A. Suarez Romero <jasuarez@igalia.com>
Cc: Karol Herbst <karolherbst@gmail.com>
2016-11-10 10:57:59 -08:00
Ian Romanick
e85a747e29 glcpp: Handle '#version 0' and other invalid values
The #version directive can only handle decimal constants.  Enforce that
the value is a decimal constant.

Section 3.3 (Preprocessor) of the GLSL 4.50 spec says:

    The language version a shader is written to is specified by

        #version number profile opt

    where number must be a version of the language, following the same
    convention as __VERSION__ above.

The same section also says:

    __VERSION__ will substitute a decimal integer reflecting the version
    number of the OpenGL shading language.

Use a separate flag to track whether or not the #version line has been
encountered.  Any possible sentinel (0 is currently used) could be
specified in a #version directive.  This would lead to trying to
(internally) redefine __VERSION__.  Since there is no parser location
for this addition, NULL is passed.  This eventually results in a NULL
dereference and a segfault.

Attempts to use -1 as the sentinel would also fail if '#version
4294967295' or '#version 18446744073709551615' were used.  We should
have piglit tests for both of these.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97420
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: mesa-stable@lists.freedesktop.org
Cc: Juan A. Suarez Romero <jasuarez@igalia.com>
Cc: Karol Herbst <karolherbst@gmail.com>
2016-11-10 10:57:59 -08:00
Ian Romanick
cbba5e13ac linker: Remove unnecessary overload of program_resource_visitor::visit_field
It looks like I added this version as a short-hand for users that didn't
need the fuller version.  I don't think there's any real utility in
that.  I'm not sure what my thinking was there.  Maybe if those users
overloaded the recursion function could just call the compact version to
avoid passing some parameters?  None of the users do that.

Either way, having this extra overload is not useful.  Delete it.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2016-11-10 10:57:46 -08:00
Emil Velikov
b359f62456 radv: automake: list correct file in the EXTRA_DIST
Earlier commit renamed the file radeon_icd.json{,.in} but missed one
reference of the file - in EXTRA_DIST.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Fixes: 0f434a68a ("radv: Suffix the radeon_icd file with the host CPU")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-10 18:50:13 +00:00
Marek Olšák
f500c36339 mesa: remove LowerShaderSharedVariables
always true for compute shaders

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-11-10 18:34:55 +01:00
Marek Olšák
0f6360eedb glsl: handle partial swizzles in opt_dead_code_local correctly
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-11-10 18:34:55 +01:00
Marek Olšák
e27333a568 glsl: don't run loop passes if loop unrolling is disabled
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-11-10 18:34:55 +01:00
Marek Olšák
ce3f453f01 radeonsi: fix r600_texture::tc_compatible_htile
htile_size is now always non-zero if HTILE is allocated.

It seems to have caused no issues.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-10 18:34:55 +01:00
Marek Olšák
ce3189cbe6 radeonsi: accept is_store in image_fetch_rsrc instead of dcc_off
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-10 18:34:55 +01:00
Marek Olšák
f83b2f524a radeonsi: don't rely on tgsi_scan::images_buffers
the instruction knows the target

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-10 18:34:55 +01:00
Marek Olšák
4e00e20074 radeonsi: re-order cases in si_get_shader_param
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-10 18:34:55 +01:00
Marek Olšák
3f6e0063c8 radeonsi: increase MAX_CONTROL_FLOW_DEPTH AKA MaxIfDepth
we don't want to lower deep IFs unconditionally

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-10 18:34:55 +01:00
Emil Velikov
0187f88453 Revert "configure.ac: honour LLVM_LIBDIR when linking against LLVM"
This reverts commit a39ad18593.

The commit aims to address "missing" -L/foo/bar during linking stage. At
the same time it doesn't add the -L and yet the LLVM_LDFLAGS [which
provide -L/foo/bar] are already used throughout.

Seems like something pretty unique (broken?) on my end. Since the commit
introduces issues (due to the missing -L) revert until we get to the
root of it (PEBKAC or a genuine issue).
2016-11-10 15:10:34 +00:00
Nicolai Hähnle
b21912e2e9 radeonsi: fix/silence unused variable warnings in optimized builds
I'm leaving num_out_sgpr around since it's not in a fast path, and besides
the compiler should be able to optimize it away easily. The alternative
with #if/#endif would be extremely ugly.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-10 13:18:16 +01:00
Nicolai Hähnle
b46a9c570f gallivm: fix [IU]MUL_HI regression harder
The fix in commit 88f791db75 was insufficient
for radeonsi because the vector case was not handled properly. It seems
piglit only covers the scalar case, unfortunately.

Fixes GL45-CTS.shader_bitfield_operation.[iu]mulExtended.*

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-11-10 13:17:10 +01:00
Daniel Stone
9ca6711faa Revert "wayland: Block for the frame callback in get_back_bo not dri2_swap_buffers"
This reverts commit 25cc889004, though
since the code has changed, it was applied manually.

The intent of moving blocking from SwapBuffers to get_back_bo, was to
avoid unnecessary triple-buffering by ensuring that the compositor had
fully processed the previous frame before we started rendering. This
means that the only time we would have to resort to triple-buffering
would be when the buffer is directly scanned out, thus saving an extra
buffer for composition anyway.

The 'repaint window' changes introduced in Weston since then, however,
have narrowed the window of time between the frame event being sent and
the repaint loop needing to conclude, to 7ms by default, in order to
reduce latency. This means however that blocking in get_back_bo gives a
maximum of 7ms for the entire GL submission to begin and complete.

Not only this, but if a client is using buffer_age to avoid full
repaints, the buffer-age request will stall in get_back_bo until the
frame callback completes, meaning that the client cannot even calculate
the repaint area before the 7ms window.

The combination of the two meant that WebKit-GTK+ was failing to
achieve full framerate on a Minnowboard, due to spending a great deal of
its time attempting to query the age of the next buffer before redraw.

Revert to the previous behaviour of allowing rendering to begin but
delaying SwapBuffers, unless and until we can find a more gentle
behaviour.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jonas Ådahl <jadahl@gmail.com>
Reviewed-by: Derek Foreman <derekf@osg.samsung.com>
Tested-by: Derek Foreman <derekf@osg.samsung.com>
Cc: Kristian Høgsberg <krh@bitplanet.net>
2016-11-10 10:25:03 +00:00
Iago Toral Quiroga
8933417565 glsl: validate output blocks against input blocks
Until now were validating in/out blocks by listing the inputs in the
consumer stage and then, for each output of the producer, we checked that
it was a match if it was consumed. This method does not catch the case
where the consumer has an input that is not present as an output in the
producer stage, because it only generates link errors for outputs present
in the producer stage that don't match the inputs in the consumer stage.
The current method does catch the case were an output from the producer
stage is not consumed, which is irrelevant and is ignored.

By reversing the way we do this, we can detect this situation, so this
patch lists the outputs of the producer stage and then validates inputs
of the consumer stage against them. If we see an input in the consumer
for which there is no associated output in the producer, we produce a
link error.

The only exception to this is the special built-in input block gl_in[],
since this is implicitly generated for geometry and tessellation stages,
but we don't generate it if the producer stage does not write to any of
the pre-defined outputs (for example, if the vertex shader does not write
to gl_Position, etc). Since writing to these is not mandatory, do not
produce a link error in that case. There is a CTS tessellation test
(GL45-CTS.tessellation_shader.program_object_properties) that has an
empty vertex shader (so it does not produce gl_in[]) and would fail to
link if we don't do this.

This fixes the following dEQP test:
dEQP-GLES31.functional.shaders.linkage.io_block.missing_output_block

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98245
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-10 08:08:07 +01:00
Dave Airlie
19decd8ce4 radv: fixup botched llvm API changes.
Reported-by: Jan Vesely <jan.vesely@rutgers.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-10 14:12:32 +10:00
Dave Airlie
2fdaf38c01 ac/nir/llvm: adopt to new LLVM attribute API.
Ported from corresponding changes to gallivm.

tested build against 3.9 and master.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-10 13:29:12 +10:00
Jason Ekstrand
302f641d14 vulkan/wsi/wayland: Clean up some error handling paths
This gets rid of all the memory leaks reported by the WSI CTS tests.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-09 18:18:42 -08:00
Jason Ekstrand
3b6abfc69a vulkan/wsi/wayland: Include pthread.h
We use pthreads and, for some reason, it wasn't getting included

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-09 18:18:36 -08:00
Jason Ekstrand
b1217eada9 anv/device: Implicitly unmap memory objects in FreeMemory
From the Vulkan spec version 1.0.32 docs for vkFreeMemory:

   "If a memory object is mapped at the time it is freed, it is implicitly
   unmapped."

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "12.0 13.0" <mesa-dev@lists.freedesktop.org>
2016-11-09 18:17:55 -08:00
Jason Ekstrand
920f34a2d9 anv/device: Return the right error for failed maps
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "12.0 13.0" <mesa-dev@lists.freedesktop.org>
2016-11-09 18:17:48 -08:00
Jason Ekstrand
73ef9c8f04 anv/device: Add some asserts to MapMemory
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-09 18:17:41 -08:00
Jason Ekstrand
843775bab7 anv: Rework fences
Our previous fence implementation was very simple.  Fences had two states:
signaled and unsignaled.  However, this didn't properly handle all of the
edge-cases that we need to handle.  In order to handle the case where the
client calls vkGetFenceStatus on a fence that has not yet been submitted
via vkQueueSubmit, we need a three-status system.  In order to handle the
case where the client calls vkWaitForFences on fences which have not yet
been submitted, we need more complex logic and a condition variable.  It's
rather annoying but, so long as the client doesn't do that, we should still
hit the fast path and use i915_gem_wait to do all our waiting.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-09 18:17:29 -08:00
Jason Ekstrand
73701be667 anv/wsi: Set the fence to signaled in AcquireNextImageKHR
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-09 18:17:21 -08:00
Jason Ekstrand
71397042fe anv/gen8: Stall when needed in Cmd(Set|Reset)Event
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-09 18:17:06 -08:00
Ilia Mirkin
73f53c097a glsl: record number of components used in each slot for varying packing
Instead of packing varyings into vec4's, keep track of how many
components each slot uses and create varyings with matching types. This
ensures that we don't end up using more components than the orginal
shader, which is especially important for geometry shader output limits.

This comes up for NVIDIA hw, where the limit is 1024 output components
for a GS, and the hardware complains *loudly* if you even think about
going over.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-11-09 20:26:48 -05:00
Ilia Mirkin
885c788017 glsl: fix slot_end calculations and simplify reserved_slots check
The previous code was confused about whether slot_end was inclusive or
exclusive. Make it so that it is inclusive consistently, and use it for
setting the new location. This also avoids discrepancies in how
num_components is calculated vs the more manual approach taken for the
former reserved_slots check.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-11-09 20:25:15 -05:00
Ilia Mirkin
828faaef40 swr: correct setting of independentAlphaBlendEnable
This setting is for whether color and alpha have different blend
settings, not for whether blending is enabled on a per-RT basis.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-09 20:11:57 -05:00
Ilia Mirkin
5be635d5e4 swr: [rasterizer] add a .dir-locals.el to support 4-space indents
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-09 20:11:39 -05:00
Ilia Mirkin
36e5d68cad swr: set halfz rasterizer setting
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-09 20:11:10 -05:00
Ilia Mirkin
4b5b87e7ab swr: [rasterizer core] allow an OpenGL driver to specify halfz clipping
With ARB_clip_control, GL may also do 0..1 depth clipping, not just
-1..1. This removes clip's reliance on driver type. DX users will need
to be updated to set the new clipHalfZ flag to get proper clipping
functionality.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-09 20:10:52 -05:00
Ilia Mirkin
4af25e7131 swr: fix support for inverted depth scales
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-09 20:10:44 -05:00
Ilia Mirkin
aed517f985 swr: [rasterizer jitter] fix logic op to work with unorm/snorm
Most logic op usage is probably going to end up with normalized
textures. Scale the floating point values and convert to integer before
performing the logic operations.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-09 20:10:25 -05:00
Eric Anholt
08d51487e3 vc4: Clamp the shadow comparison value.
Fixes piglit glsl-fs-shadow2D-clamp-z.

Cc: <mesa-stable@lists.freedesktop.org>
2016-11-09 15:33:56 -08:00
Eric Anholt
e887341d3f vc4: Don't pair up TLB scoreboard locking instructions early in QPU sched.
Jonas Pfeil noticed that we were putting passthrough tlb_z writes early in
the shader, despite QIR and QPU scheduling both trying to delay scoreboard
locking for as long as possible.

The problem was that when trying to pair up QPU instructions, at some
point the passthrough tlb_z would be the last one available and it would
get paired, even if the other half would open up other instructions to be
scheduled and we could have paired tlb_z with something later in the
program.  Also, since passthrough z is just a mov, it pairs up really
easily.

The proper fix would probably be to flip the order of scheduling
instructions so we went from bottom to top (also relevant for branch delay
slot scheduling).

However, we can do a quick fix here to just not schedule a TLB lock until
there's nothing but TLB left in the program, at a slight instruction cost
(est .61% cycle count in shader-db) but a major fragment shader
parallelism win.

glmark2 results:
  texture:texture-filter=linear: +1.24481% +/- 0.626117% (n=15)
  bump:bump-render=height: 1.24991% +/- 0.154793% (n=136,133 -- screensaver
    outliers removed)
2016-11-09 15:33:56 -08:00
Eric Anholt
695a2e2ffa vc4: Print a reg pressure estimate in our reg allocation failure dump. 2016-11-09 15:33:56 -08:00
Eric Anholt
4d019bd703 vc4: Don't abort when a shader compile fails.
It's much better to just skip the draw call entirely.  Getting this
information out of register allocation will also be useful for
implementing threaded fragment shaders, which will need to retry
non-threaded if RA fails.

Cc: <mesa-stable@lists.freedesktop.org>
2016-11-09 15:33:56 -08:00
Kenneth Graunke
aaee3daa90 mesa: Fix pixel shader scratch space allocation on Gen9+ platforms.
We had missed a bit of errata - PS scratch needs to be computed as if
there were 4 subslices per slice, rather than 3.

                          Skylake      Broxton        Kabylake
                      GT1 GT2 GT3 GT4  2x6 3x6  GT1 GT1.5 GT2 GT3 GT4
Actual Slices          1   1   2   3    1   1    1    1    1   2   3
Total Subslices        3   3   6   9    2   3    2    3    3   6   9
Subsl. for PS Scratch  4   4   8   12   4   4    4    4    4   8   12

Note that Skylake GT1-3 already worked because we allocated 64 * 9
(trying to use a value that would work on GT4, with 9 subslices),
and the actual required values were 64 * 4 or 64 * 8.  However, all
others (Skylake GT4, Broxton, and Kabylake GT1-4) underallocated,
which can lead to scratch writes trashing random process memory,
and rendering corruption or GPU hangs.

Fixes GPU hangs and rendering corruption on Skylake GT4 in shaders that
spill.  Particularly, dEQP-GLES31.functional.ubo.all_per_block_buffers.*
now runs successfully with no hangs and renders correctly.  This may
fix problems on Broxton and Kabylake as well.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2016-11-09 15:30:59 -08:00
Kevin Strasser
1d6fe13c13 mesa/extensions: expose OES_vertex_half_float for ES2
Half float support already exists for desktop GL. Reuse the
ARB_half_float_vertex enable bit and account for the different enum to
enable the extension for ES2.

Signed-off-by: Kevin Strasser <kevin.strasser@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-11-09 14:35:20 -08:00
Emil Velikov
aeaf21ab3e Revert "egl: remove explicit config_id management from dri2_add_config()"
This reverts commit 3652d1d594.

Self nack/reject on this one. The base.ConfigID is overwritten
immediately after we store the current value, thus one memcpy [further
down] the wrong value will be copied.
2016-11-09 21:48:50 +00:00
Brian Paul
5b92008ae2 util: add MSVC HAS_TRIVIAL_DESTRUCTOR implementation
Based on a patch by George Kyriazis but changed to test for
_MSC_VER >= 1800 (Visual Studio 2015).

This fixes the failed CANARY assertion in src/util/ralloc.c:get_header()
on Windows.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98595
Tested-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
2016-11-09 14:55:10 -07:00
Emil Velikov
0f434a68a3 radv: Suffix the radeon_icd file with the host CPU
Port of the anv commit d96345de98 ("anv: Suffix the intel_icd file with
the host CPU").

v2: s/intel_icd/radeon_icd/ in commit summary (Gražvydas)

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com> (IRC)
2016-11-09 21:36:45 +00:00
Emil Velikov
abe110df01 radv: use correct .specVersion for extensions
Analogous to previous commit.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com> (IRC)
2016-11-09 21:36:36 +00:00
Emil Velikov
f373a91a52 anv: use correct .specVersion for extensions
Vulkan has introduced the consept of .specVersion which can be used to
attribute changes of the said extension.

The current loader does not check the value, thus it have gone unnoticed
that the driver exposes an old version of the following extensions:

VK_KHR_xcb_surface        (Rev 6)
VK_KHR_xlib_surface       (Rev 6)
VK_KHR_wayland_surface    (Rev 5)
- Updated the surface create function to take a pCreateInfo structure

VK_KHR_swapchain          (Rev 68)
- Moved the "validity" include for vkAcquireNextImage to be in its proper
  place, after the prototype and list of parameters.
...

According to the documentation:

  * pname:specVersion is the version of this extension.
    It is an integer, incremented with backward compatible changes.

Based on the history of vk.xml the above (latest) revision has been
available since Vulkan 1.0 so even if they were any backwards
incompatible change(s) [as hinted by the revision log] those should be
safe.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-09 21:36:35 +00:00
Emil Velikov
190bae7685 amd/addrlib: limit fastcall/regparm to GCC i386
The use of regparm causes an error on arm/arm64 builds with clang.
fastcall is allowed, but still throws a warning. As both options only
have effect on 32-bit x86 builds, limit them to that case.

v2: keep the __i386__ within GCC (Nicolai)

Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Nicolai Hähnle <nhaehnle@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Rob Herring <robh@kernel.org>
2016-11-09 21:36:35 +00:00
Emil Velikov
a39ad18593 configure.ac: honour LLVM_LIBDIR when linking against LLVM
Currently if one uses a non-default prefix, the path won't get
propagated and we'll fail at link-time.

A very quick and easy example is to install to /usr/local.
At this point, llvm-config will be picked even without the
--with-llvm-prefix, but regardless of the latter linking will fail.

Currently people can workaround that via LD_LIBRARY_PATH.

Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org>
Cc: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-09 21:36:14 +00:00
Emil Velikov
3652d1d594 egl: remove explicit config_id management from dri2_add_config()
Currently we only saved the id to memcpy the whole _EGLConfig to write
back the exact same id value.

Remove the unneeded and confusing/misleading code.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-11-09 21:18:48 +00:00
Ian Romanick
084105c213 linker: Accurately track gl_uniform_block::stageref
As the linked per-stage shaders are processed, mark any block that has a
field that is accessed as referenced.  When combining all the linked
shaders, combine the per-stage stageref masks.

This fixes a number of GLES CTS tests:

    ES31-CTS.core.geometry_shader.program_resource.program_resource
    ES32-CTS.core.geometry_shader.program_resource.program_resource
    ESEXT-CTS.geometry_shader.program_resource.program_resource
    piglit.gl45-cts.geometry_shader.program_resource.program_resource

However, it makes quite a few more fail:

    ES31-CTS.functional.program_interface_query.buffer_variable.random.6
    ES31-CTS.functional.program_interface_query.buffer_variable.referenced_by.compute.unnamed_block.float
    ES31-CTS.functional.program_interface_query.buffer_variable.referenced_by.separable_fragment.unnamed_block.float
    ES31-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_fragment_only_fragment.unnamed_block.float
    ES31-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_fragment.unnamed_block.float
    ES31-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_geo_fragment_only_fragment.unnamed_block.float
    ES31-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_geo_fragment.unnamed_block.float
    ES31-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_tess_fragment_only_fragment.unnamed_block.float
    ES31-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_tess_fragment.unnamed_block.float
    ES31-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_tess_geo_fragment_only_fragment.unnamed_block.float
    ES31-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_tess_geo_fragment.unnamed_block.float
    ES32-CTS.functional.program_interface_query.buffer_variable.random.6
    ES32-CTS.functional.program_interface_query.buffer_variable.referenced_by.compute.unnamed_block.float
    ES32-CTS.functional.program_interface_query.buffer_variable.referenced_by.separable_fragment.unnamed_block.float
    ES32-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_fragment_only_fragment.unnamed_block.float
    ES32-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_fragment.unnamed_block.float
    ES32-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_geo_fragment_only_fragment.unnamed_block.float
    ES32-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_geo_fragment.unnamed_block.float
    ES32-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_tess_fragment_only_fragment.unnamed_block.float
    ES32-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_tess_fragment.unnamed_block.float
    ES32-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_tess_geo_fragment_only_fragment.unnamed_block.float
    ES32-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_tess_geo_fragment.unnamed_block.float

I have diagnosed the failures, but I'm not sure whether we or the
tests are wrong.  After optimizations are applied, all of the tests
are of the form:

    buffer X {
        float f;
    } x;

    void main()
    {
        x.f = x.f;
    }

The test then queries that x is referenced by that shader stage.  We
eliminate the assignment of x.f to itself, and that removes the last
reference to x.  We report that x is not referenced, and the test fails.
I do not know whether or not we are allowed to eliminate that assignment
of x.f to itself.

After discussions with the OpenGL ES group in Khronos, we believe that
Mesa's behavior is correct.  I will provide patches to the CTS tests
to Khronos.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-11-09 12:47:51 -08:00
Ian Romanick
392fabcfee linker: Slight code rearrange to prevent duplication in the next commit
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-09 12:47:51 -08:00
Ian Romanick
a529acfb2b linker: Trivial coding standards fixes
v2: Revert the unreachable to assert in
parcel_out_uniform_storage::visit_field.  Suggested by Ilia.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-09 12:47:51 -08:00
Ian Romanick
fbc1a4b7d2 glsl: Add some comments to methods of ir_variable_refcount_visitor
It was not obvious from the just the .h file what the hash table
contained.  It was also not obvious that get_variable_entry would create
a new entry in the hash table.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-09 12:47:51 -08:00
Aaron Watry
1492633070 llvmpipe: Fix build after removal of deprecated attribute API v2
Applies on top of v3 of Tom's gallivm change.

v2:
  - Tom Stellard: Use enums instread of strings.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Aaron Watry <awatry@gmail.com>
CC: Tom Stellard <thomas.stellard@amd.com>
CC: Jan Vesely <jan.vesely@rutgers.edu>
2016-11-09 20:13:27 +00:00
Tom Stellard
8bdd52c8f3 gallivm: Fix build after removal of deprecated attribute API v3
v2:
  Fix adding parameter attributes with LLVM < 4.0.

v3:
  Fix typo.
  Fix parameter index.
  Add a gallivm enum for function attributes.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-09 20:13:27 +00:00
Dave Airlie
fb50245ac1 radv: fix GetFenceStatus for signaled fences
if a fence is created pre-signaled we should return that
in GetFenceStatus even if it hasn't been submitted.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-09 19:49:26 +00:00
Dave Airlie
3c9af7578f radv: enable conditional discard optimisation on radv.
This fixes a bunch of GPU hangs introduced in some CTS
tests like
dEQP-VK.memory.pipeline_barrier.host_write_uniform_buffer.65536

It works around an issue seen in the LLVM backend, but
also makes the radv code work more like the radeonsi stack.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-10 05:46:49 +10:00
Dave Airlie
b16dff2d88 nir: add conditional discard optimisation (v4)
This is ported from GLSL and converts

if (cond)
	discard;

into
discard_if(cond);

This removes a block, but also is needed by radv
to workaround a bug in the LLVM backend.

v2: handle if (a) discard_if(b) (nha)
cleanup and drop pointless loop (Matt)
make sure there are no dependent phis (Eric)
v3: make sure only one instruction in the then block.
v4: remove sneaky tabs, add cursor init (Eric)

Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-10 05:46:33 +10:00
Dave Airlie
dd77faeca2 ac/nir: add support for discard_if intrinsic (v2)
We are going to start lowering to this in NIR code,
so prepare radv for it.

v2: handle conversion to kilp properly (nha)

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-10 05:46:20 +10:00
Kristian Høgsberg Kristensen
b3a29f2e9e anv: Do relocations in userspace before execbuf ioctl
Since our surface state buffer is shared by all batches, the kernel does a
full stall and sync with the CPU between batches every time we call
execbuf2 because it refuses to do relocations on an active buffer.  Doing
them in userspace and passing the NO_RELOC flag to the kernel allows us to
perform the relocations without stalling.

This improves the performance of Dota 2 by around 30% on a Sky Lake GT2.

v2 (Jason Ekstrand):
 - Better comments (Chris Wilson)
 - Fixed write_reloc for correct canonical form (Chris Wilson)

v3 (Jason Ekstrand):
 - Skip relocations which aren't needed
 - Provide an environment variable to always use the kernel
 - More comments about correctness (Chris Wilson)

v4 (Jason Ekstrand):
 - More comments (Chris Wilson)

v5 (Jason Ekstrand):
 - Rebase on top of moving execbuf2 setup go QueueSubmit

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-09 11:31:14 -08:00
Jason Ekstrand
8b61c57049 anv: Move relocation handling from EndCommandBuffer to QueueSubmit
Ever since the early days of the Vulkan driver, we've been setting up the
lists of relocations at EndCommandBuffer time.  The idea behind this was to
move some of the CPU load out of QueueSubmit which the client is required
to lock around and into command buffer building which could be done in
parallel.  Then QueueSubmit basically just becomes a bunch of execbuf2
calls.

Technically, this works.  However, when you start to do more in QueueSubmit
than just execbuf2, you start to run into problems.  In particular, if a
block pool is resized between EndCommandBuffer and QueueSubmit, the list of
anv_bo's and the execbuf2 object list can get out of sync.  This can cause
problems if, for instance, you wanted to do relocations in userspace.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-09 11:31:12 -08:00
Jason Ekstrand
595400d577 anv/batch: Move last_ss_pool_bo_offset to the command buffer
The original reason for putting it in the batch_bo was to allow primaries
to share it across secondaries or something like that.  However, the
relocation lists in secondary command buffers are are always left alone and
copied into the primary command buffer's relocation list.  This means that
the offset really applies at the command buffer level and putting it in the
batch_bo doesn't make sense.  This fixes a couple of potential bugs around
re-submission of command buffers that are not likely to be hit but are bugs
none the less.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-09 11:31:10 -08:00
Jason Ekstrand
0fe6829427 anv: Add an anv_execbuf helper struct
This commit adds a little helper struct for storing everything we use to
build an execbuf2 call.  Since the add_bo function really has nothing to do
with a command buffer, it makes sense to break it out a bit.  This also
reduces some of the churn in the next commit.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-09 11:31:08 -08:00
Jason Ekstrand
095c48a496 anv/batch_chain: Improve write_reloc
The old version wasn't properly handling large addresses where we have to
sign-extend to get it into the "canonical form" expected by the hardware.
Also, the new version is capable of doing a clflush of the newly written
reloc if requested.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-09 11:31:06 -08:00
Jason Ekstrand
d46bfb6297 anv: Initialize anv_bo::offset to -1
Since -1 is an invalid GPU address, this lets us know whether or not we
have a valid address for a buffer.  We don't get a valid address until the
first time that buffer is used in an execbuf2 ioctl.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-09 11:31:04 -08:00
Jason Ekstrand
bd0f8d5070 anv/allocator: Simplify anv_scratch_pool
The previous implementation was being overly clever and using the
anv_bo::size field as its mutex.  Scratch pool allocations don't happen
often, will happen at most a fixed number of times, and never happen in the
critical path (they only happen in shader compilation).  We can make this
much simpler by just using the device mutex.  This also means that we can
start using anv_bo_init_new directly on the bo and avoid setting fields
one-at-a-time.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-09 11:31:01 -08:00
Jason Ekstrand
6283b6d56a anv: Add a new bo_pool_init helper
This ensures that we're always setting all of the fields in anv_bo

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-09 11:30:59 -08:00
Jason Ekstrand
ba1eea4f95 anv: Don't presume to know what address is in a surface relocation
Because our relocation processing happens at EndCommandBuffer time and
because RENDER_SURFACE_STATE objects may be shared by batches, we really
have no clue whatsoever what address is actually written to the relocation
offset in the BO.  We need to stop making such claims to the kernel and
just let it relocate for us.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-09 11:30:57 -08:00
Jason Ekstrand
db9f4b2a2b anv: Add a cmd_buffer_execbuf helper
This puts the actual execbuf2 call in anv_batch_chain.c along with the
other relocation stuff.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-09 11:30:55 -08:00
Jason Ekstrand
07798c9c3e anv/device: Add an execbuf wrapper
This wrapper ensures that we always update all anv_bo::offset fields based
on the offsets returned by the kernel.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-09 11:30:42 -08:00
Jason Ekstrand
64b140498d anv: Make anv_finishme only warn once per call-site
When you fire up Dota2 on Haswell you get spammed with thousands of
"Implement Gen7 HZ ops" finishme's.  The point of anv_finishme is to act as
a reminder that there is something left to implement.  Printing it once
should be sufficient.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-09 10:26:37 -08:00
Jordan Justen
7bcb94bc2f i965/compute: Allow ARB_compute_shader in compat profile
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97447
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Evan Odabashian <eodabash@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-11-09 08:23:33 -08:00
Roland Scheidegger
4d5346aaac Revert "draw: use vectorized calculations for fetch"
Trivial. There's some regressions internally, related to overflow
behavior. I'll have to look at it at another time, some interactions
with vsplit/vcache are actually mind-blowing.

This reverts commit 3fa10ffb49.
2016-11-09 05:53:16 +01:00
Ilia Mirkin
f037afb701 swr: disable logic op when the rt format is float or srgb
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-08 19:28:35 -05:00
Ilia Mirkin
e2e40e236f swr: fix AND_INVERTED logic op conversion
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-08 19:28:35 -05:00
Ilia Mirkin
bef4a48d1c swr: add support for EXT_depth_bounds_test
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-08 19:28:35 -05:00
Ilia Mirkin
aa62fa8fb7 swr: [rasterizer core] set depth hottile when depth bounds test enabled
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-08 19:28:35 -05:00
Anuj Phogat
b9df2251c1 i965: Fix GPU hang related to multiple render targets and alpha testing
This patch should have been the part of commit e592f7df.
In a situation when there are multiple render targets with alpha testing
enabled, if fragment shader doesn't write to draw buffer zero, it causes
the GPU hang on SKL. No GPU hang is seen on HSW. Simulator gives a
warning for all gen6+ h/w:
"Illegal render target write message length 0xa expected 0xc"

This patch fixes the GPU hang as well as the simulator warning with
new piglit test fbo-mrt-alphatest-no-buffer-zero-write:
https://patchwork.freedesktop.org/patch/118212

No regressions in Jenkins CI system.

Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2016-11-08 14:22:53 -08:00
Tim Rowley
95ed1c19bf swr: allow alphatest without blend or logicop
We need to compile a blend function when alphatest is enabled.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-08 14:18:47 -06:00
Dave Airlie
bafc75b437 radv: emit correct last export when Z/stencil export is enabled
I was getting a random GPU hang in the renderpass simple tests,
it turns out sometimes radv emitted the wrong thing "last".

This fixes the logic to emit Z/stencil last if they occur,
and not mark a color output as last. Also this relies on the
Z/STENCIL being the first two fragment outputs, which they are
so yay.

Fixes: dEQP-VK.renderpass.simple.color_depth (random hangs)
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-09 06:05:03 +10:00
Marek Olšák
bdd48e47c0 tgsi/scan: turn a huge if-else-if.. chain into a switch statement
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-08 17:56:42 +01:00
Marek Olšák
f864547fa9 tgsi/scan: fix images_buffers regression
The first IF statement disabled the second one.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98599

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-08 17:56:42 +01:00
Jason Ekstrand
6b7cc8a9ec anv: Document cmd_buffer_alloc_binding_table
Some of the details of this function are very confusing and have a long
history.  We should document that history and this seems like the best
place to do it.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-08 08:32:55 -08:00
Jason Ekstrand
406cd9d126 intel/blorp: Emit all the binding tables
At least on Sky Lake, after emitting 3DSTATE_CONSTANT_*, you are required
to re-emit the 3DSTATE_BINDING_TABLE_POINTERS packet for the corresponding
stage.  If you don't, double-buffering may fail and you may get the wrong
constants.  It turns out that you need to do this even if you have no push
constants to speak of or else the next 3DSTATE_CONSTANT packet you emit for
that stage may not work correctly.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-08 08:32:55 -08:00
Jordan Justen
112a2ba276 i965/gen9: Allow sampling with hiz when supported
For gen9+ this will indicate when we should allow hiz based sampling
during rendering.

Improves performance in :
  - Synmark's OglDeferred by 2.2% (n=20)
  - Synmark's OglShMapPcf by 0.44% (n=20)

v2 by Ben: Add spec reference, and make it fix with some of the changes made on
the previous patches
Change the check from mt->aux_buf to mt->num_samples. The presence of an aux_buf
isn't enough to determine there isn't a HiZ buffer to use.

v3: It seems all depth surface end up with num_samples = 0 by default,
    so allow sampling from depth HiZ if num_samples <= 1. (Lionel)
    Allow sampling from HiZ only if all LOD are available from the HiZ
    buffer. (Lionel)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com> (v2)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v3)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-08 16:13:57 +00:00
Ben Widawsky
3b0c2bc417 i965/gen9: Add HiZ auxiliary buffer support
The original functionality this patch introduces was authored by a patch from
Ken (the commit subject was the same). Since I ended up changing so many patches
in the code before this one, I had some non-trivial decisions to make, and I
didn't feel it was appropriate to keeps Ken's name as author (mostly because he
might not like what I've done). Ken's original patch was like 2 LOC :-)

In either case, some credit needs to go to Ken, and to Jordan for a few small
other changes in that original patch.

v2: Back to a smaller diff now that ISL handles most of the actual
    programming (Lionel)

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v2)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-08 16:13:57 +00:00
Jordan Justen
c0f505c7ef i965: Add function to indicate when sampling with hiz is supported
Currently it indicates that this is never supported, but soon it will
be supported for gen8+^w gen9+

v2 by Ben:
- Explicitly disable aux_hiz for gen < 9 (with comment)
- squashed in next patch to avoid unused and useless functions

   i965: Support sampling with hiz during rendering

   For gen8, we can sample from depth while using the hiz buffer. This
   allows us to sample depth without resolving from hiz to the depth
   texture.

   To do this we must resolve to hiz before drawing so we can use the hiz
   buffer to sample while rendering. Hopefully the hiz buffer will
   already be resolved in most cases because it was previously rendered,
   meaning the hiz resolve is a no-op.

   Note that this is still controlled by the
   intel_miptree_sample_with_hiz function, and we will enable hiz
   sampling for gen8 in a separate patch.

   Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
   Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com> (v2)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-08 16:13:57 +00:00
Ben Widawsky
c53e9c9780 i965/miptree: Create a hiz mcs type
This seems counter to the goal of consolidating hiz, mcs, and later ccs buffers.
Unfortunately, hiz on gen6 is a thing the code supports, and this wart will be
helpful to achieve that. Overall, I believe it does help unify AUX buffers on
gen7+.

I updated the size field which I introduced in the previous patch, even though
we have no use for it.

XXX: As I mentioned in the last patch, the height given to the MCS buffer
allocation in intel_miptree_alloc_mcs() looks wrong, but I don't claim to fully
understand how the MCS buffer is laid out.

v2: rebase on master (Lionel)

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v2)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-08 16:13:57 +00:00
Ben Widawsky
36d1c555ed i965: Drop the aux mt when not used
This patch will preserve the BO & offset, and not the miptree for the
aux_mcs buffer. Eventually it might make sense to pull put the sizing
function in miptree creation, but for now this should be sufficient
and not too hideous.

v2: Save BO's offset too (Lionel)

v3: Squash previous patch storing the size of the allocated aux buffer
    (Lionel)
    Fix memory leak with mcs_buf->bo (Lionel)

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v2)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-08 16:13:57 +00:00
Ben Widawsky
42db7ab179 i965/miptree: Directly gtt map the mcs buffer
The next patch will change the map type, and this will make sure there are no
regressions as a result of the other stuff. Since the miptree is newly created,
I believe it is always safe to just map.

It is possible to CPU map this buffer on LLC platforms (it additionally requires
rounding up to tile size). I did experiment with that patch, and found no
performance gains to be had.

I've added in error handling while here. Generally GTT mapping is an operation
which is highly unlikely to fail, but we may as well handle it when it does.

v2: rebase on master (Lionel)

v3: print out error if gtt mapping fails (Topi)

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v2)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-08 16:13:57 +00:00
Jordan Justen
0041169cac i965: Wrap MCS miptree in intel_miptree_aux_buffer
This will allow us to treat HiZ and MCS the same when using as an
auxiliary surface buffer.

v2: (Ben) Minor rebase conflict resolution.
   Rename mcs_buf to aux_buf to address upcoming change for hiz specific buffers.
   That second thing is essentially a squash of:
   i965/gen8: Use intel_miptree_aux_buffer for auxiliary buffer - which didn't need
   to be separate in my opinion.

v3: rebase on master (Lionel)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>a (v2)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v3)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-11-08 16:13:57 +00:00
Nicolai Hähnle
88f791db75 gallivm: fix [IU]MUL_HI regression
This patch does two things:

1. It separates the host-CPU code generation from the generic code
   generation. This guards against accidently breaking things for
   radeonsi in the future.

2. It makes sure we actually use both arguments and don't just compute
   a square :-p

Fixes a regression introduced by commit 29279f44b3

Cc: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-11-08 16:25:54 +01:00
Roland Scheidegger
3fa10ffb49 draw: use vectorized calculations for fetch
Instead of doing all the math with scalars, use vectors. This means the
overflow math needs to be done manually, albeit that's only really
problematic for the stride/index mul, the rest has been pretty much
moved outside the shader loop (albeit the mul could actually be optimized
away too), where things are still scalar. Because llvm is complete fail
with the zero-extend widening mul, roll our own even...
To eliminate control flow in the main shader loop fetch, provide fake
buffers (so index 0 is always valid to fetch).
Still uses aos fetch though in the end - mostly because some more code
would be needed to handle unaligned fetches in that path, and because for
most formats it won't make a difference anyway (we generate some truly
horrendous code for things like R16G16_something for instance).

Instanced fetch however stays roughly the same as before, except that
no longer the same element is fetched multiple times (I've seen a reduction
of ~3 times in main shader loop size due to apparently llvm not being able
to deduce it's really all the same with a couple instanced elements).

Also, for elts gathering, use vectorized code as well - provide a fake
elt buffer if there's no valid one bound.

The generated shaders are smaller and faster to compile (not entirely sure
about execution speed, but generally unless there's just single vertices
to handle I would expect it to be faster - there's more opportunities
for future improvements by using soa fetch).

No piglit change.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-11-08 03:41:26 +01:00
Roland Scheidegger
29279f44b3 gallivm: introduce 32x32->64bit lp_build_mul_32_lohi function
This is used by shader umul_hi/imul_hi functions (and soon by draw).
It's actually useful separating this out on its own, however the real
reason for doing it is because we're using an optimized sse2 version,
since the code llvm generates is atrocious (since there's no widening
mul in llvm, and it does not recognize the widening mul pattern, so
it generates code for real 64x64->64bit mul, which the cpu can't do
natively, in contrast to 32x32->64bit mul which it could do).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-11-08 03:41:26 +01:00
Anuj Phogat
b0554c25e7 i965: Add space before paren
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-11-07 16:13:57 -08:00
Anuj Phogat
501d608e56 i965: Remove unnecessary white space
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-11-07 16:13:57 -08:00
Anuj Phogat
329ae922bd i965: Fix alpha-to-coverage and alpha test enabled checks
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2016-11-07 16:13:02 -08:00
Anuj Phogat
a1bd2f6950 mesa: Add helper function _mesa_is_alpha_to_coverage_enabled()
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2016-11-07 16:13:02 -08:00
Anuj Phogat
0295c792b4 mesa: Add helper function _mesa_is_alpha_test_enabled()
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2016-11-07 16:13:02 -08:00
Anuj Phogat
7fed07766d mesa: Use separate line for function return type
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2016-11-07 16:13:02 -08:00
Samuel Pitoiset
e32e5d214e nvc0: simplify draw parameters upload for vertex shaders
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-11-07 22:50:17 +01:00
Steven Toth
381edca826 gallium/hud: protect against and initialization race
In the event that multiple threads attempt to install a graph
concurrently, protect the shared list.

Signed-off-by: Steven Toth <stoth@kernellabs.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-07 18:31:52 +01:00
Steven Toth
5a58323064 gallium/hud: close a previously opened handle
We're missing the closedir() to the matching opendir().

Signed-off-by: Steven Toth <stoth@kernellabs.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-07 18:31:52 +01:00
Steven Toth
6ffed08679 gallium/hud: fix a problem where objects are free'd while in use.
Instead of trying to maintain a reference counted list of valid HUD
objects, and freeing them accordingly, creating race conditions
between unanticipated multiple threads, simply accept they're
allocated once and never released until the process terminates.

They're a shared resource between multiple threads, so accept
they're always available for use.

Signed-off-by: Steven Toth <stoth@kernellabs.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-07 18:31:52 +01:00
Rob Clark
a5e733c6b5 mesa: drop current draw/read buffer when ctx is released
This fixes a problem seen with gallium drivers vs android wallpaper.
Basically, what happens is:

   EGLSurface tmpSurface = mEgl.eglCreatePbufferSurface(mEglDisplay, mEglConfig, attribs);
   mEgl.eglMakeCurrent(mEglDisplay, tmpSurface, tmpSurface, mEglContext);

   int[] maxSize = new int[1];
   Rect frame = surfaceHolder.getSurfaceFrame();
   glGetIntegerv(GL_MAX_TEXTURE_SIZE, maxSize, 0);

   mEgl.eglMakeCurrent(mEglDisplay, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT);
   mEgl.eglDestroySurface(mEglDisplay, tmpSurface);

   ... check maxSize vs frame size and bail if needed ...

   mEglSurface = mEgl.eglCreateWindowSurface(mEglDisplay, mEglConfig, surfaceHolder, null);
   ... error checking ...
   mEgl.eglMakeCurrent(mEglDisplay, mEglSurface, mEglSurface, mEglContext);

When the window-surface is created, it ends up with the same ptr address
as the recently freed tmpSurface pbuffer surface.  Which after many
levels of indirection, results in st_framebuffer_validate() ending up with
the same/old framebuffer object, and in the end never calling the
DRIimageLoaderExtension::getBuffers().  Then in droid_swap_buffers(), the
dri2_surf is still the old pbuffer surface (with dri2_surf->buffer being
NULL, obviously, so when wallpaper app calls eglSwapBuffers() nothing
gets enqueued to the compositor).  Resulting in a black/blank background
layer.

Note that at the EGL layer, when the context is unbound, EGL drops it's
references to the draw and read buffer as well.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Tested-by: Robert Foss <robert.foss@collabora.com>
Acked-by: Tapani Pälli <tapani.palli@intel.com>
2016-11-07 10:23:26 -05:00
Serge Martin
cc495055cd clover: Add CL_PROGRAM_BINARY_TYPE support (CL1.2).
v3 [Francisco Jerez]: Loosely based on Serge's v1 of this patch in
   order to avoid CL-specific enums in the clover module binary
   format.  In addition to other changes made in v2: Represent the CL
   program binary type as the section type instead of adding a CL
   API-specific enum, check that the binary types of the input objects
   are valid during clLinkProgram(), pass section type as argument to
   build_module_library() instead of using separate function.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-11-06 15:56:54 +01:00
Serge Martin
05fcc73f08 clover: add missing clGetDeviceInfo CL1.2 queries
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Vedran Miletić <vedran@miletic.net>
2016-11-06 15:56:49 +01:00
Samuel Pitoiset
8cc4a74971 nvc0: get rid of NVE4_COMPUTE_MP_PM_{A,B}_SIGSEL_XXX
Instead, hardcode group sigsel because there are a bunch of unknown
groups, especially on SM50/SM52.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-11-05 19:28:25 +01:00
Samuel Pitoiset
a295364596 gm107/ir: emit RED instead of ATOM when no dst
This is similar to NVC0 and GK110 emitters where we emit
reduction operations instead of atomic operations when the
destination is not used.

Found after writing some tests which check if performance counters
return the expected value. In that case, gred_count returned 0
on gm107 while at least gk106 returned the correct value.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-11-05 19:27:35 +01:00
Brian Paul
cfb5a9ab23 st/mesa: initialize members of glsl_to_tgsi_instruction in emit_asm()
This fixes random crashes with MSVC release builds.  It seems the
members are implicitly initialized to zero with gcc, but not MSVC.
In particular, the tex_offset_num_offset field was non-zero causing
a loop over the NULL tex_offsets array to crash.

Zero-init those fields and a few others to be safe.

The regression began with acc23b04cf "ralloc: remove memset from
ralloc_size".

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-05 12:09:40 -06:00
Mauro Rossi
0148313ea3 android: amd/common: add support for libmesa_amd_common
Fixes the following building error introduced with commit 7115e56
and related amd/common dependencies:

external/mesa/src/gallium/drivers/radeonsi/si_shader.c:6861: error: undefined reference to 'ac_is_sgpr_param'
external/mesa/src/gallium/drivers/radeonsi/si_shader.c:6951: error: undefined reference to 'ac_is_sgpr_param'
clang++: error: linker command failed with exit code 1 (use -v to see invocation)

ninja: build stopped: subcommand failed.
build/core/ninja.mk:148: recipe for target 'ninja_wrapper' failed
make: *** [ninja_wrapper] Error 1

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-11-05 18:42:29 +01:00
Marek Olšák
0f72f7292a winsys/radeon: don't call surface_best for FMASK
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98518

Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-11-05 18:36:26 +01:00
Kenneth Graunke
0c17b0b6f0 mesa: Add linear ETC2/EAC to the compressed format list with ES3 compat.
GL_ARB_ES3_compatibility brings ETC2/EAC formats to desktop GL.

The meaning of the GL compressed format list is pretty vague - it's
supposed to return formats for "general-purpose usage".  (GL 4.2
deprecates the list because of this.)  Basically everyone interprets
this as "linear RGB/RGBA".

ETC2/EAC meets that criteria, so while we shouldn't be required to add
it to the list, there's also little harm in doing so, at least on
platforms with native support.  I doubt anyone is using this list for
much anyway, so even on platforms without native support, it's probably
not a big deal.

Makes the following GL45-CTS.gtf43 tests pass:

* GL3Tests.eac_compression_r11.gl_compressed_r11_eac
* GL3Tests.eac_compression_rg11.gl_compressed_rg11_eac
* GL3Tests.eac_compression_signed_r11.gl_compressed_signed_r11_eac
* GL3Tests.eac_compression_signed_rg11.gl_compressed_signed_rg11_eac
* GL3Tests.etc2_compression_rgb8.gl_compressed_rgb8_etc2
* GL3Tests.etc2_compression_rgb8_pt_alpha1.gl_compressed_rgb8_pt_alpha1_etc2
* GL3Tests.etc2_compression_rgba8.gl_compressed_rgba8_etc2

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-11-04 16:10:20 -07:00
Eric Anholt
283d4d18e5 vc4: Use Newton-Raphson on the 1/W write to fix glmark2 terrain.
The 1/W was apparently not accurate enough, and we were getting sparklies
in the distance.  The closed driver also did a N-R step here.

Cc: <mesa-stable@lists.freedesktop.org>
2016-11-04 15:34:38 -07:00
Eric Anholt
70fc3a941a vc4: Make sure that vertex shader texture2D() calls use LOD 0.
I noticed this while trying to debug glmark2 terrain (which does vertex
shader texturing, but no mipmaps on its textures sampled from the VS).
2016-11-04 15:34:38 -07:00
Nicolai Hähnle
2c875158e2 radeonsi: fix vertex fetches for 2_10_10_10 formats
The hardware always treats the alpha channel as unsigned, so add a shader
workaround. This is rare enough that we'll just build a monolithic vertex
shader.

The SINT case cannot actually happen in OpenGL, but I've included it for
completeness since it's just a mix of the other cases.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-04 21:30:18 +01:00
Nicolai Hähnle
322483f71b st/mesa: fix the layer of VDPAU surface samplers
A (latent) bug in VDPAU interop was exposed by commit
e5cc84dd43.

Before that commit, the st_vdpau code created samplers with
first_layer == last_layer == 1 that the general texture handling code
would immediately delete and re-create, because the layer does not match
the information in the GL texture object.

This was correct behavior at least in the DMABUF case, because the imported
resource is supposed to have the correct offset already applied.  In the
non-DMABUF case, this was just plain wrong but apparently nobody noticed.

After that commit, the state tracker assumes that an existing sampler is
correct at all times.  Existing samplers are supposed to be deleted when
they may become invalid, and they will be created on-demand.  This meant
that the sampler with first_layer == last_layer == 1 stuck around, leading
to rendering artefacts (on radeonsi), command stream failures (on r600), and
assertions (in debug builds everywhere).

This patch fixes the problem by simply not creating a sampler at all in
st_vdpau_map_surface.  We rely on the generic texture code to do the right
thing, adding the layer_override to make the non-DMABUF case work.

v2: add the layer_override

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98512
Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Cc: Christian König <deathsimple@vodafone.de>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-11-04 21:26:29 +01:00
Dave Airlie
d0d5f7600c Revert "st/vdpau: use linear layout for output surfaces"
This reverts commit d180de3532.

This is a radeon specific hack that causes problems on nouveau
when combined with the SHARED flag later. If radeonsi needs a fix
for this, please fix it in the driver.

[chk]
Using linear surfaces for this makes sense because tilling isn't
beneficial and the surfaces can potentially be shared with other GPUs
using the VDPAU OpenGL interop.

[airlied]
I think we need a flag that isn't SHARED/LINEAR that is more
SHARED_OTHER_GPU.

[mareko]
Does radeonsi need PIPE_BIND_VIDEO_DECODE_OUTPUT that it would translate
into linear ?

[mareko]
My only concern is decoding performance. If the decoder works in 64x1
blocks, tiling will hurt. That's the theory. I don't know how the
decoder works.

Cc: 12.0 13.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> (I+A)
2016-11-04 15:04:21 +00:00
Marek Olšák
00baaa4752 radeonsi: fix an assertion failure in si_decompress_sampler_color_textures
This fixes a crash in Deus Ex: Mankind Divided. Release builds were
unaffected, so it's not too serious.

Cc: 11.2 12.0 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-04 11:30:47 +01:00
Marek Olšák
64c2593a5c glx: make interop ABI visible again
This was broken when the GLAPI use was removed from mesa_glinterop.h.

Cc: 12.0 13.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-04 11:30:47 +01:00
Marek Olšák
ee39d4456e egl: make interop ABI visible again
This was broken when the GLAPI use was removed from mesa_glinterop.h.

Cc: 12.0 13.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-04 11:30:47 +01:00
Marek Olšák
bf51b45313 egl: use util/macros.h
I need the definition of PUBLIC.

Cc: 12.0 13.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-04 11:30:47 +01:00
Nicolai Hähnle
84a74be9e4 radeonsi: enable GLSL 4.50
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-11-04 10:33:50 +01:00
Nicolai Hähnle
e4b378800e st/glsl_to_tgsi: fix dvec[34] loads from SSBO
When splitting up loads, we have to add 16 bytes to the offset for
the high components, just like already happens for stores.

Fixes arb_gpu_shader_fp64@shader_storage@layout-std140-fp64-shader.

Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-04 10:31:02 +01:00
Nicolai Hähnle
aef7eb4cac glsl/cache: correct asprintf error handling
From the manpage of asprintf:

   "If memory allocation wasn't possible, or some other error occurs,
    these functions will return -1, and the contents of strp are
    undefined."

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-11-04 10:28:08 +01:00
Michel Dänzer
8ce7ef75f5 gallium/radeon: Multiply bpe by nsamples in surf_winsys_to_drm
For symmetry with surf_drm_to_winsys.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-04 16:51:18 +09:00
Michel Dänzer
356458363d gallium/radeon: Use flags parameter in radeon_winsys_surface_init
Fixes valgrind warnings about surf_ws->flags being uninitialized while
starting X.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-04 16:49:39 +09:00
Michel Dänzer
6f844a30c1 gallium/radeon: Only convert stencil info if RADEON_SURF_SBUFFER is set
Fixes valgrind warnings about using uninitialized memory when starting X.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-04 16:48:59 +09:00
Michel Dänzer
38fb9aa1aa gallium/radeon: Only loop up to last_level for drm<->winsys conversion
Fixes spurious assertion failure in surf_level_drm_to_winsys when
starting X, due to processing a miplevel which was never initialized.

Fixes: e9c76eeeaa ("gallium/radeon: remove radeon_surf_level::pitch_bytes")
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-04 16:47:43 +09:00
Tapani Pälli
1e3f7bfc9a anv: use limits.h instead of deprecated/obsolete values.h
Mesa uses limits.h elsewhere, and this makes is possible to
compile anv_allocator.c on Android.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-11-04 08:35:43 +02:00
Eric Anholt
80157466cd vc4: Add miptree/texture state support for ETC1 compressed textures.
The format isn't flagged as enabled at runtime yet, because we need kernel
validation support.
2016-11-03 18:42:58 -07:00
Eric Anholt
bedb996087 vc4: Fix use of undefined values since the ralloc zeroing changes.
reralloc() no longer zeroes the new contents, so switch to using
rzalloc_array() instead.
2016-11-03 18:42:58 -07:00
Eric Anholt
49936364e4 nir: Make sure to set the texsrc type in nir drawpixels/bitmap lowering.
We were leaving an undefined value since the ralloc zeroing changes.
Fixes nir_validate() failures on vc4.

v2: Fix the color-index case of drawpixels as well.

Reviewed-by: Rob Clark <robdclark@gmail.com> (v1)
2016-11-03 18:42:58 -07:00
Roland Scheidegger
572a952126 draw: fix undefined input handling some more...
Previous fixes were incomplete - some code still iterated through the number
of elements provided by velem layout instead of the number stored in the key
(which is the same as the number defined by the vs). And also actually
accessed the elements from the layout directly instead of those in the key.
This mismatch could still cause crashes.
(Besides, it is a very good idea to only use data stored in the key anyway.)
v2: move null format check, remove now unnecessary function parameter,
some minor prettify

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-04 01:48:22 +01:00
Brian Paul
f4dd3bde37 gallium/hud: call fflush() after printing error messages
For Windows.  Otherwise, we don't see the message until the program exits.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-11-03 14:29:23 -06:00
Brian Paul
260d951486 svga: move svga_mark_surfaces_dirty() prototype to svga_surface.h
Trivial.
2016-11-03 14:29:23 -06:00
Brian Paul
c96f63cac2 svga: whitespace / formatting clean-up in svga_context.c
Trivial.
2016-11-03 14:29:23 -06:00
Brian Paul
1691e29e62 svga: collect stats for time spent in svga_context_finish()
This should have appeared with commit "svga: add guest statistic
gathering interface" from August 4, but was somehow lost.
2016-11-03 14:29:23 -06:00
Charmaine Lee
8a195e2fd5 svga: invalidate new surface before it is bound to a render target view
Invalidate a "new" surface before it is bound to a render target view or
depth stencil view in order to avoid the unnecessary host side copy
of the surface data before it is rendered to.
Note that, recycled surface is already invalidated before it is reused.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-03 14:29:23 -06:00
Charmaine Lee
06bba2452f Revert "svga: use untyped surface formats in most cases"
Using untyped surface formats causes huge performance degradation on Fusion.
This reverts commit eb0ced74f6 until
the backend has a better solution to address typeless surface formats.
2016-11-03 14:29:23 -06:00
Charmaine Lee
f2eec4e829 svga: allow quad blit for more formats
Currently blitter will fail if the blit format is different and
view-incompatible to the resource format. Instead of punting
to software blit which will stall the pipeline, we will
create temporary resource to allow blitter to work.

Fixes piglit test arb_copy_image-formats.
Also tested with MTT piglit, glretrace.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-03 14:29:22 -06:00
Charmaine Lee
4bd5ce853b svga: create BGRX render target view for BGRX_UNORM surface
Currently we adjust the view format when we are asked to create a
BGRA render target view for BGRX surface. But we only look for
SVGA3D_B8G8R8X8_TYPELESS surface format.
With this patch, we will also check for SVGA3D_B8G8R8X8_UNORM surface format,
and use SVGA3D_B8G8R8X8_UNORM as the view format for that case.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-03 14:29:22 -06:00
Charmaine Lee
0d221fcd40 svga: add a helper function to check for typeless format
This patch adds a helper function svga_format_is_typeless() which
returns TRUE if the specified format is typeless.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-03 14:29:22 -06:00
Brian Paul
d451421bca svga: add SVGA_NEW_FRAME_BUFFER to svga_hw_tss_binding state atom
We may need to re-emit texture bindings when the framebuffer state
changes.  In particular, emitting the texture binding can also involve
updating a texture from its backing copy during sampler view validation.
The backing copy is made during framebuffer validation.

This helps to fix an issue with Photoshop on VGPU9 (VMware bug 1723971).

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-11-03 14:29:22 -06:00
Charmaine Lee
ec138d6237 svga: allow copy_region if sample counts match
With this patch, we will allow blit with copy_region if the
source and destination textures have the same sample counts.

Fixes failures with piglit tests
 spec@arb_texture_float@multisample-formats 2 gl_arb_texture_float
 spec@arb_texture_rg@multisample-formats 2 gl_arb_texture_rg-float

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-03 14:29:22 -06:00
Charmaine Lee
a2d49c4b46 svga: set rendered-to flag after updating the texture using PredCopyRegion
This patch sets the rendered-to flag for the subresource after it is
updated using the PredCopyRegion command. This is to ensure that the GB surface
will be sync up properly before it will be directly mapped to.

Tested with MTT piglit, glretrace.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-03 14:29:22 -06:00
Charmaine Lee
59f14563a3 svga: add can_use_upload flag
This patch adds a flag "can_use_upload" to svga_texture structure
to avoid some checking of the upload availability at each transfer map time.

Tested with Lightsmark2008, Tropics, MTT glretrace, piglit.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-03 14:29:22 -06:00
Charmaine Lee
3dfb4243bd svga: fix texture upload path condition
As Thomas suggested, we'll first try to map directly to a GB surface.
If it is blocked, then we'll use texture upload buffer.
Also if a texture is already "rendered to", that is, the GB surface
is already out of sync, then we'll use the texture upload buffer
to avoid syncing the GB surface.

Tested with Lightsmark2008, Tropics, MTT piglit, glretrace.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-03 14:29:22 -06:00
Charmaine Lee
4750c4e543 svga: set rendered_to flag with texture uploaded using TransferFromBuffer command
This patch sets the rendered_to flag for the texture subresource that
is uploaded using the TransferFromBuffer command. This is to ensure that
the subresource will be read back or invalidated before it will be
directly mapped to. This makes sure that the content of the GB surface
will not be accidentally overwritten by the device at suspend/resume time.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-03 14:29:22 -06:00
Neha Bhende
03e1b7cacd svga: Add render_condition boolean flag in struct svga_context
set render_condition flag when driver performs conditional rendering.
Blit using DXPredCopyRegion command gets affected by conditional rendering so
We should check this flag while performing blit operation

Tested with piglit tests.

v2: As per Charmaine's comment, setting render_condition flag if svga_query is valid.
Tested with pigit tests.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-11-03 14:29:22 -06:00
Neha Bhende
2cff6f4512 svga: Allow DXPredCopyRegion for depth_and_stencil formats.
DXPredCopyRegion supports copy between src and dst for depth_and_stencil
formats if src and dst have same formats.

tested ith piglit

v2: As per Brian's comment, allow DXPredCopyRegion for depth+stencil buffers
if the blit mask is PIPE_MASK_ZS.

Tested with piglit tests and added new piglit test
arb_framebuffer_object-depth-stencil-blit to test this particular testcase.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-03 14:29:22 -06:00
Neha Bhende
9a9627a791 svga: fix memory leak in svga_clear_texture()
Piglit tests which uses arb_clear_texture extension, have memory leak issue.
pipe_surface created in svga_clear_texture() was not deleted which happens to be
the cause for memory leak.

tested all arb_clear_texture-* piglit tests with valgrid.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-11-03 14:29:22 -06:00
Thomas Hellstrom
d787ce7288 svga: Implement the pipe clear_render_target functionality v2
v2: Accounted for the fact that svga_try_clear_render_target also
honors conditional rendering.

Testing done: Excercised all functions in a separate feature branch. Forced
emission of conditional rendering commands when necessary.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-03 14:29:22 -06:00
Charmaine Lee
76f5f76468 svga: add SVGA_3D_CMD_INVALIDATE_GB_SURFACE support
This command will be used in a subsequent patch to invalidate a surface.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-03 14:29:22 -06:00
Francisco Jerez
f3d387867f nir: Flip gl_SamplePosition in nir_lower_wpos_ytransform().
Assuming the hardware is set up to use a screen coordinate system
flipped vertically with respect to the GL's window coordinate system,
the SYSTEM_VALUE_SAMPLE_POS vector will also be flipped vertically
with respect to the value expected by the GL, so we need to give it
the same treatment as gl_FragCoord.  Fixes the following CTS tests on
i965:

 ES31-CTS.functional.shaders.multisample_interpolation.interpolate_at_offset.at_sample_position.default_framebuffer
 ES31-CTS.functional.shaders.sample_variables.sample_pos.correctness.default_framebuffer

when run with any multisample configuration, e.g. rgba8888d24s8ms4.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-11-03 11:46:44 -07:00
Nanley Chery
faab6a0f18 isl: Only allow Y-tiling for ASTC textures
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-03 11:22:58 -07:00
Nanley Chery
1625d911d7 anv/blorp: Don't create linear ASTC surfaces for buffers
Such a surface is not possible on our hardware. Without this change, ISL
surface creation would fail with the next patch.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-03 11:22:58 -07:00
Nanley Chery
bb550e2977 anv/formats: Disallow linear ASTC textures
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-03 11:22:58 -07:00
Nanley Chery
80de528c7e anv/formats: Disallow 1D compressed textures
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-03 11:22:58 -07:00
Chris Wilson
b4001af174 i965: Use rzalloc for cfg_t
Valgrind reports that we use cfg.cycle_count uninitialised, so zero the
cfg_t on construction.

Fixes: 52d2b28f7f ("ralloc: use rzalloc where it's necessary")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 11:16:05 -07:00
Nicolai Hähnle
27bd9c0f0a pipe-loader: add libamd_common for radeonsi
This fixes a build regression of commit 7115e56c21.
Sorry for the breakage, this second location for link dependencies escaped
my build tests.

Bugzilla: https://patchwork.freedesktop.org/patch/119816/
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2016-11-03 16:54:55 +01:00
Andreas Boll
f792f0687f glx/windows: Add wgl.h to the sources list
Otherwise it won't be picked in the tarball and the build will fail.

Fixes: 533b3530c1 ("direct-to-native-GL for GLX clients on Cygwin
("Windows-DRI")")
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Jon Turney <jon.turney@dronecode.org.uk>
2016-11-03 11:38:04 +01:00
Tapani Pälli
979ec2cf75 i965: use rzalloc instead of calloc in brwNewProgram
commit cc6aa1d161 changed to using rzalloc
for gl_program creation but one instance for program creation was still
using calloc.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-11-03 12:09:40 +02:00
Nicolai Hähnle
908f92ad1f radeonsi: generate GS prolog to (partially) fix triangle strip adjacency rotation
Fixes GL45-CTS.geometry_shader.adjacency.adjacency_indiced_triangle_strip and
others.

This leaves the case of triangle strips with adjacency and primitive restarts
open. It seems that the only thing that cares about that is a piglit test.
Fixing this efficiently would be really involved, and I don't want to use the
hammer of degrading to software handling of indices because there may well
be software that uses this draw mode (without caring about the precise
rotation of triangles).

v2:
- skip the GS prolog entirely if workaround is not needed
- only check for TES (TES is always non-null when tessellation is used)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:11:24 +01:00
Nicolai Hähnle
ffe4e829b0 radeonsi: remove si_shader_context::is_gs_copy_shader
It has become redundant.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:53 +01:00
Nicolai Hähnle
3b2516721b radeonsi: make the GS copy shader owned by the GS selector
The copy shader only depends on the selector. This change avoids creating
separate code paths for monolithic vs. non-monolithic geometry shaders.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:50 +01:00
Nicolai Hähnle
9c6f7d66dc radeonsi: si_shader_vs only depends on the GS selector
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:48 +01:00
Nicolai Hähnle
693435d846 radeonsi: si_vgt_gs_mode only depends on the selector
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:45 +01:00
Nicolai Hähnle
2e1fb7e7fc radeonsi: make si_generate_gs_copy_shader usable as a standalone function
It really only depends on the shader selector.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:42 +01:00
Nicolai Hähnle
ba5de0d034 radeonsi: unify the si_compile_* functions for prologs and epilogs
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:37 +01:00
Nicolai Hähnle
aa9583b0da radeonsi: get rid of no_{prolog,epilog}
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:34 +01:00
Nicolai Hähnle
75503b1904 radeonsi: get rid of si_llvm_emit_fs_epilogue
It is no longer used.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:31 +01:00
Nicolai Hähnle
611510038a radeonsi: get rid of get_interp_param
Replace by a simple LLVMGetParam, since ctx->no_prolog is always false.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:29 +01:00
Nicolai Hähnle
3f4439b6ba radeonsi: get rid of select_interp_param
The condition !ctx->no_prolog is now always true.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:26 +01:00
Nicolai Hähnle
858ac2228f radeonsi: use TCS epilog for monolithic shaders
For fixed function TCS, we keep the copying of VS outputs to TES inputs inside
the main function; the call to si_copy_tcs_inputs is moved accordingly.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:23 +01:00
Nicolai Hähnle
3f1be54e53 radeonsi: extract si_build_tcs_epilog_function
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:20 +01:00
Nicolai Hähnle
be6e31c6a0 radeonsi: use VS epilog for monolithic TES
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:17 +01:00
Nicolai Hähnle
06dcb2d2a9 radeonsi: use VS prolog and epilog for monolithic shaders
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:14 +01:00
Nicolai Hähnle
f9daa2f470 radeonsi: extract si_build_vs_{prolog,epilog}_function
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:12 +01:00
Nicolai Hähnle
6f37e992a3 radeonsi: use PS prolog for monolithic shaders
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:09 +01:00
Nicolai Hähnle
15dd332e6a radeonsi: set num_input_vgprs for fragment shaders in create_function
So that the prolog generated for monolithic fragment shaders will have the
right signature.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:05 +01:00
Nicolai Hähnle
fec7ced211 radeonsi: extract si_build_ps_prolog_function
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:02 +01:00
Nicolai Hähnle
7115e56c21 radeonsi: use PS epilog for monolithic shaders
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:07:00 +01:00
Nicolai Hähnle
bf86c56594 radeonsi: extract si_build_ps_epilog_function
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:06:57 +01:00
Nicolai Hähnle
0b9bba7f6c radeonsi: pass the function name to si_llvm_create_func
We will use multiple functions in one module, so they should have
different names.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:06:54 +01:00
Nicolai Hähnle
96d60dd9ee radeonsi: split is_monolithic into no_prolog and no_epilog
This helps to achieve a gradual transition towards building monolithic shaders
via inlining.

no_prolog and no_epilog will be removed by the end of the series,
separate_prolog remains in use to control the PS input mapping.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:06:50 +01:00
Nicolai Hähnle
8db9d915cd radeonsi: free data structures when shader compiles fail
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:06:47 +01:00
Nicolai Hähnle
4c1504af6a radeonsi: move main TGSI translation into its own function
The idea is that adding prolog and epilog code will be pulled out into the
caller.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:06:44 +01:00
Nicolai Hähnle
23dfb688ba radeonsi: add always-inline pass to si_llvm_finalize_module
Change the pass manager as well, since this is a module-level pass. No
noticeable run-time difference on shader-db.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:06:42 +01:00
Nicolai Hähnle
4ada1dabc4 radeonsi: fix signature of export intrinsic in VS epilog
The incompatible signature becomes an issue when the VS epilog gets merged
with the main vertex shader at the IR level.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:06:33 +01:00
Nicolai Hähnle
899b2f24a4 radeonsi: link against amd_common
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:06:30 +01:00
Nicolai Hähnle
908100cfae amd/common: add ac_is_sgpr_param helper
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:06:27 +01:00
Nicolai Hähnle
2ff5df8f50 amd/common: build also for gallium drivers
At least when LLVM is used, which is basically always (unless you're only
building r600 without OpenCL).

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:06:24 +01:00
Nicolai Hähnle
8eabee9ec0 amd/common: move llvm helper prototype to ac_llvm_util.h
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-03 10:05:46 +01:00
Nicolai Hähnle
37d646c1b3 glsl: fix lowering of UBO references of named blocks
When a UBO reference has the form block_name.foo where block_name refers
to a block where the first member has a non-zero offset, the base offset
was incorrectly added to the reference.

Fixes an assertion triggered in debug builds by
GL45-CTS.enhanced_layouts.uniform_block_layout_qualifier_conflict. That test
doesn't properly check for correct execution in this case, so I am also
going to send out a piglit test.

Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-11-03 09:57:25 +01:00
Kenneth Graunke
8df4aebc94 glsl: Update deref types when resizing implicitly sized arrays.
At link time, we resolve the size of implicitly sized arrays.
When doing so, we update the type of the ir_variables.  However,
we neglected to update the type of ir_dereference nodes which
reference those variables.

It turns out array_resize_visitor (for GS/TCS/TES interface array
handling) already did 2/3 of the cases for this, so we can simply
refactor the code and reuse it.

This fixes:
GL45-CTS.shader_storage_buffer_object.basic-syntax
GL45-CTS.shader_storage_buffer_object.basic-syntaxSSO

which have an SSBO containing an implicitly sized array, followed
by some other members.  setup_buffer_access uses the dereference
types to compute offsets to fields, and it had a stale type where
the implicitly sized array's length was still 0 instead of the
actual length.

While we're here, we can also fix update_array_sizes to properly
update deref types as well, fixing a FINISHME from 2010.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-11-03 01:42:37 -07:00
Timothy Arceri
d2861d682a mesa/glsl: delete previously linked shaders earlier when linking
This moves the delete linked shaders call to
_mesa_clear_shader_program_data() which makes sure we delete them
before returning due to any validation problems.

It also reduces some code duplication.

From the OpenGL 4.5 Core spec:

   "If LinkProgram failed, any information about a previous link of
   that program object is lost. Thus, a failed link does not restore
   the old state of program.

   ...

   If one of these commands is called with a program for which
   LinkProgram failed, no error is generated unless otherwise noted.
   Implementations may return information on variables and interface
   blocks that would have been active had the program been linked
   successfully. In cases where the link failed because the program
   required too many resources, these commands may help applications
   determine why limits were exceeded."

Therefore it's expected that we shouldn't be able to query the
program that failed to link and retrieve information about a
previously successful link.

Before this change the linker was doing validation before freeing
the previously linked shaders and therefore could exit on failure
before they were freed.

This change also fixes an issue in compat profile where a program
with no shaders attached is expect to fall back to fixed function
but was instead trying to relink IR from a previous link.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97715
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-03 11:58:53 +11:00
Timothy Arceri
903e5eae97 nir: fix nir_shader_clone() and nir_sweep()
These were broken in e1af20f18a when the info field in nir_shader was
turned into a pointer.

Clone was copying the pointer rather than the data and nir_sweep was
cleaning up shader_info rather than claiming it.

Reviewed-by: Eric Anholt <eric@anholt.net>
2016-11-03 10:39:13 +11:00
Timothy Arceri
f304aca542 mesa: move shader_info to the start of gl_program
This will allow use to use ralloc_parent() on the info field and fix
a regression in nir_sweep() caused by e1af20f18a.

This is intended to be a temporary requirement that will be removed
when we finish separating shader_info from nir_shader.

Reviewed-by: Eric Anholt <eric@anholt.net>
2016-11-03 10:39:13 +11:00
Timothy Arceri
cc6aa1d161 st/mesa/r200/i915/i965: use rzalloc() to create gl_program
This allows us to use ralloc_parent() to see which data structure owns
shader_info which allows us to fix a regression in nir_sweep().

This will also allow us to move some fields from gl_linked_shader to
gl_program, which will allow us to do some clean-ups like storing
gl_program directly in the CurrentProgram array in gl_pipeline_object
enabling some small validation optimisations at draw time.

Also it is error prone to depend on the gl_linked_shader for
programs in current use because a failed linking attempt will free
infomation about the current program. In i965 we could be trying
to recompile a shader variant but may have lost some required fields.

Reviewed-by: Eric Anholt <eric@anholt.net>
2016-11-03 10:39:13 +11:00
Samuel Pitoiset
548b5fee6b nv50,nvc0: stop limiting the number of active queries to 1
This limitation was initially here because AMD_performance_monitor
doesn't allow to expose the real number of hardware counters. But
this actually really annoying when profiling with qapitrace.

Anyways, performance counters are mostly for developers and
failures are expected if you try to monitor more queries than
supported.

This breaks amd_performance_monitor_measure but it's expected.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-11-02 23:42:09 +01:00
Samuel Pitoiset
b6137f226c nvc0: add new warp_nonpred_execution_efficiency metric on SM35
Event not_predicated_off_thread_inst_executed is SM35+.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-11-02 23:35:49 +01:00
Samuel Pitoiset
98a382d013 nvc0: add missing metric-issue_slot on SM35
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-11-02 23:35:46 +01:00
Samuel Pitoiset
c32d7175aa nvc0: do not expose metric-inst_issued twice on SM35
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-11-02 23:35:44 +01:00
Samuel Pitoiset
524703da58 nvc0: add new warp_execution_efficiency metric on SM30+
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-11-02 23:35:42 +01:00
Samuel Pitoiset
51fe48660a nvc0: respect 80-chars for perf metrics descriptions
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-11-02 23:35:39 +01:00
Samuel Pitoiset
b58d85bac8 nvc0: sort performance metrics alphabetically
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-11-02 23:35:28 +01:00
Fredrik Höglund
e7b9c5eb74 radv: add support for anisotropic filtering on VI+
Ported from radeonsi.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-03 08:27:21 +10:00
Dave Airlie
73592b9284 radv: fix dual source blending
Dolphin tried to use this, but we hadn't had any tests for it properly.

All that is required is the shader output format needs to be set
for 0 and 1 exports.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-03 08:26:51 +10:00
Samuel Pitoiset
1d75d681d3 nv50: add missing draw_calls_indexed driver stat
Spotted when glancing at the VBO push code.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-11-02 21:11:57 +01:00
Adam Jackson
afaaf623d4 glx/glvnd: Use bsearch() in FindGLXFunction instead of open-coding it
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
2016-11-02 14:52:43 -04:00
Adam Jackson
8bca8d89ef glx/glvnd: Fix dispatch function names and indices
As this array was not actually sorted, FindGLXFunction's binary search
would only sometimes work.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
2016-11-02 14:52:38 -04:00
Adam Jackson
deb0eb1660 glx/glvnd: Don't modify the dummy slot in the dispatch table
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
2016-11-02 14:52:31 -04:00
Jason Ekstrand
71cc1e188d anv/pipeline: Properly cache prog_data::param
Before we were caching the prog data but we weren't doing anything with
brw_stage_prog_data::param so anything with push constants wasn't getting
cached properly.  This commit fixes that.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98012
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-02 09:32:28 -07:00
Jason Ekstrand
ff3185e3ba anv/pipeline: Put actual pointers in anv_shader_bin
While we can simply calculate offsets to get to things such as the
prog_data and the key, it's much more user-friendly if there are just
pointers.  Also, it's a bit more fool-proof.

While we're at it, we rework the pipeline cache API to use the
brw_stage_prog_data type directly.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98012
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-02 09:32:22 -07:00
Jason Ekstrand
4306c10a88 intel/blorp: Pass a brw_stage_prog_data to upload_shader
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98012
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-02 09:32:19 -07:00
Jason Ekstrand
058304f081 intel/blorp: Use wm_prog_data instead of hand-rolling our own
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98012
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-02 09:32:15 -07:00
Jason Ekstrand
a5f8ff6ca1 anv: Better handle return codes from anv_physical_device_init
The case where we just want the loop to continue is INCOMPATIBLE_DRIVER
because that simply means that whatever FD we opened isn't a supported
Intel chip.  Other error codes such as OUT_OF_HOST_MEMORY are actual errors
and we should be returning early in that case.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-02 09:26:41 -07:00
Jason Ekstrand
daeb21e478 vulkan/wsi/x11: Clean up connections in finish_wsi
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-02 09:26:36 -07:00
Jason Ekstrand
fc0e9e3e40 vulkan/wsi/x11: Better handle wsi_x11_connection_create failure
Without this fix, the function would still end up returning NULL but it
would put that NULL connection in the hash table which would be bad.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-11-02 09:25:57 -07:00
Chih-Wei Huang
e3e5b1a488 android: avoid using libdrm with host modules
Note LOCAL_CFLAGS and LOCAL_SHARED_LIBRARIES in Android.common.mk
are used by both host and target modules. However, commit 112e988
moved libdrm related flags to common. It causes the errors like:

error: 'out/host/linux-x86/obj32/SHARED_LIBRARIES/libdrm_intermediates/export_includes',
needed by 'out/host/linux-x86/obj32/EXECUTABLES/mesa_gen_matypes_intermediates/import_includes',
missing and no known rule to make it

No reason to use libdrm with host modules.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Fixes: 112e988329 ("Android: move libdrm settings to top-level
Android.common.mk")
Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-02 14:43:26 +00:00
Nicolai Hähnle
1ef505bb02 glsl: compute lvalues of [in]out parameters before inlined function body
This is required when an out argument involves an array index that is either
a global variable modified by the function or another out argument in the
same function call.

Fixes the shaders/out-parameter-indexing/vs-inout-index-inout-* tests.

v2:
- modify the ir_dereference_array nodes in place
- use ir_hierarchical_visitor
v3: use base_ir (Ian Romanick)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-11-02 12:32:47 +01:00
Nicolai Hähnle
5aef14932a radeonsi: fix BFE/BFI lowering for GLSL semantics
Fixes spec/arb_gpu_shader5/execution/built-in-functions/*-bitfield{Extract,Insert}

Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-02 12:30:11 +01:00
Nicolai Hähnle
6526977306 tgsi: align the definition of BFI & [UI]BFE with GLSL
As previously written, these opcodes use the SM5 semantics which is
incompatible with GLSL when bits == 0, offset == 32.

At some point we may want to add BFI_SM5 etc. opcodes, but all users
currently either want (and expect!) the GLSL semantics or don't care.

Bitfield inserts are generated by the GLSL lower_instructions and
lower_packing_builtins passes with constant bits and offset arguments,
so any workaround code that drivers may have to emit to follow GLSL
semantics should be optimized away easily for those uses.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-02 12:30:07 +01:00
Dave Airlie
9f0726f3e5 radv: expose xlib platform extension
I missed this when I added the xlib code, this allows
dolphin emu to start and crash later.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-02 10:00:38 +10:00
Lionel Landwerlin
a28db12e21 intel: aubinator: print field values if available
Turning this :

sampler state 0
    Sampler Disable: false
    Texture Border Color Mode: 0
    LOD PreClamp Enable: 1
    Base Mip Level: 0.000000
    Mip Mode Filter: 0
    Mag Mode Filter: 1
    Min Mode Filter: 1
    Texture LOD Bias: foo
    Anisotropic Algorithm: 0

into this :

sampler state 0
    Sampler Disable: false
    Texture Border Color Mode: 0 (DX10/OGL)
    LOD PreClamp Enable: 1 (OGL)
    Base Mip Level: 0.000000
    Mip Mode Filter: 0 (NONE)
    Mag Mode Filter: 1 (LINEAR)
    Min Mode Filter: 1 (LINEAR)
    Texture LOD Bias: foo
    Anisotropic Algorithm: 0 (LEGACY)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sirisha Gandikota<sirisha.gandikota@intel.com>
2016-11-01 22:37:56 +00:00
Lionel Landwerlin
74c4c84482 intel: aubinator: load fields values from xml data
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sirisha Gandikota<sirisha.gandikota@intel.com>
2016-11-01 22:37:52 +00:00
Lionel Landwerlin
c8806eeefc intel: aubinator: print boolean fields to true with colors
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sirisha Gandikota<sirisha.gandikota@intel.com>
2016-11-01 22:37:22 +00:00
Marek Olšák
d3244c47ce amd: fix a typo in PIXEL_PIPE_STAT_RESET definition
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-01 22:33:13 +01:00
Marek Olšák
7786f8c635 gallium/radeon: add enum radeon_micro_mode
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-01 22:33:13 +01:00
Marek Olšák
1a4e0162fc gallium/radeon: make it clear that DRM 2.x.x fast clear constraint is CIK-only
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-01 22:33:13 +01:00
Marek Olšák
e3697b4be6 gallium/radeon: remove r600_surface::level_info
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-01 22:33:13 +01:00
Marek Olšák
bf4d102ea3 gallium/radeon: add radeon_surf::is_linear
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-01 22:33:13 +01:00
Marek Olšák
e9c76eeeaa gallium/radeon: remove radeon_surf_level::pitch_bytes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-01 22:33:13 +01:00
Marek Olšák
c66a550385 gallium/radeon: don't call u_format helpers if we have that info already
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-01 22:33:13 +01:00
Marek Olšák
692f2640ab gallium/radeon: replace radeon_surf_info::dcc_enabled with num_dcc_levels
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-01 22:33:13 +01:00
Marek Olšák
315eb0acb4 radeonsi: add a driver query for counting CP DMA calls
CP DMA calls are synchronous with regard to shaders, but can be made
asynchronous if needed.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-01 22:33:13 +01:00
Marek Olšák
d268b7f95e radeonsi: add a driver query for shader cache hits
This is an 8-month old patch.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-01 22:33:13 +01:00
Marek Olšák
6b309f7368 gbm: set up the interop extension for egl/drm
breaking libgbm -> libEGL ABI?

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-01 22:33:13 +01:00
Samuel Pitoiset
8bfd65395e nvc0: do not duplicate similar performance metrics
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2016-11-01 19:03:26 +01:00
Emil Velikov
bc4c09dc99 docs: add news item and link release notes for 13.0.0
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-11-01 16:09:13 +00:00
Emil Velikov
631fa587e1 docs: add sha256 checksums for 13.0.0
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 405dd26860)
2016-11-01 16:07:26 +00:00
Emil Velikov
e205c265c8 docs: Update 13.0.0 release notes
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit df1b0a5a86)
2016-11-01 16:07:24 +00:00
Jason Ekstrand
c41ec1679f anv/device: Return DEVICE_LOST if execbuf2 fails
This makes more sense than OUT_OF_HOST_MEMORY.  Technically, you can
recover from a failed execbuf2 but the batch you just submitted didn't
fully execute so things are in an ill-defined state.  The app doesn't want
to continue from that point anyway.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-11-01 07:54:52 -07:00
Antia Puentes
61a8a55f55 i965/gen8: Fix vertex attrib upload for dvec3/4 shader inputs
The emission of vertex attributes corresponding to dvec3 and dvec4
vertex shader input variables was not correct when the <size> passed
to the VertexAttribL* commands was <= 2.

This was because we were using the vertex array size when emitting vertices
to decide if we uploaded a 64-bit floating point attribute as 1 slot (128-bits)
for sizes 1 and 2, or 2 slots (256-bits) for sizes 3 and 4. This caused problems
when mapping the input variables to registers because, for deciding which
registers contain the values uploaded for a certain variable, we use the size
and type given to the variable in the shader, so we will be assigning 256-bits
to dvec3/4 variables, even if we only uploaded 128-bits for them, which happened
when the vertex array size was <= 2.

The patch uses the shader information to only emit as 128-bits those 64-bit floating
point variables that were declared as double or dvec2 in the vertex shader. Dvec3 and
dvec4 variables will be always uploaded as 256-bits, independently of the <size> given
to the VertexAttribL* command.

From the ARB_vertex_attrib_64bit specification:

   "For the 64-bit double precision types listed in Table X.1, no default
    attribute values are provided if the values of the vertex attribute variable
    are specified with fewer components than required for the attribute
    variable. For example, the fourth component of a variable of type dvec4
    will be undefined if specified using VertexAttribL3dv or using a vertex
    array specified with VertexAttribLPointer and a size of three."

We are filling these unspecified components with zeros, which coincidentally is
also what the GL44-CTS.vertex_attrib_binding.basic-inputL-case1 expects.

v2: Do not use bitcount (Kenneth Graunke)

Fixes: GL44-CTS.vertex_attrib_binding.basic-inputL-case1 test

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97287
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-11-01 09:39:09 +01:00
Dave Airlie
f88ea8c72a radv: drop some unused cmask info members.
These were assigned but never used.

Inspired by similiar patch in radeonsi.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-01 15:11:35 +10:00
Lionel Landwerlin
1b88760f85 intel: aubinator: fix printing missing gen option
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-10-31 22:03:13 +00:00
Lionel Landwerlin
46d67799a6 intel: aubinator: fix assumptions on amount of required data
We require 12 bytes of headers but in some cases we just need 4.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-10-31 22:03:09 +00:00
Lionel Landwerlin
6f05b69572 intel: aubinator: don't print out blocks twice
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-10-31 22:02:41 +00:00
Nanley Chery
e9a25e0247 i965: Move gen8_disable_stages to brw_upload_initial_gpu_state
3DSTATE_WM_CHROMAKEY isn't programmed anywhere else.
3DSTATE_WM_HZ_OP is programmed, then cleared by blorp during a
HZ op, so repeatedly clearing it after every blorp execution is
redundant.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-10-31 13:20:05 -07:00
Nanley Chery
477ea60b68 i965: Program 3DSTATE_AA_LINE_PARAMETERS in upload_invariant_state
This packet is non-pipelined and doesn't ever change across emissions.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-10-31 13:20:00 -07:00
Leo Liu
06e3cd6a45 st/omx/dec: disable tunnel for size different case
When the video coded size is different from frame size, we need the result
buffers are same as coded size, which are not size compatible with encode
required size, so that simply use no tunnel for this case instead of frame
by frame converting.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Cc: 13.0 <mesa-stable@lists.freedesktop.org>
2016-10-31 11:45:29 -04:00
Leo Liu
d9b2c4048d st/omx/dec: result buffers size should match codec decoder size
Otherwise fails the check of matching between decoder size and buffers
size in kernel.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Cc: 13.0 <mesa-stable@lists.freedesktop.org>
2016-10-31 11:45:14 -04:00
George Kyriazis
55fb874376 swr: [rasterizer] added EventHandlerFile contructor
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-10-31 09:06:29 -05:00
George Kyriazis
0a5811b0f3 swr: [rasterizer core] Frontend dependency work
Add frontend dependency concept in the DRAW_CONTEXT, which
allows serialization of frontend work if necessary.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-10-31 09:06:21 -05:00
George Kyriazis
06f93d0329 swr: [rasterizer core] Refactor/cleanup backends
Used for common code reuse and simplification

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-10-31 09:06:15 -05:00
George Kyriazis
78a0a09e48 swr: [rasterizer core] Remove deprecated simd intrinsics
Used in abandoned all-or-nothing approach to converting to AVX512

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-10-31 09:06:08 -05:00
George Kyriazis
1a3ed86348 swr: [rasterizer archrast] Add thread tags to event files.
This allows the post-processor to easily detect the API thread and to
process frame information. The frame information is needed to
optimized how data is processed from worker threads.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-10-31 09:05:25 -05:00
Marek Olšák
7a2387c3e0 glsl: use a non-malloc'd storage for short ir_variable names
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-31 11:53:38 +01:00
Marek Olšák
21e11b5282 glsl: use the linear allocator in opt_constant_propagation
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-31 11:53:38 +01:00
Marek Olšák
565b2c4c4b glsl: use the linear allocator in opt_copy_propagation
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-31 11:53:38 +01:00
Marek Olšák
b6f50e4640 glsl: use the linear allocator in opt_copy_propagation_elements
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-31 11:53:38 +01:00
Marek Olšák
9c19dedff0 glsl: use the linear allocator in opt_dead_code_local
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-31 11:53:38 +01:00
Marek Olšák
23e373eb4f glsl: use the linear allocator in glsl_symbol_table
no ralloc_free occurences

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-31 11:53:38 +01:00
Marek Olšák
a4a93103fb glsl: use the linear allocator for ast_node and derived classes
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-31 11:53:38 +01:00
Marek Olšák
2296bb0967 glsl/lexer: use the linear allocator
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-31 11:53:38 +01:00
Marek Olšák
47e1758692 glcpp: use the linear allocator for most objects
v2: cosmetic changes

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
2016-10-31 11:53:38 +01:00
Marek Olšák
6608dbf540 ralloc: add a linear allocator as a child node of ralloc
v2: remove goto, cosmetic changes

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-31 11:53:38 +01:00
Marek Olšák
acc23b04cf ralloc: remove memset from ralloc_size
only do it in rzalloc_size as it was supposed to be

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
2016-10-31 11:53:38 +01:00
Marek Olšák
52d2b28f7f ralloc: use rzalloc where it's necessary
No change in behavior. ralloc_size is equivalent to rzalloc_size.
That will change though.

Calls not switched to rzalloc_size:
- ralloc_vasprintf
- glsl_type::name allocation (it's filled with snprintf)
- C++ classes where valgrind didn't show uninitialized values

I switched most of non-glsl stuff to rzalloc without checking whether
it's really needed.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-31 11:53:38 +01:00
Marek Olšák
9454f7c0ef ralloc: add DECLARE_RZALLOC_CXX_OPERATORS
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whiteacpe.org>
2016-10-31 11:53:38 +01:00
Juha-Pekka Heikkila
3bf6c6c3ad nir: zero allocated memory where needed
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-10-31 11:53:38 +01:00
Juha-Pekka Heikkila
4d4335c81a i965/fs: fill allocated memory with zeros where needed
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-10-31 11:53:38 +01:00
Juha-Pekka Heikkila
5fa41520e4 i965/vec4: zero allocated memory where needed
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-10-31 11:53:38 +01:00
Tapani Pälli
e40c5dab5e glsl/glcpp: initialize all fields of glcpp_parser_t on creation
this fixes some of the regressions with
	"ralloc: remove memset from ralloc_size"

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-10-31 11:53:38 +01:00
Juha-Pekka Heikkila
6770b17b99 glsl: Fix reading of uninitialized memory
Switch to use memory allocations which zero memory for places
where needed.

v2: modify and rebase on top of Marek's series (Tapani)

Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-10-31 11:53:38 +01:00
Marek Olšák
f67c5a7ccd glsl: initialize glsl_struct_field properly
don't rely on ralloc doing memset

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whiteacpe.org>
2016-10-31 11:53:38 +01:00
Marek Olšák
330482177c ralloc: don't memset ralloc_header, clear it manually
time GALLIUM_NOOP=1 ./run shaders/private/alien_isolation/ >/dev/null

Before (2 takes):

real    0m8.734s    0m8.773s
user    0m34.232s   0m34.348s
sys     0m0.084s    0m0.056s

After (2 takes):

real    0m8.448s    0m8.463s
user    0m33.104s   0m33.160s
sys     0m0.088s    0m0.076s

Average change in "real" time spent: -3.4%

calloc should only do 2 things compared to malloc:
- check for overflow of "n * size"
- call memset
I'm not sure if that explains the difference.

v2: clear "parent" and "next" in the caller of add_child.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> (v1)
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
2016-10-31 11:53:38 +01:00
Serge Martin
cb0879985a clover: Implement clGetExtensionFunctionAddressForPlatform.
Add clGetExtensionFunctionAddressForPlatform (CL 1.2).

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-10-30 12:53:03 -07:00
Vedran Miletić
2fba72046d clover: Introduce CLOVER_EXTRA_*_OPTIONS environment variables
The options specified in the CLOVER_EXTRA_BUILD_OPTIONS shell
variable are appended to the options specified by the OpenCL program
in the clBuildProgram function call, if any.
Analogously, the options specified in the CLOVER_EXTRA_COMPILE_OPTIONS
and CLOVER_EXTRA_LINK_OPTIONS variables are appended to the options
specified in clCompileProgram and clLinkProgram function calls,
respectively.

v2:
 * rename to CLOVER_EXTRA_COMPILER_OPTIONS
 * use debug_get_option
 * append to linker options as well

v3: code cleanups

v4: separate CLOVER_EXTRA_LINKER_OPTIONS options

v5:
 * fix documentation typo
 * use CLOVER_EXTRA_COMPILER_OPTIONS in link stage

v6:
 * separate in CLOVER_EXTRA_{BUILD,COMPILE,LINK}_OPTIONS
 * append options in cl{Build,Compile,Link}Program

Signed-off-by: Vedran Miletić <vedran@miletic.net>
Reviewed-by[v1]: Edward O'Callaghan <funfunctor@folklore1984.net>

v7 [Francisco Jerez]: Slight simplification.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-10-30 12:45:26 -07:00
Vedran Miletić
e3272865c2 clover: Pass unquoted compiler arguments to Clang
OpenCL apps can quote arguments they pass to the OpenCL compiler, most
commonly include paths containing spaces.

If the Clang OpenCL compiler was called via a shell, the shell would
split the arguments with respect to to quotes and then remove quotes
before passing the arguments to the compiler. Since we call Clang as a
library, we have to split the argument with respect to quotes and then
remove quotes before passing the arguments.

v2: move to tokenize(), remove throwing of CL_INVALID_COMPILER_OPTIONS

v3: simplify parsing logic, use more C++11

v4: restore error throwing, clarify a comment

Signed-off-by: Vedran Miletić <vedran@miletic.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-10-30 12:14:59 -07:00
Jason Ekstrand
2a4a86862c i965/fs/generator: Don't use the address immediate for MOV_INDIRECT
The address immediate field is only 9 bits and, since the value is in
bytes, the highest GRF we can point to with it is g15.  This makes it
pretty close to useless for MOV_INDIRECT.  There were already piles of
restrictions preventing us from using it prior to Broadwell, so let's get
rid of the gen8+ code path entirely.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97779
Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-10-28 17:11:16 -07:00
Marek Olšák
4bf45a6079 radeonsi: fix behavior of GLSL findLSB(0)
12.0 and older need the same fix but elsewhere.

Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-29 01:17:36 +02:00
Marek Olšák
e24dc43164 radeonsi: set VGT_GS_ONCHIP_CNTL on CIK and later
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: 11.2 12.0 13.0  <mesa-stable@lists.freedesktop.org>
2016-10-29 01:17:36 +02:00
Jason Ekstrand
cab3d46739 i965: Fix make check after 66fcfa6894
Commit 66fcfa6894 changed the vec4 version of offset() to have 3
parameters instead of 2 but the vec4_cmod_propagation test was never
updated.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-28 14:55:38 -07:00
Kenneth Graunke
e6aeeace69 glsl: Improve accuracy of alpha scaling in advanced blend lowering.
When blending with GL_COLORBURN_KHR and these colors:

   dst = <0.372549027, 0.372549027, 0.372549027, 0.372549027>
   src = <0.09375, 0.046875, 0.0, 0.375>

the normalized dst value became 0.99999994 (due to precision problems
in the floating point divide of rgb by alpha).  This caused the color
burn equation to fail the dst >= 1.0 comparison.  The blue channel would
then fall through to the dst < 1.0 && src >= 0 comparison, which was
true, since src.b == 0.  This produced a factor of 0.0 instead of 1.0.

This is an inherent numerical instability in the color burn and dodge
equations - depending on the precision of alpha scaling, the value can
be either 0.0 or 1.0.  Technically, GLSL floating point division doesn't
even guarantee that 0.372549027 / 0.372549027 = 1.0.  So arguably, the
CTS should allow either value.  I've filed a bug at Khronos for further
discussion (linked below).

In the meantime, this patch improves the precision of alpha scaling by
replacing the division with (rgb == alpha ? 1.0 : rgb / alpha).  We may
not need this long term, but for now, it fixes the following CTS tests:

ES31-CTS.blend_equation_advanced.blend_specific.GL_COLORBURN_KHR
ES31-CTS.blend_equation_advanced.blend_all.GL_COLORBURN_KHR_all_qualifier

Cc: currojerez@riseup.net
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://cvs.khronos.org/bugzilla/show_bug.cgi?id=16042
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-10-28 10:40:53 -07:00
Brian Paul
c538846e31 mesa: rename gl_client_array -> gl_vertex_array
The term "client array" is a legacy thing dating back to the pre-VBO
era when _all_ vertex arrays lived in client memory.

Nowadays, it only contains vertex array state which is derived from
gl_array_attributes and gl_vertex_buffer_binding.  It's used by the
VBO module and some drivers.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-10-28 09:25:30 -07:00
Brian Paul
161db1335b mesa: code clean-up in _mesa_update_vao_client_arrays()
Init vars where declared, use const qualifiers.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-10-28 09:25:30 -07:00
Brian Paul
7a4ba9f16e mesa: update comment on vertex_attrib_binding()
Was missed in an earlier renaming patch.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-10-28 09:25:29 -07:00
Brian Paul
910bc4d12c mesa: rename gl_vertex_array_object::VertexBinding to BufferBinding
To be a little more understandable.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-10-28 09:25:29 -07:00
Eduardo Lima Mitev
129da27426 vulkan/wsi/x11: Smplify implementation of vkGetPhysicalDeviceSurfaceFormatsKHR
This patch simplifies x11_surface_get_formats(). It is actually just a
readability improvement over the patch I provided earlier this week
(750d8cad72).

Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-10-28 16:53:28 +02:00
Eduardo Lima Mitev
b677b99db5 vulkan/wsi/x11: Fix behavior of vkGetPhysicalDeviceSurfacePresentModesKHR
x11_surface_get_present_modes() is currently asserting that the number of
elements in pPresentModeCount must be greater than or equal to the number
of present modes available. This is buggy because pPresentModeCount
elements are later copied from the internal modes' array, so if
pPresentModeCount is greater, it will overflow it.

On top of that, this assertion violates the spec. From the Vulkan 1.0
(revision 32, with KHR extensions), page 581 of the PDF:

    "If the value of pPresentModeCount is less than the number of
     presentation modes supported, at most pPresentModeCount values will be
     written. If pPresentModeCount is smaller than the number of
     presentation modes supported for the given surface, VK_INCOMPLETE
     will be returned instead of VK_SUCCESS to indicate that not all the
     available values were returned."

So, the correct behavior is: if pPresentModeCount is greater than the
internal number of formats, it is clamped to that many present modes. But
if it is lesser than that, then pPresentModeCount elements are copied,
and the call returns VK_INCOMPLETE.

This fix is similar (but simpler and more readable) than the one I provided
in 750d8cad72 for vkGetPhysicalDeviceSurfaceFormatsKHR, which was suffering
from the same problem.

Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-10-28 16:53:06 +02:00
Timothy Arceri
7d059bdfb9 i965: use memory context when creating passthrough tcs
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-28 19:57:15 +11:00
Timothy Arceri
5857c3082e intel/blorp: remove stale comment
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-28 19:51:08 +11:00
Eduardo Lima Mitev
c06480390b drivers/meta: Accept GL_TEXTURE_3D as target for tex image decompression
An assert is currently raised, preventing decompression of a texture image into
a GL_TEXTURE_3D target. I have not found any spec wording that would explain
this, or implementation detail that would prevent it. And in any case, the
driver should not cause a crash upon user input arguments.

Fixes most failing subcases in CTS tests:
* GL44-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels_pixelstore
* GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels_pixelstore

These tests were crashing the driver before. Now they just fail, but due
to an unrelated issue affecting 2 out of the 45 test subcases.

No regressions observed against piglit or CTS-GL.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-28 00:20:32 -07:00
Jason Ekstrand
43dadb6edd intel/blorp: Rework our usage of ralloc when compiling shaders
Previously, we were creating the shader with a NULL ralloc context and then
trusting in blorp_compile_fs to clean it up.  The only problem was that
blorp_compile_fs didn't clean up its context properly so we were leaking.
When I went to fix that, I realized that it couldn't because it has to
return the shader binary which is allocated off of that context and used by
the caller.  The solution is to make blorp_compile_fs take a ralloc
context, allocate the nir_shaders directly off that context, and clean it
all up in whatever function creates the shader and calls blorp_compile_fs.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "12.0, 13.0" <mesa-stable@lists.freedesktop.org>
2016-10-27 22:46:13 -07:00
Jason Ekstrand
ab92480272 intel/blorp: Rename compile_nir_shader to compile_fs
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-10-27 22:46:13 -07:00
Fredrik Höglund
044ef54d65 radv: split the device local memory heap into two
Advertise two device local memory heaps; one that is host visible
and one that is not.

This makes it possible for clients to tell how much host visible
vs. non-host visible memory is available.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-28 12:27:49 +10:00
Fredrik Höglund
c9675b4e17 radv: add a write-combining host-local memory type
Add the new memory type between the two device-local types. This makes
the list of supported memory types look like this:

1) DEVICE_LOCAL |              |               |
2)              | HOST_VISIBLE | HOST_COHERENT |
3) DEVICE_LOCAL | HOST_VISIBLE | HOST_COHERENT |
4)              | HOST_VISIBLE | HOST_COHERENT | HOST_CACHED

With this order a client that searches for a HOST_VISIBLE and
HOST_COHERENT memory type using the algorithm described in section
10.2 of the Vulkan specification (revision 32) will find the host-
local memory type first.

A client that requires the memory type to be HOST_VISIBLE and
HOST_COHERENT, but not DEVICE_LOCAL is most likely searching for
a memory type suitable for staging buffers / images.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-28 12:27:46 +10:00
Jason Ekstrand
44760c100c i965/miptree: Remove the width/height < 32768 restrictions
These restrictions existed because intel_miptree_blit couldn't handle
surfaces bigger than 32k.  How that we're chopping blits up into chunks, it
can handle any size we throw at it so we can get rid of this restriction.
This improves the terrain tests in synmark by 25-30% on my Sky Lake gt3.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reported-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-10-27 14:44:59 -07:00
Jason Ekstrand
80d3af8129 i965/blit: Break blits into chunks in intel_miptree_blit
This allows us to blit much larger images than if we use the blitter
directly.  In particular, it gives us an almost infinite image height
compared to the fairly limiting 32k.  We do, however, still have a
restriction on stride of the image because handling larger strides, while
possible, is fairly difficult.

v2: Properly handle linear blit alignment restrictions

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-10-27 14:44:54 -07:00
Jason Ekstrand
b7979a849b i965/blit: Break blits into chunks in set_alpha_to_one
v2: Properly handle linear blit alignment restrictions

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-10-27 14:43:59 -07:00
Jason Ekstrand
174f4900b2 i965/blit: Remove a bogus assertion
This assertion, while valid for linear buffers, doesn't work properly for
tiled memory.  It used to work most of the time because the offset provided
was always to the left-hand edge of the image.  However, if you use a byte
offset to get to the inside of the image, the height * stride calculation
may actually end up being too large.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-10-27 14:43:24 -07:00
Jason Ekstrand
6da8149601 i965/miptree: Break miptree -> ISL tiling conversion into a helper
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-10-27 14:43:21 -07:00
Jason Ekstrand
c30b7164b7 i965/miptree: Remove the stencil_as_y_tiled parameter from get_aligned_offset
The only actual user of this parameter was blorp and, since the conversion
to ISL, it no longer uses this function.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-10-27 14:43:17 -07:00
Jason Ekstrand
4964a5149b intel/blorp: Fix a couple asserts around image copy rectangles
With dealing with rectangles in compressed images, you can have a width or
height that isn't a multiple of the corresponding compression block
dimension but only if that edge of your rectangle is on the edge of the
image.  When we call convert_to_single_slice, it creates an 2-D image and a
set of tile offsets into that image.  When detecting the right-edge and
bottom-edge cases, we weren't including the tile offsets so the assert
would misfire.  This caused crashes in a few UE4 demos

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reported-by: "Eero Tamminen" <eero.t.tamminen@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98431
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Tested-by: "Eero Tamminen" <eero.t.tamminen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-10-27 13:45:39 -07:00
Jason Ekstrand
caf67bb12f anv/allocator: Assert that we have a valid gem handle in bo_pool_alloc 2016-10-27 13:45:39 -07:00
Samuel Pitoiset
84e946380b nvc0/ir: fix emission of IMAD with NEG modifiers
The emitter tried to emit sub instead of subr when src0 has
actually a NEG modifier.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 12.0 13.0" <mesa-stable@lists.freedesktop.org>
2016-10-27 19:29:56 +02:00
Juan A. Suarez Romero
5d83820a1d glsl: inspect interfaces in contains_foo()
When checking if a type contains doubles, integers, samples, etc. we
check if the current type is a record or array, but not if it is an
interface.

This commit also inspects if the type is an interface.

It fixes spec/arb_enhanced_layouts/compiler/transform-feedback-layout-qualifiers/xfb_offset/invalid-block-with-double.vert
piglit test.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-10-27 12:36:09 +02:00
Iago Toral Quiroga
66fcfa6894 i965/vec4: make offset() work in terms of a simd width and scalar components
So that it has the same semantics as the scalar backend implementation. The
helper will now take a simd width (which is always 8 in vec4 mode) and step
as many scalar components as specified by that width, respecting the size of
the scalar channels.

v2 (Curro):
  - Remove the assertion in offset(), byte_offset() has the same checks.
  - Use byte_offset() directly instead of add_byte_offset().
  - Make things more clear by explicitly including the vertical stride
    in the byte offset expression.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-10-27 10:59:31 +02:00
Iago Toral Quiroga
ba63db1f2e i965/vec4: use byte_offset() instead of offset()
In a later patch we want to change the semantics of offset() to be in terms
of SIMD width and scalar channels so it is consistent with the definition
of the same helper in the scalar backend. However, some uses of offset()
in the vec4 backend do not operate naturally in terms of these
semantics. In these cases it is more natural to use the byte_offset() helper
instead.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-10-27 10:59:31 +02:00
Iago Toral Quiroga
5a4ce9f9a7 i965/vec4: add a byte_offset helper
v2: wrap the helper in a namespace to make clear that it is an
    implementation detail of byte_offset() and is not intended
    to be used independently (Curro).

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-10-27 10:59:31 +02:00
Kenneth Graunke
173558445d glsl: Size TCS->TES unsized arrays to gl_MaxPatchVertices for queries.
SSO validation and other program interface queries want to see that
unsized (non-patch) TCS output/TES input arrays are implicitly sized
to gl_MaxPatchVertices.

By the time we create the program resource lists, we've sized the arrays
to their actual size.  (We try to create TCS output arrays to match the
output patch size right away, and at this point, we should have shrunk
TES input arrays.)  One option would be to keep them sized to
gl_MaxPatchVertices, and defer shrinking them.  But that's a big change,
and I don't think it's a good idea.

Instead, this patch introduces a new ir_variable flag which indicates
the variable is implicitly to gl_MaxPatchVertices.  Then, the linker
munges the types when creating the resource list, ignoring the size
in the IR's types.  Basically, lie about it for resource queries.
It's ugly, but I think it ought to work.

We probably could use var->data.implicit_sized_array for this, but
I opted for a separate bit to try and avoid convoluting the existing
SSBO handling.  They're similar in concept, but share none of the
same code...

Fixes:
ES31-CTS.core.tessellation_shader.single.xfb_captures_data_from_correct_stage
and the ES32-CTS and ESEXT-CTS variants.

v2: Add a comment (requested by Timothy, written by me).

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2016-10-27 00:56:51 -07:00
Kenneth Graunke
34fd2ffed8 glsl: Pass ctx to program interface query helper functions.
The next commit will use this in add_shader_variable - this just
separates out some of the mechanical changes for easier review.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2016-10-27 00:56:34 -07:00
Tapani Pälli
2035930966 egl: set preserved behavior for surface only if config supports it
Otherwise we can end up with mismatching behavior between config and
surface when client queries surface attributes. As example, configs
for DRI3 do not support preserved behavior but here we were setting
preserved behavior for pixmap and pbuffer.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98326
Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Tested-by: Mark Janes <mark.a.janes@intel.com>
2016-10-27 07:12:51 +03:00
Tapani Pälli
671da8d8ba mesa: expose GL_EXT_robustness
Fixes 8 failing dEQP tests:
   dEQP-EGL.functional.create_context_ext.robust_gles*

(now 42 tests pass in dEQP-EGL*robust*, 0 fail and rest are skipped)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98343
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-10-27 07:06:41 +03:00
Tapani Pälli
44482d5a3e st/mesa: set RobustAccess true when is supported
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-10-27 07:06:41 +03:00
Tapani Pälli
fe764477b0 i956: set RobustAccess true when is supported
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-10-27 07:06:41 +03:00
Tapani Pälli
ef16003320 mesa: add missing CONTEXT_ROBUST_ACCESS enum
commit 85008db1d5 missed this enum
for GL_KHR_robustness implementation

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-10-27 07:06:41 +03:00
Tapani Pälli
6bf6fcfcd9 egl: fix error handling in _eglCreateSync
EGL specification requires context to be current only when sync
type matches EGL_SYNC_FENCE_KHR.

Fixes 25 failing dEQP tests:
   dEQP-EGL.functional.reusable_sync.*

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98339
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-10-27 07:06:41 +03:00
Dave Airlie
ca035006c8 vulkan/wsi/x11: add support for IMMEDIATE present mode
We shouldn't be using ASYNC here, that would be used
for immediate mode, so let's implement that.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-27 11:43:15 +10:00
Dave Airlie
1cdca1eb16 vulkan/wsi: store present mode in swapchain base class
This just moves this up a level as x11 will need it to
implement things properly.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-27 11:43:00 +10:00
Dave Airlie
787c172aed vulkan/wsi/x11: handle timeouts properly in next image acquire (v1.1)
For 0 timeout, just poll for an event, and if none, return
For UINT64_MAX timeout, just wait for special event blocked
For other timeouts get the xcb fd and block on it, decreasing
the timeout if we get woken up for non-special events.

v1.1: return VK_TIMEOUT for poll timeouts.
handle timeout going negative.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-27 11:42:26 +10:00
Dave Airlie
d548fa882b radv/ac/llvm: trim texture return values
The intrinsic engine asserts in llvm due to this,
as we put a vec4 into a vec1, and the next instruction
isn't expecting it.

So trim the vector at the end before inserting it.

Reported-by: Christoph Haag <haagch+mesadev@frickel.club>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-27 11:42:03 +10:00
Rhys Kidd
5c73ecaac4 glsl: Add pthread libs to cache_test
Fixes the following compile error, present when the SHA1 library is libgcrypt:

  CCLD     glsl/tests/cache-test
glsl/.libs/libglsl.a(libmesautil_la-mesa-sha1.o): In function `call_once':
/mesa/src/util/../../include/c11/threads_posix.h:96: undefined reference to `pthread_once'

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-10-27 09:49:35 +11:00
Matt Turner
8e5aed5b56 genxml: Handle failure of Python codegen scripts. 2016-10-26 14:06:45 -07:00
Samuel Pitoiset
1ec7227d44 nvc0/ir: fix emission of SHLADD with NEG modifiers
This affects GF100:GK110 chipsets, but not GM107+ where the
logic is a bit different. The emitters tried to emit sub
instead of subr when src0 has a NEG modifier.

This fixes the following piglit tests glsl-fs-loop-nested
and glsl-vs-loop-nested.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-10-26 22:18:04 +02:00
Erik Faye-Lund
aca491341b compiler: avoid warning about redefinition of PYTHON_GEN
PYTHON_GEN is defined to the exact same thing in both
Makefile.glsl.am and Makefile.nir.am. This makes automake complain,
so let's lift the definition up to Makefile.am, the same way as
MKDIR_GEN.

Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-26 14:54:26 +01:00
Eric Engestrom
4fa799ae04 egl/dri2: swap_buffers_with_damage falls back to swap_buffers
Since commit 0a606a400f ("egl: add eglSwapBuffersWithDamageKHR"),
Android has been broken because the function eglSwapBuffersWithDamageKHR
is provided regardless of the extension being present. Also, the Android
meta-EGL always advertises the extension regardless of the underlying
EGL implementation. As there doesn't seem to be a simple way
conditionally make the EGL function ptr NULL, just implement a brain
dead version of eglSwapBuffersWithDamage{KHR,EXT}.

Cc: 13.0 <mesa-stable@lists.freedesktop.org>
CC: Rob Clark <robdclark@gmail.com>
Suggested-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Rob Herring <robh@kernel.org>
[Emil Velikov: copy the original commit message from Rob's patch]
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-26 12:04:21 +01:00
Emil Velikov
294b5f5f71 compiler: automake: add shader_info.h to the sources list
Otherwise it'll be missing from the tarball.

Fixes: 094fe3a959 ("nir: move nir_shader_info to a common compiler header")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-26 12:04:02 +01:00
Marek Olšák
1ac40173c2 configure.ac: simplify EGL requirements for drivers dependent on EGL
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-26 13:02:58 +02:00
Marek Olšák
b687f766fd st/mesa: allow multiple concurrent waiters in ClientWaitSync
so->fence can be unreferenced by one thread while another thread is
somewhere in ClientWaitSync and expecting so->fence to be non-NULL.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98172

Cc: 12.0 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák
f240ad98bc st/mesa: unduplicate st_check_sync code
It's the same as st_client_wait_sync. Discovered by Michel.
This is needed to make the following fix simpler.

Cc: 12.0 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák
ad45dce4a2 radeonsi: remove si_resource_create_custom
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák
29144d0f34 gallium/radeon: stop using PIPE_BIND_CUSTOM
it has no effect whatsoever

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák
e3c3a7fada r600g: remove a redundant buffer_create helper
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák
3dc78c33a9 gallium/radeon: remove unused r600_cmask_info members
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák
d18bf0b944 gallium/radeon: don't force the same tiling parameters for FMASK
GCN can use a completely different tile mode for FMASK.

FMASK allocation now skips one unrelated amdgpu_surface_init codepath as
hinted by the assertion.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák
ecf045b4f7 winsys/amdgpu: allocate FMASK properly
I expect no change in behavior, because r600_texture.c forces the same
tile mode as the base texture has.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák
24faeb94be gallium/radeon: print tiling index when printing texture info
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák
37659071b8 gallium/radeon: don't do (fmask.size && cmask.size)
fmask implies that cmask is present too.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák
2664351dfe gallium/radeon: re-order radeon_surf::dcc and htile members
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák
2a2e537577 gallium/radeon: rename bo_size -> surf_size, bo_alignment -> surf_alignment
these names were misleading.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák
67a44c97af gallium/radeon: remove flags specific to libdrm_radeon from winsys interface
These just say whether libdrm can assume that the latest radeon_surface
definition is used by Mesa.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák
7a706ad25c gallium/radeon: remove r600_htile_info
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák
7e73ff87c0 gallium/radeon: remove unnecessary fields from radeon_surf_level
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák
d5c7ea3b83 gallium/radeon: decrease the size of radeon_surf
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák
e9590d9092 gallium/radeon: pass pipe_resource and other params to surface_init directly
This removes input-only parameters from the radeon_surf structure.

Some of the translation logic from pipe_resource to radeon_surf is moved to
winsys/radeon.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák
8b94976df9 radeon/vce: use nblk_y instead of npix_y
npix_y will be removed. level[0].npix_y will be removed too. nblk_y should
be the same as npix_y if the block height == 1. However, nblk_y is aligned
to the tile size, so it can be greater than npix_y.

If that's a problem, we'll have to save the input height of surface_init
and use that.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák
ba174b8dff gallium/radeon: define RADEON_SURF_MODE_* as enums
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák
b5118fe054 gallium/radeon: stop using some input fields from radeon_surface
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák
28d237d63d gallium/radeon: fold r600_setup_surface into r600_init_surface
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák
b0d8a717a7 winsys/amdgpu: remove unused definitions
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák
81a95946da gallium/radeon: fold radeon_winsys::surface_best into radeon/winsys
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák
dc6bbe2dd0 gallium/radeon: use r600_gfx_write_event_eop everywhere
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák
462e3cdf3b gallium/radeon: make r600_gfx_write_fence more generic
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák
edf56fb428 gallium/radeon: fix a ZPASS comment, EVENT_WRITE_EOP fixups
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák
d883c83ba9 radeonsi: enable SDMA on Carrizo and all CIK chips again
SDMA might be fixed by:
  "winsys/amdgpu: fix radeon_surf::macro_tile_index for imported textures"

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák
6ec3b2a4b1 winsys/amdgpu: fix radeon_surf::macro_tile_index for imported textures
Maybe this is why SDMA has been broken for many amdgpu users?

SDMA is the only block which is used with imported textures and relies
on this variable. DB also uses it, but it doesn't get imported textures,
so it's unaffected.

I do get SDMA failures on Tonga before this patch if R600_DEBUG=testdma
is changed to use imported textures.

Cc: 11.2 12.0 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák
dce05b3423 gallium/radeon: make sure the address of separate CMASK is aligned properly
This should fix random GPU hangs on Hawaii and Fiji.

Cc: 11.2 12.0 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Marek Olšák
8a21f52d73 gallium/radeon: fix incorrect bpe use in si_set_optimal_micro_tile_mode
Oh my god, I wonder what catastrophic issues this was causing on SI.

Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-26 13:02:58 +02:00
Samuel Iglesias Gonsálvez
0e742926c6 glsl: update default precision qualifier when it is set in the shader
Default precision qualifier for a data type could be set several times
inside a shader. This patch allows to update the default precision
qualifier for the given type that is saved in the symbol table.

If it is not in the symbol table, just add it.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97804
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-10-26 11:57:07 +02:00
Samuel Iglesias Gonsálvez
dfbdb2c0b3 mesa/program: Add _mesa_symbol_table_replace_symbol()
This function allows to modify an existing symbol.

v2:
- Remove namespace usage now that it was deleted.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-10-26 11:57:02 +02:00
Timothy Arceri
2e423ca147 nir: stop adjusting driver location for varying packing
As of 59864e8e02 we just use the location assigned by the front-end and
no longer need this for i965.

Since there were some issues in the logic with assigning arrays the same
driver location if they didn't start at the same location just remove it
and let other drivers implement a solution if needed when they add
ARB_enhanced_layouts support.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-26 14:29:36 +11:00
Timothy Arceri
4ac6686165 compiler: remove copy_shader_info()
This temporary helper is no longer needed now that we have finished
refactoring common shader metadata.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
9972c591e7 glsl: set uses texture gather directly in shader_info
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
4016f08854 glsl/st/mesa: use common system values read field
And set system values read directly in shader_info.

st/mesa changes where:
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
2f59f3eee5 glsl: set patch outputs written directly in shader_info
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
f79d37f1ec st/mesa: use common patch outputs written field
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-26 14:29:36 +11:00
Timothy Arceri
419de307dc glsl: set patch inputs read directly in shader_info
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
3d2a503998 st/mesa: use common patch inputs read field
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-26 14:29:36 +11:00
Timothy Arceri
fdf42d3abc glsl: set outputs read directly in shader_info
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
5346630593 r200/glsl/st/mesa: use common outputs written field
And set outputs written directly in shader_info.

st/mesa changes where:
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
b4b450a5cb mesa/glsl: set double inputs read directly in shader_info
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
24093975e8 st/mesa: use common double inputs read field
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
e81aaeba37 r200/i915/st/mesa/compiler: use common inputs read field
And set set inputs_read directly in shader_info.

To avoid regressions between changes this change is a squashed
version of the following patches.

st/mesa changes where:
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
dfcbdba471 mesa/compiler: copy early fragment tests to shader_info in _mesa_copy_linked_program_data()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
6c2fcf6a8a meta: remove remaining tabs in meta.c
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
3d8947824f i965: replace brw_compute_program with brw_program
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
13d0cf57bf i965: replace brw_fragment_program with brw_program
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
46a4e4257e i965: replace brw_tess_{eval,ctrl}_program with brw_program
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
649bdb1f03 i965: replace brw_geomerty_program with brw_program
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
cb2d181944 i965: replace brw_vertex_program with new generic brw_program
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
3423488d55 st/mesa/r200/i915/i965: eliminate gl_fragment_program
Here we move OriginUpperLeft and PixelCenterInteger into gl_program
all other fields have been replace by shader_info.

V2: Don't use anonymous union/structs to hold vertex/fragment fields
suggested by Ian.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
17e28a1571 i965/mesa/st/swrast: set fs shader_info directly and switch to using it
Note we access shader_info from the program struct rather than the
nir_shader pointer because shader cache won't create a nir_shader.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
91d5b0eda9 mesa: remove now unused IsCentroid from gl_fragment_program
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
d9d04373c1 st/mesa: get interpolation location at translation time
Rather then messing around creating bitfields and arrays to store
the interpolation location just translate it on the fly.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
a317b40d1d i965: remove unused debug param
This was accidently disabled in 832bcc3613 not long after it was added.

Since it's only for gen5 and lower we might as well just remove it rather
than fixing it.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
89512987c6 compiler: update the comment for enum glsl_interp_mode
We no longer store the interp mode with the program metadata.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
aa881e4dc0 glsl: remove now unused InterpQualifier
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
c596f47b80 i965: remove unused BRW_STATE_INTERPOLATION_MAP flag
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
91d61fbf7c i965: rewrite brw_setup_vue_interpolation()
Here brw_setup_vue_interpolation() is rewritten not to use the InterpQualifier
array in gl_fragment_program which will allow us to remove it.

This change also makes the code which is only used by gen4/5 more self contained
as it now has its own gen5_fragment_program struct rather than storing the map
in brw_context. This means the interpolation map will only get processed once
and will get stored in the in memory cache rather than being processed everytime
the fs changes.

Also by calling this from the fs compile code rather than from the upload code
and using the interpolation assigned there we can get rid of the
BRW_NEW_INTERPOLATION_MAP flag.

It might not seem ideal to add a gen5_fragment_program struct however by the end
of this series we will have gotten rid of all the brw_{shader_stage}_program
structs and replaced them with a generic brw_program struct so there will only
be two program structs which is better than what we have now.

V2: Don't remove BRW_NEW_INTERPOLATION_MAP from dirty_bit_map until the following
patch to fix build error.

V3 - Suggestions by Jason:
- name struct gen4_fragment_program rather than gen5_fragment_program
- don't use enum with memset()
- create interp mode set helper and simplify logic to call it
- add assert when calling function to show prog will never be NULL for
 gen4/5 i.e. no Vulkan

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-26 14:29:36 +11:00
Timothy Arceri
20c0e67501 st/mesa: stop making use of InterpQualifier array
A following patch is going to merge the gl_fragment_program struct into
a common gl_program and we want to avoid all stages having this array.

V2: use TGSI_INTERPOLATE_COUNT as the temporary placeholder. Suggested by
Marek.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
cdafd499cc mesa: remove unrequired code
InterpQualifier is never set for ARB programs so this will do nothing.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
9605b98a07 i965/mesa/st: eliminate gl_compute_program
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
5a228c0aae mesa: set cs shader_info metadata directly
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
4ca71a1175 st/mesa: switch cs over to shared shader_info
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
54095ed8b9 compiler: add additional cs metadata fields to shader info
And copy values from GLSL.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
81faead818 mesa/i965/i915/r200: eliminate gl_vertex_program
Here we move the only field in gl_vertex_program to the
ARB program fields in gl_program.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
0ab51f8e16 i965: switch vs over to shared shader_info
Note we access shader_info from the program struct rather than the
nir_shader pointer because shader cache won't create a nir_shader.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
92f77e9c01 i965/mesa/st: eliminate gl_geometry_program
We now get all the gs metadata from shader_info.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
9045ddcfe4 mesa: set gs shader_info metadata directly
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
8d7b25ee58 st/mesa: switch gs over to shared shader_info
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
288c96afce i965: switch gs over to shared shader_info
Note we access shader_info from the program struct rather than the
nir_shader pointer because shader cache won't create a nir_shader.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
b99ecaf872 compiler: add input primative field for gs in shader info
And copy the value from GLSL.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
67c2d80a83 i965/mesa/st: eliminate gl_tess_eval_program
We now get all the tes metadata from shader_info.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
65225c20c6 mesa: copy tes metadata directly to shared shader info
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
2be3dbd90b st/mesa: switch tes over to shared shader_info
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
4f1c415cc4 i965: switch tes over to shared shader_info
Note we access shader_info from the program struct rather than the
nir_shader pointer because shader cache won't create a nir_shader.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
088c25bfb7 compiler: add fields for tes metadata to shader info
And copy the values from gl_tess_eval_program struct.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
64d9773cfe i965/mesa/st: eliminate gl_tess_ctrl_program
We now get all the tcs metadata from shader_info.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
750b14ed8e mesa: set tcs shader_info metadata directly
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
7a4bbfa90d st/mesa: switch tcs over to shared shader_info
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
e6cecb837d i965: switch tcs over to shared shader_info
Note we access shader_info from the program struct rather than the
nir_shader pointer because shader cache won't create a nir_shader.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
9d2b391165 glsl: add temporary copy_shader_info() function
This function is added here to ease refactoring towards using the new shared
shader_info. Once refactoring is complete and values are set directly it
will be removed.

We call it from _mesa_copy_linked_program_data() rather than glsl_to_nir()
so that the values will be set for all drivers. In order to do this some
calls need to be moved around so that we make sure to call
do_set_program_inouts() before _mesa_copy_linked_program_data()

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
debed12fdd glsl: add a shader info field to the gl_program type
And use this field as the source for shader info in the nir_shader
this will allow us to set some of these fields from GLSL directly.

It will also simplify restoring from shader cache and allow the
removal of duplicate fields from GLSL.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
e1af20f18a nir/i965/anv/radv/gallium: make shader info a pointer
When restoring something from shader cache we won't have and don't
want to create a nir_shader this change detaches the two.

There are other advantages such as being able to reuse the
shader info populated by GLSL IR.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
094fe3a959 nir: move nir_shader_info to a common compiler header
This will allow use to stop copying values between structs and
will also simplify handling handling these values in the shader cache.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri
e40d32b3ec mesa: modify _mesa_copy_linked_program_data() to take gl_linked_shader
This allows us to do some small tidy ups, but will also allow us to call
a new function that copies values to a shared shader info from here.

In order to make this change this function now requires
_mesa_reference_program() to have previously been called.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Fredrik Höglund
68db0fe034 vulkan/wsi/wayland: fix ARGB window support
Use an ARGB format for the DRM buffer when the compositeAlpha field
in VkSwapchainCreateInfoKHR is set to
VK_COMPOSITE_ALPHA_PRE_MULTIPLIED_BIT_KHR.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-26 12:40:39 +10:00
Fredrik Höglund
972670c200 vulkan/wsi/x11: fix ARGB window support
Pass the correct depth to xcb_dri3_pixmap_from_buffer_checked().
Otherwise xcb_present_pixmap() fails with a BadMatch error.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-26 12:40:39 +10:00
Fredrik Höglund
0a153f4ee4 radv: mark the fence as submitted and signalled in vkAcquireNextImageKHR
This stops the debug layers from complaining when fences are used to
throttle image acquisition.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-26 12:25:35 +10:00
Vinson Lee
f2770fb3d5 scons: Require libdrm >= 2.4.66 for DRM.
configure.ac already requires 2.4.66.

Fix SCons build. drmDevicePtr is not available until libdrm 2.4.65.

  Compiling src/loader/loader.c ...
src/loader/loader.c:111:40: error: unknown type name ‘drmDevicePtr’
 static char *drm_construct_id_path_tag(drmDevicePtr device)
                                        ^

Fixes: 4a183f4d06 ("scons: loader: use libdrm when available")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98421
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Vedran Miletić <vedran@miletic.net>
2016-10-25 15:48:12 -07:00
Matt Turner
14aac061e9 radv: Replace "abi_versions" with correct "api_version".
git history shows "abi_versions" was used from the outset.

Cc: <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98415
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-10-25 12:55:39 -07:00
Matt Turner
07755237d3 anv: Replace "abi_versions" with correct "api_version".
git history shows "abi_versions" was used from the outset.

Cc: <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98415
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-10-25 12:55:39 -07:00
Karol Herbst
0404678c5f nv50/ir: start LocalCSE with getFirst to merge PHI instructions
total instructions in shared programs : 3499888 -> 3499445 (-0.01%)
total gprs used in shared programs    : 453866 -> 453803 (-0.01%)
total local used in shared programs   : 21621 -> 21621 (0.00%)
total bytes used in shared programs   : 32078952 -> 32074936 (-0.01%)

                local        gpr       inst      bytes
    helped           0          39         119         119
      hurt           0           0           0           0

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-10-25 20:55:07 +02:00
Samuel Pitoiset
7b2712c367 nvc0: use correct bufctx when invalidating CP textures
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org>
2016-10-25 20:22:05 +02:00
Eduardo Lima Mitev
750d8cad72 vulkan/wsi/x11: Fix behavior of vkGetPhysicalDeviceSurfaceFormatsKHR
x11_surface_get_formats() is currently asserting that the number of
elements in pSurfaceFormats must be greater than or equal to the number
of formats available. This is buggy because pSurfaceFormatsCount
elements are later copied from the internal formats' array, so if
pSurfaceFormatCount is greater, it will overflow it.

On top of that, this assertion violates the spec. From the Vulkan 1.0
(revision 32, with KHR extensions), page 579 of the PDF:

    "If pSurfaceFormats is NULL, then the number of format pairs supported
     for the given surface is returned in pSurfaceFormatCount. Otherwise,
     pSurfaceFormatCount must point to a variable set by the user to the
     number of elements in the pSurfaceFormats array, and on return the
     variable is overwritten with the number of structures actually written
     to pSurfaceFormats. If the value of pSurfaceFormatCount is less than
     the number of format pairs supported, at most pSurfaceFormatCount
     structures will be written. If pSurfaceFormatCount is smaller than
     the number of format pairs supported for the given surface,
     VK_INCOMPLETE will be returned instead of VK_SUCCESS to indicate that
     not all the available values were returned."

So, the correct behavior is: if pSurfaceFormatCount is greater than the
internal number of formats, it is clamped to that many formats. But
if it is lesser than that, then pSurfaceFormatCount elements are copied,
and the call returns VK_INCOMPLETE.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-10-25 13:22:38 +02:00
Tapani Pälli
a1652a059e mesa: fix error handling in DrawBuffers
Patch rearranges error checking so that enum checking provided via
destmask happens before other checks. It needs to be done in this
order because other error checks do not work properly if there were
invalid enums passed.

Patch also refines one existing check and it's documentation to match
GLES 3.0 spec (also in later specs). This was somewhat mysteriously
referring to desktop GL but had a check for gles3.

Fixes following dEQP tests:

   dEQP-GLES31.functional.debug.negative_coverage.get_error.buffer.draw_buffers

no CI regressions observed.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98134
Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-25 08:04:11 +03:00
Tapani Pälli
5876f3c85a egl: add check that eglCreateContext gets a valid config
Fixes following dEQP test:

   dEQP-EGL.functional.negative_api.create_context

v2: don't break EGL_KHR_no_config_context (Eric Engestrom)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org>
2016-10-25 07:24:11 +03:00
Tapani Pälli
58b4fef8bb mesa: add missing formats to driGLFormatToImageFormat
Fixes following dEQP tests:

   dEQP-EGL.functional.image.api.create_image_gles2_tex2d_luminance
   dEQP-EGL.functional.image.api.create_image_gles2_tex2d_luminance_alpha

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98328
2016-10-25 07:24:11 +03:00
Tapani Pälli
282b87dd03 egl: fix type mismatch error type in _eglInitSurface
EGL spec defines EGL_BAD_MATCH for windows, pixmaps and pbuffers in
case where user creates a surface but config does not support rendering
to such surface type.

Following quotes are from EGL 1.5 spec 3.5 "Rendering Surfaces" :

for eglCreatePlatformWindowSurface, eglCreateWindowSurface:

   "If config does not support rendering to windows (the EGL_SURFACE_TYPE
   attribute does not contain EGL_WINDOW_BIT ), an EGL_BAD_MATCH error is
   generated."

for eglCreatePbufferSurface:

   "If config does not support pbuffers, an EGL_BAD_MATCH error is
   generated."

for eglCreatePlatformPixmapSurface, eglCreatePixmapSurface:

   "If config does not support rendering to pixmaps (the EGL_SURFACE_TYPE
   attribute does not contain EGL_PIXMAP_BIT ), an EGL_BAD_MATCH error is
   generated."

Fixes following dEQP test:

   dEQP-EGL.functional.negative_api.create_pbuffer_surface

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-10-25 07:24:11 +03:00
Tapani Pälli
1ef7873397 Revert "egl/android: Set EGL_MAX_PBUFFER_WIDTH and EGL_MAX_PBUFFER_HEIGHT"
This reverts commit b1d636aa00, previous
commit sets these values for all egl configs.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org>
Suggested-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-25 07:24:11 +03:00
Tapani Pälli
b91e1e38e8 egl/dri2: set max values for pbuffer width and height
While these max values were previously fixed for pbuffer creation, this
change makes also eglGetConfigAttrib() return correct values.

Fixes following dEQP tests:

   dEQP-EGL.functional.create_surface.pbuffer.rgb888_no_depth_no_stencil
   dEQP-EGL.functional.create_surface.pbuffer.rgb888_depth_stencil
   dEQP-EGL.functional.create_surface.pbuffer.rgba8888_no_depth_no_stencil
   dEQP-EGL.functional.create_surface.pbuffer.rgba8888_depth_stencil

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98326
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org>
2016-10-25 07:24:11 +03:00
Brian Paul
76c3f1bbbe gallium/stapi: fix comment for st_visual::buffer_mask
Trivial.
2016-10-24 17:22:00 -07:00
Nanley Chery
59385da39d isl/format: Correct ASTC entries of format info table
With the isl_format_supports* helpers, we can now conveniently
report support for this format on Cherry View.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92925
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-24 15:51:57 -07:00
Kenneth Graunke
41034abfe6 i965: Drop nir_inputs from fs_visitor.
It's unused.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-10-24 14:33:38 -07:00
Kenneth Graunke
59864e8e02 i965: Don't use nir_assign_var_locations for VS/TES/GS outputs.
Fixes spec/arb_enhanced_layouts/execution/component-layout/vs-fs-array-dvec3.

v2: Remove nir_outputs field from fs_visitor (caught by Tim and Iago).

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-10-24 14:33:38 -07:00
Kenneth Graunke
27715c73ff i965: Make split_virtual_grfs() call compact_virtual_grfs().
Post-splitting, VGRFs have a maximum size (MAX_VGRF_SIZE).  This is
required by the register allocator, as we have to create classes for
each size of VGRF.

We can (and do) allocate virtual registers larger than MAX_VGRF_SIZE,
but we must ensure that they are splittable.  split_virtual_grfs()
asserts that the post-splitting register size is in range.

Unfortunately, these trip for completely dead registers which are too
large - we only set split points for live registers.  So dead ones are
never split, and if they happened to be too large, they'd trip asserts.

To fix this, call compact_virtual_grfs() to eliminate dead registers
before splitting.

v2: Add a comment written by Iago.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-10-24 14:33:38 -07:00
Kenneth Graunke
3728ee000a i965: Drop unnecessary switch statement in nir_setup_outputs()
TCS and FS are skipped above.  CS has no output variables.
All remaining cases take the same path.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-10-24 14:33:38 -07:00
Brian Paul
88a618ce86 tgsi: trivial build fix for MSVC
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-24 14:16:07 -07:00
Samuel Pitoiset
6dbb8d12a8 nv50/ir: do not perform global membar for shared memory
Shared memory is local to CTA, thus we should only wait for
prior memory writes which are visible to other threads in
the same CTA, and not at global level. This should speedup
compute shaders which use shared memory.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-24 22:51:54 +02:00
Axel Davy
eed605a473 st/nine: Fix locking CubeTexture surfaces.
Only one face of Cubetextures was locked when in DEFAULT Pool.
Fixes:
https://github.com/iXit/Mesa-3D/issues/129

CC: "12.0 13.0" <mesa-stable@lists.freedesktop.org>

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-24 21:56:44 +02:00
Axel Davy
fe7bb46134 st/nine: Fix mistake in Volume9 UnlockBox
In the format fallback path,
the height was used instead of the depth.

CC: "12.0 13.0" <mesa-stable@lists.freedesktop.org>

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-24 21:56:44 +02:00
Axel Davy
942778099e st/nine: Use align_calloc instead of align_malloc
We are not sure exactly what needs to be 0 initialized,
but we are missing some cases. 0 initialize all our current
aligned allocation.

Fixes Tree of Savior visual issues.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-24 21:56:44 +02:00
Axel Davy
54010cf8b6 gallium/util: Add align_calloc
Add implementation for align_calloc,
which is align_malloc + memset.

v2: add if (ptr) before memset.
Fix indentation.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-24 21:56:44 +02:00
Axel Davy
25beccb379 st/nine: Fix leak with integer and boolean constants
Leak introduced by:
a83dce0128

The patch also moves the part to
release changed.vs_const_i and changed.vs_const_b
before the if (!cb.buffer_size) check,
to avoid reuploading every draw call if
integer or boolean constants are dirty, but the shaders
use no constants.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
CC: "13.0" <mesa-stable@lists.freedesktop.org>
2016-10-24 21:56:44 +02:00
Marek Olšák
f35b1d156b tgsi/scan: scan texture offset operands
This seems important considering how much we depend on some of the flags.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-24 21:41:38 +02:00
Marek Olšák
a2f98dff14 tgsi/scan: move src operand processing into a separate function
the next commit will need this

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-24 21:41:36 +02:00
Marek Olšák
72267a25db tgsi/scan: get information about shader buffer usage
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-24 21:41:35 +02:00
Marek Olšák
d89890d000 tgsi/scan: handle indirect image indexing correctly
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-24 21:41:33 +02:00
Marek Olšák
ac37720f51 tgsi/scan: don't treat RESQ etc. as memory instructions
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-24 21:41:30 +02:00
Marek Olšák
f095a4eb17 tgsi/scan: get information about indirect 2D file access
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-24 21:41:28 +02:00
Marek Olšák
965a5f1810 tgsi/scan: get information about indirect CONST access
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-24 21:41:26 +02:00
Anuj Phogat
35010718bc i965/gen8: Don't enable alpha test and alpha to coverage if draw bufer zero is integer type
We follow this rule at multiple places in i965 driver. This patch
doesn't fix any testcase.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-24 11:07:39 -07:00
Anuj Phogat
93b84cae54 i965/gen8: Use DrawBuffer->_IntegerBuffers in gen8_upload_ps_blend()
No functional changes in this patch.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-24 11:07:39 -07:00
Anuj Phogat
e2dd582de8 i965/gen8: Use DrawBuffer->_IntegerBuffers in gen8_upload_blend_state()
No functional changes in this patch.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-24 11:07:39 -07:00
Samuel Pitoiset
d588e4f192 nv50/ir: display OP_BAR subops in debug mode
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-24 18:53:45 +02:00
Iago Toral Quiroga
537dce06ec glsl: add matrix layout information to interface block types
So far we have been checking that interface block definitions had matching
matrix layouts by comparing the definitions of their fields, however, this
does not cover the case where the interface blocks are defined with
mismatching matrix layouts but don't define any field with a matrix type.
In this case Mesa will not fail to link because none of the fields will
inherit the mismatching layout qualifier.

This patch fixes the problem in the same way we fixed it for packing layout
information: we add the the layout information to the interface type and then
we check it matches during the uniform block linking process.

v2: Fix unit tests so they pass the new parameter to
    glsl_type::get_interface_instance()

Fixes:
dEQP-GLES31.functional.shaders.linkage.uniform.block.layout_qualifier_mismatch_3

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98245
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
2016-10-24 15:49:53 +02:00
Nicolai Hähnle
3d6b5dee3a st/mesa: cleanup and fix primitive restart for indirect draws
There are three intended functional changes here:

1. OpenGL 4.5 clarifies that primitive restart should only apply with index
   buffers, so make that change explicit in the indirect draw path.

2. Make PrimitiveRestartFixedIndex work with indirect draws.

3. The change where primitive_restart is only set when the restart index can
   actually have an effect (based on the size of indices) is also applied for
   indirect draws.

Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-24 15:24:23 +02:00
Timothy Arceri
6dbe8a1b9f glsl/mesa: remove unused namespace support from the symbol table
Namespace support seems to have been unused for a very long time.

Previously the hash table entry was never removed and the symbol name
wasn't freed until the symbol table was destroyed.

In theory this could reduced the number of times we need to copy a string
as duplicate names are reused. However in practice there is likely only a
limited number of symbols that are the same and this is likely to cause
other less than optimal behaviour such as the hash_table continuously
growing.

Along with dropping namespace support this change removes entries from
the hash table as they become unused.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-10-24 21:40:39 +11:00
Jonathan Gray
907ace5798 mapi: automake: set VISIBILITY_CFLAGS for shared glapi
shared glapi was previously built without setting CFLAGS for
AM_CFLAGS and VISIBILITY_CFLAGS.

This resulted in symbols being exported that shouldn't be.

The x86 and sparc assembly versions of the dispatch table partially
mitigated this by using .hidden.  Otherwise shared_dispatch_stub_*
were being exported.

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Cc: "11.2 12.0 13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-10-24 11:29:23 +01:00
Emil Velikov
8df581520a anv: automake: cleanup the generated json file during make clean
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-24 11:29:12 +01:00
Stencel, Joanna
2e0ab61e29 egl/wayland: add missing destroy_window callback
The original patch by Joanna added the function pointer and callback yet
things got only partially applied - the infra was added, but the
implementation was missing.

Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org>
Fixes: 690ead4a13 ("egl/wayland-egl: Fix for segfault in
dri2_wl_destroy_surface.")
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-10-24 09:50:53 +01:00
Emil Velikov
3511a86111 automake: don't forget to pick wglext.h in the tarball
Earlier commit reworked the header install rules, to ensure that the
correct ones are installed only as needed.

By doing so it dropped a wildcard which was effectively including the
wglext.h header in the tarball.

Add the header to the top-level noinst_HEADERS, since the it is not
meant to be installed (autoconf is not used on Windows plaforms).

Fixes: a89faa2022 ("autoconf: Make header install distinct for various
APIs (v2)")
Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org>
Cc: Chuck Atkins <chuck.atkins@kitware.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-10-24 09:44:26 +01:00
Samuel Iglesias Gonsálvez
b50b82b8a5 glsl/es31: precision qualifier doesn't need to match in shader interface block members
It is specific only to GLSL ES 3.1. From the spec, section 4.3.9
"Interface Blocks":

"Matched block names within a shader interface (as defined above) must
 match in terms of having the same number of declarations with the same
 sequence of types and the same sequence of member names, as well as
 having the same qualification as specified in section 9.2 (“Matching
 of Qualifiers“)."

But in GLSL ES 3.0 and 3.2, it is the opposite:

"Matched block names within a shader interface (as defined above) must
 match in terms of having the same number of declarations with the same
 sequence of types, precisions and the same sequence of member names,
 as well as having the matching member-wise layout qualification as
 defined in section 9.2 (“Matching of Qualifiers”)."

Fixes:

dEQP-GLES31.functional.shaders.linkage.uniform.block.differing_precision

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98243
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2016-10-24 07:04:38 +02:00
Samuel Iglesias Gonsálvez
849390a61a glsl: move intrastage_match() after interstage_member_mismatch()
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2016-10-24 07:04:32 +02:00
Dave Airlie
a969548f59 radv: allow cmask transitions without fast clear
This fixes
dEQP-VK.pipeline.multisample.sampled_image*

These all render to multisampled image, and then
sample from it, so we must transition it correctly,
since we have a cmask and fmask this will cause
the correct transition.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-24 11:03:09 +10:00
Ilia Mirkin
7b7eb7170d nv50/ir: it appears that OP_DISCARD can't take a join modifier
nvdisasm does not print a .S even though the bit is set.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-10-22 12:02:35 -04:00
Ilia Mirkin
adad576bfc nv50/ir: use levelZero for non-frag tex/txp ops
radeonsi also does the same thing. I suspect that this is likely to be a
no-op in reality, but it brings nouveau code closer to what the blob
produces. Plus it makes sense to not try to do auto-derivatives on this.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-10-22 12:02:35 -04:00
Ilia Mirkin
3fdeb7c983 gallium: add PIPE_CAP_STREAM_OUTPUT_INTERLEAVE_BUFFERS
This allows the driver to signal that it can't handle random
interleaving of attributes across buffers. This is required for
ARB_transform_feedback3, and it's initialized to whatever the previous
value of PIPE_CAP_STREAM_OUTPUT_PAUSE_RESUME was except for nv50 where
it is disabled. Note that the proprietary drivers never expose
ARB_transform_feedback3 on any GT21x's (where nouveau previously did),
and after some effort I was unable to get it to work.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-22 12:02:35 -04:00
Samuel Pitoiset
6e08f3e96c nvc0/ir: remove outdated comment about SHLADD
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-22 14:50:17 +02:00
Eric Anholt
8ff4182876 vc4: Avoid making temporaries for assignments to NIR registers.
Getting stores to NIR regs to not generate new MOVs is tricky, since the
result we're trying to store into the NIR reg may have been from a
conditional update of a temp, or a series of packed writes.  The easiest
solution seems to be to require that nir_store_dest()'s arg comes from an
SSA temp.

This causes us to put in a few more temporary MOVs in the NIR SSA dest
case, but copy propagation successfully cleans those up.

The shader-db change is modest:

total instructions in shared programs: 93774 -> 93598 (-0.19%)
instructions in affected programs:     14760 -> 14584 (-1.19%)
total estimated cycles in shared programs: 212135 -> 211946 (-0.09%)
estimated cycles in affected programs:     27005 -> 26816 (-0.70%)

but I was seeing patterns in some register-allocation failures in DEQP
tests that looked like the extra MOVs would increase maximum register
pressure in loops.  Some debug code indicates that that's not the case,
though I'm still a bit confused by that result.
2016-10-21 14:12:22 -07:00
Eric Anholt
a689b8b9df vc4: Add a comment with discussion of how simulation works. 2016-10-21 14:12:22 -07:00
Eric Anholt
83ffb607b7 vc4: Move simulator winsys mapping and tracking to the simulator.
One tiny hack is left in vc4_bufmgr.c for what kind of mapping we got so
that we can free it.
2016-10-21 14:12:22 -07:00
Eric Anholt
1c38ee380d vc4: Move simulator memory management to a u_mm.h heap.
Now we aren't limited to 256MB total allocated across a driver instance,
just 256MB at one time.  We're still copying in and out, which should get
fixed.
2016-10-21 14:12:22 -07:00
Eric Anholt
9f75522382 vc4: Move simulator globals into a struct.
I would like to put a couple more things in here, so it's time to package
it up.
2016-10-21 14:12:22 -07:00
Eric Anholt
78087676c9 vc4: Restructure the simulator mode.
Rather than having simulator mode changes scattered around vc4_bufmgr.c
and vc4_screen.c, make vc4_bufmgr.c just call a vc4_simulator_ioctl, which
then dispatches to a corresponding implementation.

This will give the simulator support a centralized place to do tricks like
storing most BOs directly in simulator memory rather than copying in and
out.

This leaves special casing of mmaping BOs and execution, because of the
winsys mapping.
2016-10-21 14:12:22 -07:00
Eric Anholt
1d7874fa7b vc4: Fix termination of the initial scan for branch targets.
The loop is scanning until the original max_ip (size of the BO), but we
want to not examine any code after the PROG_END's delay slots.  There was
a block trying to do that, except that we had some early continue
statements if the signal wasn't a PROG_END or a BRANCH.

The failure mode would be that a valid shader is rejected because some
undefined memory after the PROG_END slots is parsed as a branch and the
rest of its setup is illegal.  I haven't seen this in the wild, but
valgrind was complaining and the new userland simulator code started
triggering it.
2016-10-21 14:12:06 -07:00
Jason Ekstrand
3f05fc62f9 configure: Get rid of the --disable-vulkan-icd-full-driver-path flag
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-10-21 09:30:24 -07:00
Jason Ekstrand
7ea4ef8849 anv: Always use the full driver path in the intel_icd.*.json
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-10-21 09:30:23 -07:00
Jason Ekstrand
d96345de98 anv: Suffix the intel_icd file with the host CPU
Vulkan has a multi-arch problem... The idea behind the Vulkan loader is
that you have a little json file on your disk that tells the loader where
to find drivers.  The loader looks for these json files in standard
locations, and then goes and loads the my_driver.so's that they specify.
This allows you as a driver implementer to put their driver wherever on the
disk they want so long as the ICD points in the right place.

For a multi-arch system, however, you may have multiple libvulkan_intel.so
files installed that the loader needs to pick depending on architecture.
Since the ICD file format does not specify any architecture information,
you can't tell the loader where to find the 32-bit version vs. the 64-bit
version.  The way that packagers have been dealing with this is to place
libvulkan_intel.so in the top level lib directory and provide just a name
(and no path) to the loader.  It will then use the regular system search
paths and find the correct driver.  While this solution works fine for
distro-installed Vulkan drivers, it doesn't work so well for user-installed
drivers because they may put it in /opt or $HOME/.local or some other more
exotic location.  In this case, you can't use an ICD json file with just a
library name because it doesn't know where to find it; you also have to add
that to your library lookup path via LD_LIBRARY_PATH or similar.

This patch handles both use-cases by taking advantage of the fact that the
loader dlopen()s each of the drivers and, if one dlopen() calls fails, it
silently continues on to open other drivers.  By suffixing the icd file, we
can provide two different json files: intel_icd.x86_64.json and
intel_icd.i686.json with different paths.  Since dlopen() will only succeed
on the libvulkan_intel.so of the right arch, the loader will happily ignore
the others and load that one.  This allows us to properly handle multi-arch
while still providing a full path so user installs will work fine.

I tested this on my Fedora 25 machine with 32 and 64-bit builds of our
Vulkan driver installed and 32 and 64-bit builds of crucible.  It seems to
work just fine.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-10-21 09:30:20 -07:00
Nicolai Hähnle
17353ef043 radeonsi: fix a regression in si_eliminate_const_output
A constant value of float type is not necessarily a ConstantFP: it could also
be a constant expression that for some reason hasn't been folded.

This fixes a regression in GL45-CTS.arrays_of_arrays_gl.InteractionFunctionCalls2
that was introduced by commit 3ec9975555.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-21 09:59:26 +02:00
Ilia Mirkin
8cf0f05713 nv50,nvc0: don't keep track of whether fb rt0 is integer-only
This reverts commits 1af0641db3 and
a6ad49cbbd.

st/mesa adjusts the rasterizer state for us now.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-10-21 02:28:26 -04:00
Francisco Jerez
811eb7f178 Revert "Revert "mapi: export all GLES 3.2 functions in libGLESv2.so""
This reverts commit 85e9bbc14d.  The
previous commit should help with the scons build failure caused by the
original commit.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2016-10-20 15:57:18 -07:00
Francisco Jerez
15a084a039 glapi: Move PrimitiveBoundingBox and BlendBarrier definitions into ES3.2 category.
These two GLES 3.2 entry points were being defined in the category of
the ARB_ES3_2_compatibility and KHR_blend_equation_advanced extensions
respectively instead of in the ES3.2 category.  Defining them in the
ES3.2 category makes sure that the gl_procs.py generator emits
declarations in the glprocs.h header file for the unsuffixed GLES-only
entry points that PrimitiveBoundingBoxARB and BlendBarrierKHR
respectively alias.  This should avoid a compilation failure during
scons builds in combination with "mapi: export all GLES 3.2 functions
in libGLESv2.so".

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2016-10-20 15:55:21 -07:00
Vinson Lee
889ee4da05 util: Include string.h in bitscan.h.
Fix build error with clang.

  Compiling src/compiler/glsl/link_varyings.cpp ...
In file included from src/compiler/glsl/link_varyings.cpp:33:
In file included from src/compiler/glsl/glsl_symbol_table.h:34:
In file included from src/compiler/glsl/ir.h:33:
In file included from src/compiler/glsl_types.h:29:
/usr/include/string.h:518:12: error: exception specification in declaration does not match previous declaration
extern int ffs (int __i) __THROW __attribute__ ((__const__));
           ^
src/util/bitscan.h:51:13: note: expanded from macro 'ffs'
            ^
src/util/bitscan.h:96:18: note: previous declaration is here
   const int i = ffs(*mask) - 1;
                 ^
src/util/bitscan.h:51:13: note: expanded from macro 'ffs'
            ^

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97952
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-10-20 14:54:28 -07:00
Samuel Pitoiset
42273edf79 nvc0: do not break 3D state by pushing MS coordinates on Fermi
Long story short, 3D and CP are aliased on Fermi and initializing
compute after pushing the MS sample coordinate offsets seems to
corrupt 3D state for weird reasons.

I still don't have the faintest clue what is going on, but
this seems to only affect Fermi generation. A possible fix
could be to use two different channels, one for 3D and one
for CP.

This fixes a bunch of regressions pinpointed by piglit.

Fixes: "nvc0: fix up image support for allowing multiple samples"
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-20 19:59:08 +02:00
Samuel Pitoiset
24e15aa198 nvc0: translate compute shaders at program creation
This makes shader-db reports results for compute shaders.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-20 19:46:18 +02:00
Ben Widawsky
ffd9060b23 i965: Reorder PCI ID list to match release order
I have some OCD...

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2016-10-20 08:54:03 -07:00
Ben Widawsky
b8509c8936 i965: Add some APL and KBL SKU strings
We got a couple for products that exist on ark.intel.com, so let's just
put them in now.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2016-10-20 08:54:03 -07:00
Brian Paul
bd60fb49ba vbo: clean up with 'indent', whitespace fixes, etc in vbo_exec_array.c
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-10-20 09:47:21 -06:00
Brian Paul
8b9965442a vbo: whitespace fixes and reformatting in vbo_exec_api.c
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-10-20 09:47:21 -06:00
Brian Paul
8320bf1a7e vbo: minor clean-up in vbo_exec_api.c
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-10-20 09:47:21 -06:00
Brian Paul
1098e6957c vbo: move attribute type assignment
If the attribute type is changing, we would have found that earlier in
the ATTR_UNION() macro and would have called vbo_exec_fixup_vertex().
So move the assignment into that function so we don't do it every time.

No Piglit regressions.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-10-20 09:47:21 -06:00
Brian Paul
4c3c9f1441 vbo: rename reset_attrfv() to vbo_reset_all_attr()
Use a better name.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-10-20 09:47:21 -06:00
Brian Paul
7693bcde28 vbo: make vbo_reset_attr() static
Not called from any other file.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-10-20 09:47:21 -06:00
Brian Paul
9d6d9b28f7 vbo: trivial indentation fix in vbo_exec_api.c 2016-10-20 09:47:21 -06:00
Marek Olšák
c2a602d21a gallivm: try to fix build with LLVM <= 3.4 due to missing CallSite.h
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2016-10-20 17:45:23 +02:00
Marek Olšák
f19f71830b radeonsi: fix build of si_eliminate_const_vs_outputs on LLVM <= 3.8
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-20 11:07:50 +02:00
Marek Olšák
2db56434d4 gallivm: add wrappers for missing functions in LLVM <= 3.8
radeonsi needs these.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-20 11:07:50 +02:00
Nicolai Hähnle
4a2dbfff05 radeonsi: fix 64-bit loads from LDS
Fixes spec/arb_tessellation_shader/execution/dvec[23]-vs-tcs-tes, among
others.

Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-20 10:37:07 +02:00
Nicolai Hähnle
bfa50f88ce st/mesa: only set primitive_restart when the restart index is in range
Even when enabled, primitive restart has no effect when the restart index
is larger than the representable values in the index buffer.

Fixes GL45-CTS.gtf31.GL3Tests.primitive_restart.primitive_restart_upconvert
for radeonsi VI.

v2: add an explanatory comment

Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
2016-10-20 10:37:06 +02:00
Nicolai Hähnle
3d9b57e493 st/glsl_to_tgsi: sort input and output decls by TGSI index
Fixes a regression introduced by commit 777dcf81b.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98307
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: 13.0 <mesa-stable@lists.freedesktop.org>
2016-10-20 10:37:06 +02:00
Nicolai Hähnle
a1895685f8 st/glsl_to_tgsi: fix block copies of arrays of structs
Use a full writemask in this case. This is relevant e.g. when a function
has an inout argument which is an array of structs.

v2: use C-style comment (Timothy Arceri)

Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Cc: 13.0 <mesa-stable@lists.freedesktop.org>
2016-10-20 10:37:01 +02:00
Nicolai Hähnle
ca592af880 st/glsl_to_tgsi: fix block copies of arrays of doubles
Set the type of the left-hand side to the same as the right-hand side,
so that when the base type is double, the writemask of the MOV instruction
is properly fixed up.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: 13.0 <mesa-stable@lists.freedesktop.org>
2016-10-20 10:30:00 +02:00
Iago Toral Quiroga
3da08e1664 glsl: Indirect array indexing on non-last SSBO member must fail compilation
After the changes in comit 5b2675093e, we moved this check to the
linker, but the spec expects this to be checked at compile-time. There are
dEQP tests that expect an error at compile time and the spec seems to confirm
that expectation:

"Except for the last declared member of a shader storage block (section 4.3.9
 “Interface Blocks”), the size of an array must be declared (explicitly sized)
 before it is indexed with anything other than an integral constant expression.
 The size of any array must be declared before passing it as an argument to a
 function. Violation of any of these rules result in compile-time errors. It
 is legal to declare an array without a size (unsized) and then later
 redeclare the same name as an array of the same type and specify a size, or
 index it only with integral constant expressions (implicitly sized)."

Commit 5b2675093e tries to take care of the case where we have implicitly
sized arrays in SSBOs and it does so by checking the max_array_access field
in ir_variable during linking. In this patch we change the approach: we look
for indirect access on SSBO arrays, and when we find one, we emit a
compile-time error if the accessed member is not the last in the SSBO
definition.

There is a corner case that the specs do not address directly though and that
dEQP checks for: the case of an unsized array in an SSBO definition that is
not defined last but is never used in the shader code either. The following
dEQP tests expect a compile-time error in this scenario:

dEQP-GLES31.functional.debug.negative_coverage.callbacks.shader.compile_compute_shader
dEQP-GLES31.functional.debug.negative_coverage.get_error.shader.compile_compute_shader
dEQP-GLES31.functional.debug.negative_coverage.log.shader.compile_compute_shader

However, since the unsized array is never used it is never indexed with a
non-constant expression, so by the spec quotation above, it should be valid and
the tests are probably incorrect.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-20 08:26:51 +02:00
Ilia Mirkin
cd45d758ff nv50/ir: process texture offset sources as regular sources
With ARB_gpu_shader5, texture offsets can be any source, including TEMPs
and IN's. Make sure to process them as regular sources so that we pick
up masks, etc.

This should fix some CTS tests that feed offsets directly to
textureGatherOffset, and we were not picking up the input use, thus not
advertising it in the shader header.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Dave Airlie <airlied@redhat.com>
Cc: 12.0 13.0 <mesa-stable@lists.freedesktop.org>
2016-10-19 21:02:01 -04:00
Ilia Mirkin
313fba5ee1 nv50,nvc0: avoid reading out of bounds when getting bogus so info
The state tracker tries to attach the info to the wrong shader. This is
easy enough to protect against.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: 12.0 13.0 <mesa-stable@lists.freedesktop.org>
2016-10-19 21:02:01 -04:00
Eric Engestrom
8bf7717e1f wsi/wayland: fix error path
Fixes: 1720bbd353 ("anv/wsi: split image alloc/free out to separate fns.")
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-20 10:53:59 +10:00
Dave Airlie
b0f131b0bf anv: drop unused zero macro.
I can't see this being used anywhere.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-20 10:53:37 +10:00
Dave Airlie
d842546ad1 radv: use emit_icmp for samples_identical
On a debug llvm build we'd assert on the next compare
when the return from samples_identical was i1 instead
of i32.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-20 01:43:55 +01:00
Jordan Justen
64c3d73535 i965/cs: Don't use a thread channel ID for small local sizes
When the local group size is 8 or less, we will execute the program at
most 1 time. Therefore, the local channel ID will always be 0. By
using a constant 0 in this case we can prevent using push constant
data.

This is not expected to be common a occurance in real applications,
but it has been seen in tests.

We could extend this optimization to 16 and 32 for SIMD16 and SIMD32,
but it gets a bit more complicated, because this optimization is
currently being done early on, before we have decided the SIMD size.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-10-19 16:51:45 -07:00
Jordan Justen
1fa000a33b i965/cs: Use udiv/umod for local IDs
This allows for more optimizations relating to power-of-two divisions.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-10-19 16:51:45 -07:00
Timothy Arceri
740a8fa1e2 mesa: remove unused LocalSizeVariable
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-10-20 10:30:32 +11:00
Samuel Pitoiset
2b6e04e91f nvc0/ir: simplify predicate logic for GK104 atomic operations
The predicate is always CC_NOT_P as defined in
processSurfaceCoordsNVE4(), so we only want to emit OR.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-19 23:53:57 +02:00
Samuel Pitoiset
974ab614d3 nvc0/ir: remove useless NVC0LoweringPass::gMemBase
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-10-19 23:53:48 +02:00
Samuel Pitoiset
03dc87caab nv50/ir: print CCTL subops in debug mode
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-19 23:53:39 +02:00
Ian Romanick
4d35683d91 nir: Optimize integer division and modulus with 1
The previous power-of-two rules didn't catch idiv (because i965 doesn't
set lower_idiv) and imod cases.  The udiv and umod cases should have
been caught, but I included them for orthogonality.

This fixes silly code observed from compute shaders with local_size_[xy]
= 1.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98299
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-19 14:25:10 -07:00
Marek Olšák
baed5eab82 configure.ac: enable EGL platform DRM if GBM is enabled
since GBM is enabled by default, this is also enabled by default

the whitespace changes remove tabs

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-19 23:19:16 +02:00
Marek Olšák
4650a27ba1 configure.ac: enable GBM by default
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-19 23:19:16 +02:00
Marek Olšák
0e075700fa configure.ac: print whether GBM is enabled
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-19 23:19:16 +02:00
Marek Olšák
3ec9975555 radeonsi: eliminate trivial constant VS outputs
These constant value VS PARAM exports:
- 0,0,0,0
- 0,0,0,1
- 1,1,1,0
- 1,1,1,1
can be loaded into PS inputs using the DEFAULT_VAL field, and the VS exports
can be removed from the IR to save export & parameter memory.

After LLVM optimizations, analyze the IR to see which exports are equal to
the ones listed above (or undef) and remove them if they are.

Targeted use cases:
- All DX9 eON ports always clear 10 VS outputs to 0.0 even if most of them
  are unused by PS (such as Witcher 2 below).
- VS output arrays with unused elements that the GLSL compiler can't
  eliminate (such as Batman below).

The shader-db deltas are quite interesting:
(not from upstream si-report.py, it won't be upstreamed)

PERCENTAGE DELTAS    Shaders PARAM exports (affected only)
batman_arkham_origins    589  -67.17 %
bioshock-infinite       1769   -0.47 %
dirt-showdown            548   -2.68 %
dota2                   1747   -3.36 %
f1-2015                  776   -4.94 %
left_4_dead_2           1762   -0.07 %
metro_2033_redux        2670   -0.43 %
portal                   474   -0.22 %
talos_principle          324   -3.63 %
warsow                   176   -2.20 %
witcher2                1040  -73.78 %
----------------------------------------
All affected             991  -65.37 %  ... 9681 -> 3353
----------------------------------------
Total                  26725  -10.82 %  ... 58490 -> 52162

v2: treat Undef as both 0 and 1

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> (v1)
2016-10-19 22:21:46 +02:00
Samuel Pitoiset
041da0ae81 nv50/ir: silent TGSI_PROPERTY_FS_DEPTH_LAYOUT
Found that information message while replaying a trace from
Metro 2033 Redux. Mark that property as useless for now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-19 21:02:50 +02:00
Emil Velikov
1a9b0221bc docs: add 13.1.0-devel release notes template, bump version
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-19 19:10:16 +01:00
2365 changed files with 229843 additions and 140225 deletions

View File

@@ -1,4 +1,5 @@
((prog-mode
((nil . ((show-trailing-whitespace . t)))
(prog-mode
(indent-tabs-mode . nil)
(tab-width . 8)
(c-basic-offset . 3)
@@ -8,6 +9,10 @@
(c-set-offset 'case-label '0)
(c-set-offset 'innamespace '0)
(c-set-offset 'inline-open '0)))
)
(whitespace-style face indentation)
(whitespace-line-column . 79)
(eval ignore-errors
(require 'whitespace)
(whitespace-mode 1)))
(makefile-mode (indent-tabs-mode . t))
)

View File

@@ -6,6 +6,7 @@ root = true
[*]
charset = utf-8
insert_final_newline = true
tab_width = 8
[*.{c,h,cpp,hpp,cc,hh}]
indent_style = space

View File

@@ -1,6 +1,6 @@
language: c
sudo: true
sudo: required
dist: trusty
cache:
@@ -15,11 +15,9 @@ addons:
- libexpat1-dev
- libxcb-dri2-0-dev
- libx11-xcb-dev
- llvm-3.5-dev
# llvm-config is not in the dev package?
- llvm-3.5
# LLVM packaging is broken and misses this dep.
# LLVM packaging is broken and misses these dependencies
- libedit-dev
- libelf-dev
- scons
env:
@@ -29,14 +27,16 @@ env:
- XORGMACROS_VERSION=util-macros-1.19.0
- GLPROTO_VERSION=glproto-1.4.17
- DRI2PROTO_VERSION=dri2proto-2.8
- DRI3PROTO_VERSION=dri3proto-1.0
- PRESENTPROTO_VERSION=presentproto-1.0
- LIBPCIACCESS_VERSION=libpciaccess-0.13.4
- LIBDRM_VERSION=libdrm-2.4.65
- LIBDRM_VERSION=libdrm-2.4.74
- XCBPROTO_VERSION=xcb-proto-1.11
- LIBXCB_VERSION=libxcb-1.11
- LIBXSHMFENCE_VERSION=libxshmfence-1.2
- LLVM_VERSION=3.9
- LLVM_PACKAGE="llvm-${LLVM_VERSION} llvm-${LLVM_VERSION}-dev"
- LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
- PKG_CONFIG_PATH=$HOME/prefix/lib/pkgconfig
- MAKEFLAGS=-j2
matrix:
- BUILD=make
- BUILD=scons
@@ -47,7 +47,7 @@ install:
# Since libdrm gets updated in configure.ac regularly, try to pick up the
# latest version from there.
- for line in `grep "^LIBDRM_.*_REQUIRED=" configure.ac`; do
- for line in `grep "^LIBDRM.*_REQUIRED=" configure.ac`; do
old_ver=`echo $LIBDRM_VERSION | sed 's/libdrm-//'`;
new_ver=`echo $line | sed 's/.*REQUIRED=//'`;
if `echo "$old_ver,$new_ver" | tr ',' '\n' | sort -Vc 2> /dev/null`; then
@@ -70,14 +70,6 @@ install:
- tar -jxvf $DRI2PROTO_VERSION.tar.bz2
- (cd $DRI2PROTO_VERSION && ./configure --prefix=$HOME/prefix && make install)
- wget $XORG_RELEASES/proto/$DRI3PROTO_VERSION.tar.bz2
- tar -jxvf $DRI3PROTO_VERSION.tar.bz2
- (cd $DRI3PROTO_VERSION && ./configure --prefix=$HOME/prefix && make install)
- wget $XORG_RELEASES/proto/$PRESENTPROTO_VERSION.tar.bz2
- tar -jxvf $PRESENTPROTO_VERSION.tar.bz2
- (cd $PRESENTPROTO_VERSION && ./configure --prefix=$HOME/prefix && make install)
- wget $XCB_RELEASES/$XCBPROTO_VERSION.tar.bz2
- tar -jxvf $XCBPROTO_VERSION.tar.bz2
- (cd $XCBPROTO_VERSION && ./configure --prefix=$HOME/prefix && make install)
@@ -92,21 +84,31 @@ install:
- wget http://dri.freedesktop.org/libdrm/$LIBDRM_VERSION.tar.bz2
- tar -jxvf $LIBDRM_VERSION.tar.bz2
- (cd $LIBDRM_VERSION && ./configure --prefix=$HOME/prefix --enable-vc4 && make install)
- (cd $LIBDRM_VERSION && ./configure --prefix=$HOME/prefix --enable-vc4 --enable-freedreno --enable-etnaviv-experimental-api && make install)
- wget $XORG_RELEASES/lib/$LIBXSHMFENCE_VERSION.tar.bz2
- tar -jxvf $LIBXSHMFENCE_VERSION.tar.bz2
- (cd $LIBXSHMFENCE_VERSION && ./configure --prefix=$HOME/prefix && make install)
# Install LLVM directly via apt-get (not Travis-CI's apt addon)
# See https://github.com/travis-ci/apt-source-whitelist/pull/205#issuecomment-216054237
- wget -nv -O - http://llvm.org/apt/llvm-snapshot.gpg.key | sudo apt-key add -
- sudo apt-add-repository -y 'deb http://llvm.org/apt/trusty llvm-toolchain-trusty-3.9 main'
- sudo apt-add-repository -y 'deb http://llvm.org/apt/trusty llvm-toolchain-trusty main'
- sudo apt-get update -qq
- sudo apt-get install -qq -y $LLVM_PACKAGE
script:
- if test "x$BUILD" = xmake; then
./autogen.sh --enable-debug
--with-egl-platforms=x11,drm
--with-platforms=x11,drm
--with-dri-drivers=i915,i965,radeon,r200,swrast,nouveau
--with-gallium-drivers=svga,swrast,vc4,virgl,r300,r600
--with-gallium-drivers=i915,nouveau,r300,r600,radeonsi,freedreno,svga,swrast,vc4,virgl,etnaviv,imx
--with-vulkan-drivers=radeon
--disable-llvm-shared-libs
;
make && make check;
elif test x$BUILD = xscons; then
scons;
scons llvm=1 && scons llvm=1 check;
fi

View File

@@ -30,7 +30,6 @@ LOCAL_C_INCLUDES += \
$(MESA_TOP)/include
MESA_VERSION := $(shell cat $(MESA_TOP)/VERSION)
# define ANDROID_VERSION (e.g., 4.0.x => 0x0400)
LOCAL_CFLAGS += \
-Wno-unused-parameter \
-Wno-date-time \
@@ -39,11 +38,10 @@ LOCAL_CFLAGS += \
-Wno-initializer-overrides \
-Wno-mismatched-tags \
-DPACKAGE_VERSION=\"$(MESA_VERSION)\" \
-DPACKAGE_BUGREPORT=\"https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa\" \
-DANDROID_VERSION=0x0$(MESA_ANDROID_MAJOR_VERSION)0$(MESA_ANDROID_MINOR_VERSION)
-DPACKAGE_BUGREPORT=\"https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa\"
LOCAL_CFLAGS += \
-D__STDC_LIMIT_MACROS \
-DENABLE_SHADER_CACHE \
-DHAVE___BUILTIN_EXPECT \
-DHAVE___BUILTIN_FFS \
-DHAVE___BUILTIN_FFSLL \
@@ -51,6 +49,7 @@ LOCAL_CFLAGS += \
-DHAVE_FUNC_ATTRIBUTE_UNUSED \
-DHAVE_FUNC_ATTRIBUTE_FORMAT \
-DHAVE_FUNC_ATTRIBUTE_PACKED \
-DHAVE_FUNC_ATTRIBUTE_ALIAS \
-DHAVE___BUILTIN_CTZ \
-DHAVE___BUILTIN_POPCOUNT \
-DHAVE___BUILTIN_POPCOUNTLL \
@@ -59,9 +58,17 @@ LOCAL_CFLAGS += \
-DHAVE___BUILTIN_UNREACHABLE \
-DHAVE_PTHREAD=1 \
-DHAVE_DLOPEN \
-DHAVE_DL_ITERATE_PHDR \
-fvisibility=hidden \
-Wno-sign-compare
LOCAL_CPPFLAGS += \
-D__STDC_CONSTANT_MACROS \
-D__STDC_FORMAT_MACROS \
-D__STDC_LIMIT_MACROS \
-Wno-error=non-virtual-dtor \
-Wno-non-virtual-dtor
# mesa requires at least c99 compiler
LOCAL_CONLYFLAGS += \
-std=c99
@@ -69,37 +76,37 @@ LOCAL_CONLYFLAGS += \
ifeq ($(strip $(MESA_ENABLE_ASM)),true)
ifeq ($(TARGET_ARCH),x86)
LOCAL_CFLAGS += \
-DUSE_X86_ASM \
-DUSE_X86_ASM
endif
endif
ifeq ($(MESA_ENABLE_LLVM),true)
LOCAL_CFLAGS += \
-DHAVE_LLVM=0x0305 -DMESA_LLVM_VERSION_PATCH=2 \
-D__STDC_CONSTANT_MACROS \
-D__STDC_FORMAT_MACROS \
-D__STDC_LIMIT_MACROS
ifeq ($(MESA_ANDROID_MAJOR_VERSION),5)
LOCAL_CFLAGS += -DHAVE_LLVM=0x0305 -DMESA_LLVM_VERSION_PATCH=2
ELF_INCLUDES := external/elfutils/0.153/libelf
endif
ifeq ($(MESA_ANDROID_MAJOR_VERSION),6)
LOCAL_CFLAGS += -DHAVE_LLVM=0x0307 -DMESA_LLVM_VERSION_PATCH=0
ELF_INCLUDES := external/elfutils/src/libelf
endif
ifeq ($(MESA_ANDROID_MAJOR_VERSION),7)
LOCAL_CFLAGS += -DHAVE_LLVM=0x0308 -DMESA_LLVM_VERSION_PATCH=0
ELF_INCLUDES := external/elfutils/libelf
endif
endif
ifneq ($(LOCAL_IS_HOST_MODULE),true)
# add libdrm if there are hardware drivers
ifneq ($(filter-out swrast,$(MESA_GPU_DRIVERS)),)
LOCAL_CFLAGS += -DHAVE_LIBDRM
LOCAL_SHARED_LIBRARIES += libdrm
endif
LOCAL_CPPFLAGS += \
$(if $(filter true,$(MESA_LOLLIPOP_BUILD)),-D_USING_LIBCXX) \
-Wno-error=non-virtual-dtor \
-Wno-non-virtual-dtor
ifeq ($(MESA_LOLLIPOP_BUILD),true)
LOCAL_CFLAGS_32 += -DDEFAULT_DRIVER_DIR=\"/system/lib/$(MESA_DRI_MODULE_REL_PATH)\"
LOCAL_CFLAGS_64 += -DDEFAULT_DRIVER_DIR=\"/system/lib64/$(MESA_DRI_MODULE_REL_PATH)\"
else
LOCAL_CFLAGS += -DDEFAULT_DRIVER_DIR=\"/system/lib/$(MESA_DRI_MODULE_REL_PATH)\"
endif
LOCAL_CFLAGS_32 += -DDEFAULT_DRIVER_DIR=\"/system/lib/$(MESA_DRI_MODULE_REL_PATH)\"
LOCAL_CFLAGS_64 += -DDEFAULT_DRIVER_DIR=\"/system/lib64/$(MESA_DRI_MODULE_REL_PATH)\"
# uncomment to keep the debug symbols
#LOCAL_STRIP_MODULE := false
@@ -109,3 +116,7 @@ endif
# Quiet down the build system and remove any .h files from the sources
LOCAL_SRC_FILES := $(patsubst %.h, , $(LOCAL_SRC_FILES))
ifneq ($(LOCAL_IS_HOST_MODULE),true)
LOCAL_SHARED_LIBRARIES += libz
endif

View File

@@ -24,7 +24,7 @@
# BOARD_GPU_DRIVERS should be defined. The valid values are
#
# classic drivers: i915 i965
# gallium drivers: swrast freedreno i915g ilo nouveau r300g r600g radeonsi vc4 virgl vmwgfx
# gallium drivers: swrast freedreno i915g nouveau r300g r600g radeonsi vc4 virgl vmwgfx
#
# The main target is libGLES_mesa. For each classic driver enabled, a DRI
# module will also be built. DRI modules will be loaded by libGLES_mesa.
@@ -32,15 +32,6 @@
MESA_TOP := $(call my-dir)
MESA_ANDROID_MAJOR_VERSION := $(word 1, $(subst ., , $(PLATFORM_VERSION)))
MESA_ANDROID_MINOR_VERSION := $(word 2, $(subst ., , $(PLATFORM_VERSION)))
MESA_ANDROID_VERSION := $(MESA_ANDROID_MAJOR_VERSION).$(MESA_ANDROID_MINOR_VERSION)
ifeq ($(filter 1 2 3 4,$(MESA_ANDROID_MAJOR_VERSION)),)
MESA_LOLLIPOP_BUILD := true
else
define local-generated-sources-dir
$(call local-intermediates-dir)
endef
endif
MESA_DRI_MODULE_REL_PATH := dri
MESA_DRI_MODULE_PATH := $(TARGET_OUT_SHARED_LIBRARIES)/$(MESA_DRI_MODULE_REL_PATH)
@@ -50,7 +41,7 @@ MESA_COMMON_MK := $(MESA_TOP)/Android.common.mk
MESA_PYTHON2 := python
classic_drivers := i915 i965
gallium_drivers := swrast freedreno i915g ilo nouveau r300g r600g radeonsi vmwgfx vc4 virgl
gallium_drivers := swrast freedreno i915g nouveau r300g r600g radeonsi vmwgfx vc4 virgl
MESA_GPU_DRIVERS := $(strip $(BOARD_GPU_DRIVERS))
@@ -97,7 +88,8 @@ SUBDIRS := \
src/egl \
src/amd \
src/intel \
src/mesa/drivers/dri
src/mesa/drivers/dri \
src/vulkan
INC_DIRS := $(call all-named-subdir-makefiles,$(SUBDIRS))

View File

@@ -27,7 +27,7 @@ AM_DISTCHECK_CONFIGURE_FLAGS = \
--enable-egl \
--enable-gallium-tests \
--enable-gallium-osmesa \
--enable-gallium-llvm \
--enable-llvm \
--enable-gbm \
--enable-gles1 \
--enable-gles2 \
@@ -40,10 +40,10 @@ AM_DISTCHECK_CONFIGURE_FLAGS = \
--enable-vdpau \
--enable-xa \
--enable-xvmc \
--disable-llvm-shared-libs \
--with-egl-platforms=x11,wayland,drm,surfaceless \
--enable-llvm-shared-libs \
--with-platforms=x11,wayland,drm,surfaceless \
--with-dri-drivers=i915,i965,nouveau,radeon,r200,swrast \
--with-gallium-drivers=i915,ilo,nouveau,r300,r600,radeonsi,freedreno,svga,swrast,vc4,virgl,swr \
--with-gallium-drivers=i915,nouveau,r300,r600,radeonsi,freedreno,svga,swrast,vc4,virgl,swr,etnaviv,imx \
--with-vulkan-drivers=intel,radeon
ACLOCAL_AMFLAGS = -I m4
@@ -62,6 +62,7 @@ noinst_HEADERS = \
include/c99_math.h \
include/c11 \
include/D3D9 \
include/GL/wglext.h \
include/HaikuGL \
include/no_extern_c.h \
include/pci_ids

View File

@@ -58,6 +58,7 @@ F: src/compiler/nir/
DOCUMENTATION
R: Emil Velikov <emil.l.velikov@gmail.com>
R: Eric Engestrom <eric@engestrom.ch>
F: docs/
F: doxygen/
@@ -69,6 +70,10 @@ DRI LOADER
R: Emil Velikov <emil.l.velikov@gmail.com>
F: src/loader/
EGL
R: Eric Engestrom <eric@engestrom.ch>
F: src/egl/
GALLIUM LOADER
R: Emil Velikov <emil.l.velikov@gmail.com>
F: src/gallium/auxiliary/pipe-loader/
@@ -80,6 +85,7 @@ F: src/gallium/targets/
AUTOCONF BUILD
R: Emil Velikov <emil.l.velikov@gmail.com>
F: autogen.sh
F: configure.ac
F: */Automake.inc
F: */Makefile.*am
@@ -92,10 +98,16 @@ F: */Makefile.sources
ANDROID BUILD
R: Emil Velikov <emil.l.velikov@gmail.com>
R: Rob Herring <robh@kernel.org>
F: CleanSpec.mk
F: */Android.*mk
F: */Makefile.sources
ANDROID EGL SUPPORT
R: Rob Herring <robh@kernel.org>
R: Tomasz Figa <tfiga@chromium.org>
F: src/egl/drivers/dri2/platform_android.c
WAYLAND EGL SUPPORT
R: Daniel Stone <daniels@collabora.com>
F: src/egl/wayland/*

View File

@@ -1 +1 @@
12.1.0-devel
17.1.0-devel

View File

@@ -34,13 +34,13 @@ branches:
clone_depth: 100
cache:
- win_flex_bison-2.4.5.zip
- win_flex_bison-2.5.9.zip
- llvm-3.3.1-msvc2013-mtd.7z
os: Visual Studio 2013
environment:
WINFLEXBISON_ARCHIVE: win_flex_bison-2.4.5.zip
WINFLEXBISON_ARCHIVE: win_flex_bison-2.5.9.zip
LLVM_ARCHIVE: llvm-3.3.1-msvc2013-mtd.7z
install:
@@ -48,11 +48,13 @@ install:
- python --version
- python -m pip --version
# Install Mako
- python -m pip install --egg Mako
- python -m pip install Mako==1.0.6
# Install pywin32 extensions, needed by SCons
- python -m pip install pypiwin32
# Install python wheels, necessary to install SCons via pip
- python -m pip install wheel
# Install SCons
- python -m pip install --egg scons==2.4.1
- python -m pip install scons==2.5.1
- scons --version
# Install flex/bison
- if not exist "%WINFLEXBISON_ARCHIVE%" appveyor DownloadFile "https://downloads.sourceforge.net/project/winflexbison/old_versions/%WINFLEXBISON_ARCHIVE%"

View File

@@ -1,4 +1,4 @@
#!/bin/bash
#!/bin/sh
# This script is used to generate the list of fixed bugs that
# appears in the release notes files, with HTML formatting.
@@ -11,8 +11,6 @@
# $ bin/bugzilla_mesa.sh mesa-9.0.2..mesa-9.0.3
# $ bin/bugzilla_mesa.sh mesa-9.0.2..mesa-9.0.3 > bugfixes
# $ bin/bugzilla_mesa.sh mesa-9.0.2..mesa-9.0.3 | tee bugfixes
# $ DRYRUN=yes bin/bugzilla_mesa.sh mesa-9.0.2..mesa-9.0.3
# $ DRYRUN=yes bin/bugzilla_mesa.sh mesa-9.0.2..mesa-9.0.3 | wc -l
# regex pattern: trim before bug number
@@ -21,29 +19,17 @@ trim_before='s/.*show_bug.cgi?id=\([0-9]*\).*/\1/'
# regex pattern: reconstruct the url
use_after='s,^,https://bugs.freedesktop.org/show_bug.cgi?id=,'
echo "<ul>"
echo ""
# extract fdo urls from commit log
urls=$(git log $* | grep 'bugs.freedesktop.org/show_bug' | sed -e $trim_before | sort -n -u | sed -e $use_after)
# if DRYRUN is set to "yes", simply print the URLs and don't fetch the
# details from fdo bugzilla.
#DRYRUN=yes
if [ "x$DRYRUN" = xyes ]; then
for i in $urls
do
echo $i
done
else
echo "<ul>"
git log $* | grep 'bugs.freedesktop.org/show_bug' | sed -e $trim_before | sort -n -u | sed -e $use_after |\
while read url
do
id=$(echo $url | cut -d'=' -f2)
summary=$(wget --quiet -O - $url | grep -e '<title>.*</title>' | sed -e 's/ *<title>[0-9]\+ &ndash; \(.*\)<\/title>/\1/')
echo "<li><a href=\"$url\">Bug $id</a> - $summary</li>"
echo ""
done
for i in $urls
do
id=$(echo $i | cut -d'=' -f2)
summary=$(wget --quiet -O - $i | grep -e '<title>.*</title>' | sed -e 's/ *<title>[0-9]\+ &ndash; \(.*\)<\/title>/\1/')
echo "<li><a href=\"$i\">Bug $id</a> - $summary</li>"
echo ""
done
echo "</ul>"
fi
echo "</ul>"

View File

@@ -10,26 +10,28 @@
# $ bin/get-extra-pick-list.sh | tee picklist
# Use the last branchpoint as our limit for the search
# XXX: there should be a better way for this
latest_branchpoint=`git branch | grep \* | cut -c 3-`-branchpoint
latest_branchpoint=`git merge-base origin/master HEAD`
# Grep for commits with "cherry picked from commit" in the commit message.
git log --reverse --grep="cherry picked from commit" $latest_branchpoint..HEAD |\
grep "cherry picked from commit" |\
sed -e 's/^[[:space:]]*(cherry picked from commit[[:space:]]*//' -e 's/)//' |\
cut -c -8 |\
sed -e 's/^[[:space:]]*(cherry picked from commit[[:space:]]*//' -e 's/)//' > already_picked
# For each cherry-picked commit...
cat already_picked | cut -c -8 |\
while read sha
do
# Check if the original commit is referenced in master
# ... check if it's referenced (fixed by another) patch
git log -n1 --pretty=oneline --grep=$sha $latest_branchpoint..origin/master |\
cut -c -8 |\
while read candidate
do
# Check if the potential fix, hasn't landed in branch yet.
found=`git log -n1 --pretty=oneline --reverse --grep=$candidate $latest_branchpoint..HEAD |wc -l`
if test $found = 0
then
echo Commit $candidate might need to be picked, as it references $sha
# And flag up if it hasn't landed in branch yet.
if grep -q ^$candidate already_picked ; then
continue
fi
echo Commit $candidate references $sha
done
done
rm -f already_picked

61
bin/get-fixes-pick-list.sh Executable file
View File

@@ -0,0 +1,61 @@
#!/bin/sh
# Script for generating a list of candidates [referenced by a Fixes tag] for
# cherry-picking to a stable branch
#
# Usage examples:
#
# $ bin/get-fixes-pick-list.sh
# $ bin/get-fixes-pick-list.sh > picklist
# $ bin/get-fixes-pick-list.sh | tee picklist
# Use the last branchpoint as our limit for the search
latest_branchpoint=`git merge-base origin/master HEAD`
# List all the commits between day 1 and the branch point...
git log --reverse --pretty=%H $latest_branchpoint > already_landed
# ... and the ones cherry-picked.
git log --reverse --grep="cherry picked from commit" $latest_branchpoint..HEAD |\
grep "cherry picked from commit" |\
sed -e 's/^[[:space:]]*(cherry picked from commit[[:space:]]*//' -e 's/)//' > already_picked
# Grep for commits with Fixes tag
git log --reverse --pretty=%H -i --grep="fixes:" $latest_branchpoint..origin/master |\
while read sha
do
# For each one try to extract the tag
fixes_count=`git show $sha | grep -i "fixes:" | wc -l`
if [ "x$fixes_count" != x1 ] ; then
echo WARNING: Commit $sha has more than one Fixes tag
fi
fixes=`git show $sha | grep -i "fixes:" | head -n 1`
# The following sed/cut combination is borrowed from GregKH
id=`echo ${fixes} | sed -e 's/^[ \t]*//' | cut -f 2 -d ':' | sed -e 's/^[ \t]*//' | cut -f 1 -d ' '`
# Bail out if we cannot find suitable id.
# Any specific validation the $id is valid and not some junk, is
# implied with the follow up code
if [ "x$id" = x ] ; then
continue
fi
# Check if the offending commit is in branch.
# Be that cherry-picked ...
# ... or landed before the branchpoint.
if grep -q ^$id already_picked ||
grep -q ^$id already_landed ; then
# Finally nominate the fix if it hasn't landed yet.
if grep -q ^$sha already_picked ; then
continue
fi
echo Commit $sha fixes $id
fi
done
rm -f already_picked
rm -f already_landed

View File

@@ -8,13 +8,16 @@
# $ bin/get-pick-list.sh > picklist
# $ bin/get-pick-list.sh | tee picklist
# Use the last branchpoint as our limit for the search
latest_branchpoint=`git merge-base origin/master HEAD`
# Grep for commits with "cherry picked from commit" in the commit message.
git log --reverse --grep="cherry picked from commit" origin/master..HEAD |\
git log --reverse --grep="cherry picked from commit" $latest_branchpoint..HEAD |\
grep "cherry picked from commit" |\
sed -e 's/^[[:space:]]*(cherry picked from commit[[:space:]]*//' -e 's/)//' > already_picked
# Grep for commits that were marked as a candidate for the stable tree.
git log --reverse --pretty=%H -i --grep='^\([[:space:]]*NOTE: .*[Cc]andidate\|CC:.*mesa-stable\)' HEAD..origin/master |\
git log --reverse --pretty=%H -i --grep='^CC:.*mesa-stable' $latest_branchpoint..origin/master |\
while read sha
do
# Check to see whether the patch is on the ignore list.

42
bin/get-typod-pick-list.sh Executable file
View File

@@ -0,0 +1,42 @@
#!/bin/sh
# Script for generating a list of candidates which have typos in the nomination line
#
# Usage examples:
#
# $ bin/get-typod-pick-list.sh
# $ bin/get-typod-pick-list.sh > picklist
# $ bin/get-typod-pick-list.sh | tee picklist
# NB:
# This script intentionally _never_ checks for specific version tag
# Should we consider folding it with the original get-pick-list.sh
# Use the last branchpoint as our limit for the search
latest_branchpoint=`git merge-base origin/master HEAD`
# Grep for commits with "cherry picked from commit" in the commit message.
git log --reverse --grep="cherry picked from commit" $latest_branchpoint..HEAD |\
grep "cherry picked from commit" |\
sed -e 's/^[[:space:]]*(cherry picked from commit[[:space:]]*//' -e 's/)//' > already_picked
# Grep for commits that were marked as a candidate for the stable tree.
git log --reverse --pretty=%H -i --grep='^CC:.*mesa-dev' $latest_branchpoint..origin/master |\
while read sha
do
# Check to see whether the patch is on the ignore list.
if [ -f bin/.cherry-ignore ] ; then
if grep -q ^$sha bin/.cherry-ignore ; then
continue
fi
fi
# Check to see if it has already been picked over.
if grep -q ^$sha already_picked ; then
continue
fi
git log -n1 --pretty=oneline $sha | cat
done
rm -f already_picked

View File

@@ -1,4 +1,4 @@
#!/bin/bash
#!/bin/sh
# This script is used to generate the list of changes that
# appears in the release notes files, with HTML formatting.
@@ -10,7 +10,7 @@
# $ bin/shortlog_mesa.sh mesa-9.0.2..mesa-9.0.3 | tee changes
typeset -i in_log=0
in_log=0
git shortlog $* | while read l
do

View File

@@ -59,7 +59,7 @@ if target_platform == 'windows' and host_platform != 'windows':
# find default_llvm value
if 'LLVM' in os.environ:
if 'LLVM' in os.environ or 'LLVM_CONFIG' in os.environ:
default_llvm = 'yes'
else:
default_llvm = 'no'
@@ -110,5 +110,6 @@ def AddOptions(opts):
opts.Add(BoolOption('texture_float',
'enable floating-point textures and renderbuffers',
'no'))
opts.Add(BoolOption('swr', 'Build OpenSWR', 'no'))
if host_platform == 'windows':
opts.Add('MSVC_VERSION', 'Microsoft Visual C/C++ version')

File diff suppressed because it is too large Load Diff

View File

@@ -39,7 +39,7 @@ steps that work as of this writing.
get pywin32-218.4.win-amd64-py2.7.exe
- install git
- download mesa from git
see http://www.mesa3d.org/repository.html
see https://www.mesa3d.org/repository.html
- run scons
General

View File

@@ -33,7 +33,7 @@ without a depth buffer.
<p>
Mesa 9.1.2 and later (will) support a DRI configuration option to work around
this issue.
Using the <a href="http://dri.freedesktop.org/wiki/DriConf">driconf</a> tool,
Using the <a href="https://dri.freedesktop.org/wiki/DriConf">driconf</a> tool,
set the "Create all visuals with a depth buffer" option before running Topogun.
Then, all GLX visuals will be created with a depth buffer.
</p>

View File

@@ -55,7 +55,7 @@ to your preference, type:
</pre>
<p>
This will produce libGL.so and several other libraries depending on the
This will produce libGL.so and/or several other libraries depending on the
options you have chosen. Later, if you want to rebuild for a different
configuration run <code>make realclean</code> before rebuilding.
</p>
@@ -118,7 +118,7 @@ directories. For example, <code>LDFLAGS="-L/usr/X11R6/lib"</code>.</p>
<dt><code>PKG_CONFIG_PATH</code></dt>
<dd><p>The
<code>pkg-config</code> utility is a hard requirement for cofiguring and
<code>pkg-config</code> utility is a hard requirement for configuring and
building mesa. It is used to search for external libraries
on the system. This environment variable is used to control the search
path for <code>pkg-config</code>. For instance, setting
@@ -133,9 +133,11 @@ There are also a few general options for altering the Mesa build:
</p>
<dl>
<dt><code>--enable-debug</code></dt>
<dd><p>This option will enable compiler
options and macros to aid in debugging the Mesa libraries.</p>
</dd>
<dd><p>This option will set the compiler debug/optimisation levels (if the user
hasn't already set them via the CFLAGS/CXXFLAGS) and macros to aid in
debugging the Mesa libraries.</p>
<p>Note that enabling this option can lead to noticeable loss of performance.</p>
<dt><code>--disable-asm</code></dt>
<dd><p>There are assembly routines
@@ -174,27 +176,22 @@ architecture, the following should be sufficient to configure multilib Mesa</p>
</dl>
<h2 id="driver">2. Driver Options</h2>
<h2 id="driver">2. GL Driver Options</h2>
<p>
There are several different driver modes that Mesa can use. These are
described in more detail in the <a href="install.html">basic
installation instructions</a>. The Mesa driver is controlled through the
configure options <code>--enable-xlib-glx</code>, <code>--enable-osmesa</code>,
and <code>--enable-dri</code>.
configure options <code>--enable-glx</code> and <code>--enable-osmesa</code>
</p>
<h3 id="xlib">Xlib</h3><p>
It uses Xlib as a software renderer to do all rendering. It corresponds
to the option <code>--enable-xlib-glx</code>. The libX11 and libXext
libraries, as well as the X11 development headers, will be need to
support the Xlib driver.
to the option <code>--enable-glx=xlib</code> or <code>--enable-glx=gallium-xlib</code>.
<h3 id="dri">DRI</h3><p>This mode uses the DRI hardware drivers for
accelerated OpenGL rendering. Enable the DRI drivers with the option
<code>--enable-dri</code>. See the <a href="install.html">basic
installation instructions</a> for details on prerequisites for the DRI
drivers.
accelerated OpenGL rendering. To enable use <code>--enable-glx=dri
--enable-dri</code>.
<!-- DRI specific options -->
<dl>
@@ -252,10 +249,8 @@ will create the libOSMesa16 library with a 16-bit color channel.
<h2 id="library">3. Library Options</h2>
<p>
The configure script provides more fine grained control over the GL
libraries that will be built. More details on the specific GL libraries
can be found in the <a href="install.html">basic installation
instructions</a>.
The configure script provides more fine grained control over the libraries
that will be built.
</div>
</body>

View File

@@ -18,7 +18,7 @@
<p>
The Mesa bug database is hosted on
<a href="http://freedesktop.org">freedesktop.org</a>.
<a href="https://freedesktop.org">freedesktop.org</a>.
The old bug database on SourceForge is no longer used.
</p>

142
docs/codingstyle.html Normal file
View File

@@ -0,0 +1,142 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Coding Style</title>
<link rel="stylesheet" type="text/css" href="mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="contents.html"></iframe>
<div class="content">
<h1>Coding Style</h1>
<p>
Mesa is over 20 years old and the coding style has evolved over time.
Some old parts use a style that's a bit out of date.
Different sections of mesa can use different coding style as set in the local
EditorConfig (.editorconfig) and/or Emacs (.dir-locals.el) file.
Alternatively the following is applicable.
If the guidelines below don't cover something, try following the format of
existing, neighboring code.
</p>
<p>
Basic formatting guidelines
</p>
<ul>
<li>3-space indentation, no tabs.
<li>Limit lines to 78 or fewer characters. The idea is to prevent line
wrapping in 80-column editors and terminals. There are exceptions, such
as if you're defining a large, static table of information.
<li>Opening braces go on the same line as the if/for/while statement.
For example:
<pre>
if (condition) {
foo;
} else {
bar;
}
</pre>
<li>Put a space before/after operators. For example, <tt>a = b + c;</tt>
and not <tt>a=b+c;</tt>
<li>This GNU indent command generally does the right thing for formatting:
<pre>
indent -br -i3 -npcs --no-tabs infile.c -o outfile.c
</pre>
<li>Use comments wherever you think it would be helpful for other developers.
Several specific cases and style examples follow. Note that we roughly
follow <a href="https://www.stack.nl/~dimitri/doxygen/">Doxygen</a> conventions.
<br>
<br>
Single-line comments:
<pre>
/* null-out pointer to prevent dangling reference below */
bufferObj = NULL;
</pre>
Or,
<pre>
bufferObj = NULL; /* prevent dangling reference below */
</pre>
Multi-line comment:
<pre>
/* If this is a new buffer object id, or one which was generated but
* never used before, allocate a buffer object now.
*/
</pre>
We try to quote the OpenGL specification where prudent:
<pre>
/* Page 38 of the PDF of the OpenGL ES 3.0 spec says:
*
* "An INVALID_OPERATION error is generated for any of the following
* conditions:
*
* * <length> is zero."
*
* Additionally, page 94 of the PDF of the OpenGL 4.5 core spec
* (30.10.2014) also says this, so it's no longer allowed for desktop GL,
* either.
*/
</pre>
Function comment example:
<pre>
/**
* Create and initialize a new buffer object. Called via the
* ctx->Driver.CreateObject() driver callback function.
* \param name integer name of the object
* \param type one of GL_FOO, GL_BAR, etc.
* \return pointer to new object or NULL if error
*/
struct gl_object *
_mesa_create_object(GLuint name, GLenum type)
{
/* function body */
}
</pre>
<li>Put the function return type and qualifiers on one line and the function
name and parameters on the next, as seen above. This makes it easy to use
<code>grep ^function_name dir/*</code> to find function definitions. Also,
the opening brace goes on the next line by itself (see above.)
<li>Function names follow various conventions depending on the type of function:
<pre>
glFooBar() - a public GL entry point (in glapi_dispatch.c)
_mesa_FooBar() - the internal immediate mode function
save_FooBar() - retained mode (display list) function in dlist.c
foo_bar() - a static (private) function
_mesa_foo_bar() - an internal non-static Mesa function
</pre>
<li>Constants, macros and enum names are ALL_UPPERCASE, with _ between
words.
<li>Mesa usually uses camel case for local variables (Ex: "localVarname")
while gallium typically uses underscores (Ex: "local_var_name").
<li>Global variables are almost never used because Mesa should be thread-safe.
<li>Booleans. Places that are not directly visible to the GL API
should prefer the use of <tt>bool</tt>, <tt>true</tt>, and
<tt>false</tt> over <tt>GLboolean</tt>, <tt>GL_TRUE</tt>, and
<tt>GL_FALSE</tt>. In C code, this may mean that
<tt>#include &lt;stdbool.h&gt;</tt> needs to be added. The
<tt>try_emit_</tt>* methods in src/mesa/program/ir_to_mesa.cpp and
src/mesa/state_tracker/st_glsl_to_tgsi.cpp can serve as examples.
</ul>
</p>
</div>
</body>
</html>

View File

@@ -53,7 +53,7 @@
<li><a href="lists.html" target="_parent">Mailing Lists</a>
<li><a href="bugs.html" target="_parent">Bug Database</a>
<li><a href="webmaster.html" target="_parent">Webmaster</a>
<li><a href="http://dri.freedesktop.org/" target="_parent">Mesa/DRI Wiki</a>
<li><a href="https://dri.freedesktop.org/" target="_parent">Mesa/DRI Wiki</a>
</ul>
<b>User Topics</b>
@@ -66,7 +66,7 @@
<li><a href="debugging.html" target="_parent">Debugging Tips</a>
<li><a href="perf.html" target="_parent">Performance Tips</a>
<li><a href="extensions.html" target="_parent">Mesa Extensions</a>
<li><a href="mangling.html" target="_parent">Function Name Mangling</a>
<li><a href="mangling.html" target="_parent">GL Function Name Mangling</a>
<li><a href="llvmpipe.html" target="_parent">Gallium llvmpipe driver</a>
<li><a href="vmware-guest.html" target="_parent">VMware SVGA3D guest driver</a>
<li><a href="postprocess.html" target="_parent">Gallium post-processing</a>
@@ -81,23 +81,25 @@
<li><a href="utilities.html" target="_parent">Utilities</a>
<li><a href="helpwanted.html" target="_parent">Help Wanted</a>
<li><a href="devinfo.html" target="_parent">Development Notes</a>
<li><a href="codingstyle.html" target="_parent">Coding Style</a>
<li><a href="submittingpatches.html" target="_parent">Submitting patches</a>
<li><a href="releasing.html" target="_parent">Releasing process</a>
<li><a href="sourcedocs.html" target="_parent">Source Documentation</a>
<li><a href="dispatch.html" target="_parent">GL Dispatch</a>
</ul>
<b>Links</b>
<ul>
<li><a href="http://www.opengl.org" target="_parent">OpenGL website</a>
<li><a href="http://dri.freedesktop.org" target="_parent">DRI website</a>
<li><a href="http://www.freedesktop.org" target="_parent">freedesktop.org</a>
<li><a href="http://planet.freedesktop.org" target="_parent">Developer blogs</a>
<li><a href="https://www.opengl.org" target="_parent">OpenGL website</a>
<li><a href="https://dri.freedesktop.org" target="_parent">DRI website</a>
<li><a href="https://www.freedesktop.org" target="_parent">freedesktop.org</a>
<li><a href="https://planet.freedesktop.org" target="_parent">Developer blogs</a>
</ul>
<b>Hosted by:</b>
<br>
<blockquote>
<a href="http://sourceforge.net"
target="_parent">sourceforge.net</a>
<a href="https://freedesktop.org" target="_parent">freedesktop.org</a>
</blockquote>
</body>

View File

@@ -20,7 +20,7 @@
Both professional and volunteer developers contribute to Mesa.
</p>
<p>
<a href="http://www.vmware.com/">VMware</a>
<a href="https://www.vmware.com/">VMware</a>
employs several of the main Mesa developers including Brian Paul
and Keith Whitwell.
</p>
@@ -44,7 +44,7 @@ Intel has recently contributed the new GLSL compiler in Mesa 7.9.
</p>
<p>
<a href="http://www.lunarg.com/">LunarG</a> can be contacted
<a href="https://www.lunarg.com/">LunarG</a> can be contacted
for custom Mesa / 3D graphics development.
</p>

View File

@@ -18,650 +18,9 @@
<ul>
<li><a href="#style">Coding Style</a>
<li><a href="#submitting">Submitting Patches</a>
<li><a href="#release">Making a New Mesa Release</a>
<li><a href="#extensions">Adding Extensions</a>
</ul>
<h2 id="style">Coding Style</h2>
<p>
Mesa is over 20 years old and the coding style has evolved over time.
Some old parts use a style that's a bit out of date.
If the guidelines below don't cover something, try following the format of
existing, neighboring code.
</p>
<p>
Basic formatting guidelines
</p>
<ul>
<li>3-space indentation, no tabs.
<li>Limit lines to 78 or fewer characters. The idea is to prevent line
wrapping in 80-column editors and terminals. There are exceptions, such
as if you're defining a large, static table of information.
<li>Opening braces go on the same line as the if/for/while statement.
For example:
<pre>
if (condition) {
foo;
} else {
bar;
}
</pre>
<li>Put a space before/after operators. For example, <tt>a = b + c;</tt>
and not <tt>a=b+c;</tt>
<li>This GNU indent command generally does the right thing for formatting:
<pre>
indent -br -i3 -npcs --no-tabs infile.c -o outfile.c
</pre>
<li>Use comments wherever you think it would be helpful for other developers.
Several specific cases and style examples follow. Note that we roughly
follow <a href="http://www.stack.nl/~dimitri/doxygen/">Doxygen</a> conventions.
<br>
<br>
Single-line comments:
<pre>
/* null-out pointer to prevent dangling reference below */
bufferObj = NULL;
</pre>
Or,
<pre>
bufferObj = NULL; /* prevent dangling reference below */
</pre>
Multi-line comment:
<pre>
/* If this is a new buffer object id, or one which was generated but
* never used before, allocate a buffer object now.
*/
</pre>
We try to quote the OpenGL specification where prudent:
<pre>
/* Page 38 of the PDF of the OpenGL ES 3.0 spec says:
*
* "An INVALID_OPERATION error is generated for any of the following
* conditions:
*
* * <length> is zero."
*
* Additionally, page 94 of the PDF of the OpenGL 4.5 core spec
* (30.10.2014) also says this, so it's no longer allowed for desktop GL,
* either.
*/
</pre>
Function comment example:
<pre>
/**
* Create and initialize a new buffer object. Called via the
* ctx->Driver.CreateObject() driver callback function.
* \param name integer name of the object
* \param type one of GL_FOO, GL_BAR, etc.
* \return pointer to new object or NULL if error
*/
struct gl_object *
_mesa_create_object(GLuint name, GLenum type)
{
/* function body */
}
</pre>
<li>Put the function return type and qualifiers on one line and the function
name and parameters on the next, as seen above. This makes it easy to use
<code>grep ^function_name dir/*</code> to find function definitions. Also,
the opening brace goes on the next line by itself (see above.)
<li>Function names follow various conventions depending on the type of function:
<pre>
glFooBar() - a public GL entry point (in glapi_dispatch.c)
_mesa_FooBar() - the internal immediate mode function
save_FooBar() - retained mode (display list) function in dlist.c
foo_bar() - a static (private) function
_mesa_foo_bar() - an internal non-static Mesa function
</pre>
<li>Constants, macros and enumerant names are ALL_UPPERCASE, with _ between
words.
<li>Mesa usually uses camel case for local variables (Ex: "localVarname")
while gallium typically uses underscores (Ex: "local_var_name").
<li>Global variables are almost never used because Mesa should be thread-safe.
<li>Booleans. Places that are not directly visible to the GL API
should prefer the use of <tt>bool</tt>, <tt>true</tt>, and
<tt>false</tt> over <tt>GLboolean</tt>, <tt>GL_TRUE</tt>, and
<tt>GL_FALSE</tt>. In C code, this may mean that
<tt>#include &lt;stdbool.h&gt;</tt> needs to be added. The
<tt>try_emit_</tt>* methods in src/mesa/program/ir_to_mesa.cpp and
src/mesa/state_tracker/st_glsl_to_tgsi.cpp can serve as examples.
</ul>
<h2 id="submitting">Submitting patches</h2>
<p>
The basic guidelines for submitting patches are:
</p>
<ul>
<li>Patches should be sufficiently tested before submitting.
<li>Code patches should follow Mesa coding conventions.
<li>Whenever possible, patches should only effect individual Mesa/Gallium
components.
<li>Patches should never introduce build breaks and should be bisectable (see
<code>git bisect</code>.)
<li>Patches should be properly formatted (see below).
<li>Patches should be submitted to mesa-dev for review using
<code>git send-email</code>.
<li>Patches should not mix code changes with code formatting changes (except,
perhaps, in very trivial cases.)
</ul>
<h3>Patch formatting</h3>
<p>
The basic rules for patch formatting are:
</p>
<ul>
<li>Lines should be limited to 75 characters or less so that git logs
displayed in 80-column terminals avoid line wrapping. Note that git
log uses 4 spaces of indentation (4 + 75 &lt; 80).
<li>The first line should be a short, concise summary of the change prefixed
with a module name. Examples:
<pre>
mesa: Add support for querying GL_VERTEX_ATTRIB_ARRAY_LONG
gallium: add PIPE_CAP_DEVICE_RESET_STATUS_QUERY
i965: Fix missing type in local variable declaration.
</pre>
<li>Subsequent patch comments should describe the change in more detail,
if needed. For example:
<pre>
i965: Remove end-of-thread SEND alignment code.
This was present in Eric's initial implementation of the compaction code
for Sandybridge (commit 077d01b6). There is no documentation saying this
is necessary, and removing it causes no regressions in piglit on any
platform.
</pre>
<li>A "Signed-off-by:" line is not required, but not discouraged either.
<li>If a patch address a bugzilla issue, that should be noted in the
patch comment. For example:
<pre>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89689
</pre>
<li>If there have been several revisions to a patch during the review
process, they should be noted such as in this example:
<pre>
st/mesa: add ARB_texture_stencil8 support (v4)
if we support stencil texturing, enable texture_stencil8
there is no requirement to support native S8 for this,
the texture can be converted to x24s8 fine.
v2: fold fixes from Marek in:
a) put S8 last in the list
b) fix renderable to always test for d/s renderable
fixup the texture case to use a stencil only format
for picking the format for the texture view.
v3: hit fallback for getteximage
v4: put s8 back in front, it shouldn't get picked now (Ilia)
</pre>
<li>If someone tested your patch, document it with a line like this:
<pre>
Tested-by: Joe Hacker &lt;jhacker@foo.com&gt;
</pre>
<li>If the patch was reviewed (usually the case) or acked by someone,
that should be documented with:
<pre>
Reviewed-by: Joe Hacker &lt;jhacker@foo.com&gt;
Acked-by: Joe Hacker &lt;jhacker@foo.com&gt;
</pre>
</ul>
<h3>Testing Patches</h3>
<p>
It should go without saying that patches must be tested. In general,
do whatever testing is prudent.
</p>
<p>
You should always run the Mesa test suite before submitting patches.
The test suite can be run using the 'make check' command. All tests
must pass before patches will be accepted, this may mean you have
to update the tests themselves.
</p>
<p>
Whenever possible and applicable, test the patch with
<a href="http://piglit.freedesktop.org">Piglit</a> to
check for regressions.
</p>
<h3>Mailing Patches</h3>
<p>
Patches should be sent to the mesa-dev mailing list for review:
<a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev">
mesa-dev@lists.freedesktop.org<a/>.
When submitting a patch make sure to use
<a href="https://git-scm.com/docs/git-send-email">git send-email</a>
rather than attaching patches to emails. Sending patches as
attachments prevents people from being able to provide in-line review
comments.
</p>
<p>
When submitting follow-up patches you can use --in-reply-to to make v2, v3,
etc patches show up as replies to the originals. This usually works well
when you're sending out updates to individual patches (as opposed to
re-sending the whole series). Using --in-reply-to makes
it harder for reviewers to accidentally review old patches.
</p>
<p>
When submitting follow-up patches you should also login to
<a href="https://patchwork.freedesktop.org">patchwork</a> and change the
state of your old patches to Superseded.
</p>
<h3>Reviewing Patches</h3>
<p>
When you've reviewed a patch on the mailing list, please be unambiguous
about your review. That is, state either
<pre>
Reviewed-by: Joe Hacker &lt;jhacker@foo.com&gt;
</pre>
or
<pre>
Acked-by: Joe Hacker &lt;jhacker@foo.com&gt;
</pre>
Rather than saying just "LGTM" or "Seems OK".
</p>
<p>
If small changes are suggested, it's OK to say something like:
<pre>
With the above fixes, Reviewed-by: Joe Hacker &lt;jhacker@foo.com&gt;
</pre>
which tells the patch author that the patch can be committed, as long
as the issues are resolved first.
</p>
<h3>Marking a commit as a candidate for a stable branch</h3>
<p>
If you want a commit to be applied to a stable branch,
you should add an appropriate note to the commit message.
</p>
<p>
Here are some examples of such a note:
</p>
<ul>
<li>CC: &lt;mesa-stable@lists.freedesktop.org&gt;</li>
<li>CC: "9.2 10.0" &lt;mesa-stable@lists.freedesktop.org&gt;</li>
<li>CC: "10.0" &lt;mesa-stable@lists.freedesktop.org&gt;</li>
</ul>
Simply adding the CC to the mesa-stable list address is adequate to nominate
the commit for the most-recently-created stable branch. It is only necessary
to specify a specific branch name, (such as "9.2 10.0" or "10.0" in the
examples above), if you want to nominate the commit for an older stable
branch. And, as in these examples, you can nominate the commit for the older
branch in addition to the more recent branch, or nominate the commit
exclusively for the older branch.
This "CC" syntax for patch nomination will cause patches to automatically be
copied to the mesa-stable@ mailing list when you use "git send-email" to send
patches to the mesa-dev@ mailing list. Also, if you realize that a commit
should be nominated for the stable branch after it has already been committed,
you can send a note directly to the mesa-stable@lists.freedesktop.org where
the Mesa stable-branch maintainers will receive it. Be sure to mention the
commit ID of the commit of interest (as it appears in the mesa master branch).
The latest set of patches that have been nominated, accepted, or rejected for
the upcoming stable release can always be seen on the
<a href="http://cworth.org/~cworth/mesa-stable-queue/">Mesa Stable Queue</a>
page.
<h3>Criteria for accepting patches to the stable branch</h3>
Mesa has a designated release manager for each stable branch, and the release
manager is the only developer that should be pushing changes to these
branches. Everyone else should simply nominate patches using the mechanism
described above.
The stable-release manager will work with the list of nominated patches, and
for each patch that meets the crtieria below will cherry-pick the patch with:
<code>git cherry-pick -x &lt;commit&gt;</code>. The <code>-x</code> option is
important so that the picked patch references the comit ID of the original
patch.
The stable-release manager may at times need to force-push changes to the
stable branches, for example, to drop a previously-picked patch that was later
identified as causing a regression). These force-pushes may cause changes to
be lost from the stable branch if developers push things directly. Consider
yourself warned.
The stable-release manager is also given broad discretion in rejecting patches
that have been nominated for the stable branch. The most basic rule is that
the stable branch is for bug fixes only, (no new features, no
regressions). Here is a non-exhaustive list of some reasons that a patch may
be rejected:
<ul>
<li>Patch introduces a regression. Any reported build breakage or other
regression caused by a particular patch, (game no longer work, piglit test
changes from PASS to FAIL), is justification for rejecting a patch.</li>
<li>Patch is too large, (say, larger than 100 lines)</li>
<li>Patch is not a fix. For example, a commit that moves code around with no
functional change should be rejected.</li>
<li>Patch fix is not clearly described. For example, a commit message
of only a single line, no description of the bug, no mention of bugzilla,
etc.</li>
<li>Patch has not obviously been reviewed, For example, the commit message
has no Reviewed-by, Signed-off-by, nor Tested-by tags from anyone but the
author.</li>
<li>Patch has not already been merged to the master branch. As a rule, bug
fixes should never be applied first to a stable branch. Patches should land
first on the master branch and then be cherry-picked to a stable
branch. (This is to avoid future releases causing regressions if the patch
is not also applied to master.) The only things that might look like
exceptions would be backports of patches from master that happen to look
significantly different.</li>
<li>Patch depends on too many other patches. Ideally, all stable-branch
patches should be self-contained. It sometimes occurs that a single, logical
bug-fix occurs as two separate patches on master, (such as an original
patch, then a subsequent fix-up to that patch). In such a case, these two
patches should be squashed into a single, self-contained patch for the
stable branch. (Of course, if the squashing makes the patch too large, then
that could be a reason to reject the patch.)</li>
<li>Patch includes new feature development, not bug fixes. New OpenGL
features, extensions, etc. should be applied to Mesa master and included in
the next major release. Stable releases are intended only for bug fixes.
Note: As an exception to this rule, the stable-release manager may accept
hardware-enabling "features". For example, backports of new code to support
a newly-developed hardware product can be accepted if they can be reasonably
determined to not have effects on other hardware.</li>
<li>Patch is a performance optimization. As a rule, performance patches are
not candidates for the stable branch. The only exception might be a case
where an application's performance was recently severely impacted so as to
become unusable. The fix for this performance regression could then be
considered for a stable branch. The optimization must also be
non-controversial and the patches still need to meet the other criteria of
being simple and self-contained</li>
<li>Patch introduces a new failure mode (such as an assert). While the new
assert might technically be correct, for example to make Mesa more
conformant, this is not the kind of "bug fix" we want in a stable
release. The potential problem here is that an OpenGL program that was
previously working, (even if technically non-compliant with the
specification), could stop working after this patch. So that would be a
regression that is unaacceptable for the stable branch.</li>
</ul>
<h2 id="release">Making a New Mesa Release</h2>
<p>
These are the instructions for making a new Mesa release.
</p>
<h3>Get latest source files</h3>
<p>
Use git to get the latest Mesa files from the git repository, from whatever
branch is relevant. This document uses the convention X.Y.Z for the release
being created, which should be created from a branch named X.Y.
</p>
<h3>Perform basic testing</h3>
<p>
The release manager should, at the very least, test the code by compiling it,
installing it, and running the latest piglit to ensure that no piglit tests
have regressed since the previous release.
</p>
<p>
The release manager should do this testing with at least one hardware driver,
(say, whatever is contained in the local development machine), as well as on
both Gallium and non-Gallium software drivers. The software testing can be
performed by running piglit with the following environment-variable set:
</p>
<pre>
LIBGL_ALWAYS_SOFTWARE=1
</pre>
And Gallium vs. non-Gallium software drivers can be obtained by using the
following configure flags on separate builds:
<pre>
--with-dri-drivers=swrast
--with-gallium-drivers=swrast
</pre>
<p>
Note: If both options are given in one build, both swrast_dri.so drivers will
be compiled, but only one will be installed. The following command can be used
to ensure the correct driver is being tested:
</p>
<pre>
LIBGL_ALWAYS_SOFTWARE=1 glxinfo | grep "renderer string"
</pre>
If any regressions are found in this testing with piglit, stop here, and do
not perform a release until regressions are fixed.
<h3>Update version in file VERSION</h3>
<p>
Increment the version contained in the file VERSION at Mesa's top-level, then
commit this change.
</p>
<h3>Create release notes for the new release</h3>
<p>
Create a new file docs/relnotes/X.Y.Z.html, (follow the style of the previous
release notes). Note that the sha256sums section of the release notes should
be empty at this point.
</p>
<p>
Two scripts are available to help generate portions of the release notes:
<pre>
./bin/bugzilla_mesa.sh
./bin/shortlog_mesa.sh
</pre>
<p>
The first script identifies commits that reference bugzilla bugs and obtains
the descriptions of those bugs from bugzilla. The second script generates a
log of all commits. In both cases, HTML-formatted lists are printed to stdout
to be included in the release notes.
</p>
<p>
Commit these changes
</p>
<h3>Make the release archives, signatures, and the release tag</h3>
<p>
From inside the Mesa directory:
<pre>
./autogen.sh
make -j1 tarballs
</pre>
<p>
After the tarballs are created, the sha256 checksums for the files will
be computed and printed. These will be used in a step below.
</p>
<p>
It's important at this point to also verify that the constructed tar file
actually builds:
</p>
<pre>
tar xjf MesaLib-X.Y.Z.tar.bz2
cd Mesa-X.Y.Z
./configure --enable-gallium-llvm
make -j6
make install
</pre>
<p>
Some touch testing should also be performed at this point, (run glxgears or
more involved OpenGL programs against the installed Mesa).
</p>
<p>
Create detached GPG signatures for each of the archive files created above:
</p>
<pre>
gpg --sign --detach MesaLib-X.Y.Z.tar.gz
gpg --sign --detach MesaLib-X.Y.Z.tar.bz2
gpg --sign --detach MesaLib-X.Y.Z.zip
</pre>
<p>
Tag the commit used for the build:
</p>
<pre>
git tag -s mesa-X.Y.X -m "Mesa X.Y.Z release"
</pre>
<p>
Note: It would be nice to investigate and fix the issue that causes the
tarballs target to fail with multiple build process, such as with "-j4". It
would also be nice to incorporate all of the above commands into a single
makefile target. And instead of a custom "tarballs" target, we should
incorporate things into the standard "make dist" and "make distcheck" targets.
</p>
<h3>Add the sha256sums to the release notes</h3>
<p>
Edit docs/relnotes/X.Y.Z.html to add the sha256sums printed as part of "make
tarballs" in the previous step. Commit this change.
</p>
<h3>Push all commits and the tag created above</h3>
<p>
This is the first step that cannot easily be undone. The release is going
forward from this point:
</p>
<pre>
git push origin X.Y --tags
</pre>
<h3>Install the release files and signatures on the distribution server</h3>
<p>
The following commands can be used to copy the release archive files and
signatures to the freedesktop.org server:
</p>
<pre>
scp MesaLib-X.Y.Z* people.freedesktop.org:
ssh people.freedesktop.org
cd /srv/ftp.freedesktop.org/pub/mesa
mkdir X.Y.Z
cd X.Y.Z
mv ~/MesaLib-X.Y.Z* .
</pre>
<h3>Back on mesa master, add the new release notes into the tree</h3>
<p>
Something like the following steps will do the trick:
</p>
<pre>
cp docs/relnotes/X.Y.Z.html /tmp
git checkout master
cp /tmp/X.Y.Z.html docs/relnotes
git add docs/relnotes/X.Y.Z.html
</pre>
<p>
Also, edit docs/relnotes.html to add a link to the new release notes, and edit
docs/index.html to add a news entry. Then commit and push:
</p>
<pre>
git commit -a -m "docs: Import X.Y.Z release notes, add news item."
git push origin
</pre>
<h3>Update the mesa3d.org website</h3>
<p>
NOTE: The recent release managers have not been performing this step
themselves, but leaving this to Brian Paul, (who has access to the
sourceforge.net hosting for mesa3d.org). Brian is more than willing to grant
the permission necessary to future release managers to do this step on their
own.
</p>
<p>
Update the web site by copying the docs/ directory's files to
/home/users/b/br/brianp/mesa-www/htdocs/ with:
<br>
<code>
sftp USERNAME,mesa3d@web.sourceforge.net
</code>
</p>
<h3>Announce the release</h3>
<p>
Make an announcement on the mailing lists:
<em>mesa-dev@lists.freedesktop.org</em>,
and
<em>mesa-announce@lists.freedesktop.org</em>
Follow the template of previously-sent release announcements. The following
command can be used to generate the log of changes to be included in the
release announcement:
<pre>
git shortlog mesa-X.Y.Z-1..mesa-X.Y.Z
</pre>
</p>
<h2 id="extensions">Adding Extensions</h2>
<p>

View File

@@ -23,44 +23,37 @@ or <a href="https://mesa.freedesktop.org/archive/">mesa.freedesktop.org</a>
(HTTP).
</p>
<p>
Starting with the first release of 2017, Mesa's version scheme is
year-based. Filenames are in the form <tt>mesa-Y.N.P.tar.gz</tt>, where
<tt>Y</tt> is the year (two digits), <tt>N</tt> is an incremental number
(starting at 0) and <tt>P</tt> is the patch number (0 for the first
release, 1 for the first patch after that).
</p>
<p>
When a new release is coming, release candidates (betas) may be found
<a href="ftp://ftp.freedesktop.org/pub/mesa/beta/">here</a>.
in the same directory, and are recognisable by the
<tt>mesa-Y.N.P-<b>rc</b>X.tar.gz</tt> filename.
</p>
<h1>Unpacking</h1>
<p>
Mesa releases are available in three formats: .tar.bz2, .tar.gz, and .zip
Mesa releases are available in two formats: <tt>.tar.xz</tt> and <tt>.tar.gz</tt>.
</p>
<p>
To unpack .tar.gz files:
</p>
To unpack the tarball:
<pre>
tar zxf MesaLib-x.y.z.tar.gz
tar xf mesa-Y.N.P.tar.xz
</pre>
or
<pre>
gzcat MesaLib-x.y.z.tar.gz | tar xf -
tar xf mesa-Y.N.P.tar.gz
</pre>
or
<pre>
gunzip MesaLib-x.y.z.tar.gz ; tar xf MesaLib-x.y.z.tar
</pre>
<p>
To unpack .tar.bz2 files:
</p>
<pre>
bunzip2 -c MesaLib-x.y.z.tar.gz | tar xf -
</pre>
<p>
To unpack .zip files:
</p>
<pre>
unzip MesaLib-x.y.z.zip
</pre>
<h1>Contents</h1>
@@ -69,8 +62,8 @@ To unpack .zip files:
After unpacking you'll have these files and directories (among others):
</p>
<pre>
Makefile - top-level Makefile for most systems
configs/ - makefile parameter files for various systems
autogen.sh - Autoconf script for *nix systems
scons/ - SCons script for Windows builds
include/ - GL header (include) files
bin/ - shell scripts for making shared libraries, etc
docs/ - documentation
@@ -109,9 +102,9 @@ In the past, GLUT, GLU and the Mesa demos were released in conjunction with
Mesa releases. But since GLUT, GLU and the demos change infrequently, they
were split off into their own git repositories:
<a href="http://cgit.freedesktop.org/mesa/glut/">GLUT</a>,
<a href="http://cgit.freedesktop.org/mesa/glu/">GLU</a> and
<a href="http://cgit.freedesktop.org/mesa/demos/">Demos</a>,
<a href="https://cgit.freedesktop.org/mesa/glut/">GLUT</a>,
<a href="https://cgit.freedesktop.org/mesa/glu/">GLU</a> and
<a href="https://cgit.freedesktop.org/mesa/demos/">Demos</a>,
</p>
</div>

View File

@@ -18,8 +18,8 @@
<p>The current version of EGL in Mesa implements EGL 1.4. More information
about EGL can be found at
<a href="http://www.khronos.org/egl/">
http://www.khronos.org/egl/</a>.</p>
<a href="https://www.khronos.org/egl/">
https://www.khronos.org/egl/</a>.</p>
<p>The Mesa's implementation of EGL uses a driver architecture. The main
library (<code>libEGL</code>) is window system neutral. It provides the EGL
@@ -44,7 +44,7 @@ the driver for your hardware. For example</p>
<p>The main library and OpenGL is enabled by default. The first two options
above enables <a href="opengles.html">OpenGL ES 1.x and 2.x</a>. The last two
options enables the listed classic and and Gallium drivers respectively.</p>
options enables the listed classic and Gallium drivers respectively.</p>
</li>
@@ -83,9 +83,9 @@ drivers will be installed to <code>${libdir}/egl</code>.</p>
<p>List the platforms (window systems) to support. Its argument is a comma
separated string such as <code>--with-egl-platforms=x11,drm</code>. It decides
the platforms a driver may support. The first listed platform is also used by
the main library to decide the native platform: the platform the EGL native
the main library to decide the native platform: this defines EGL native
types such as <code>EGLNativeDisplayType</code> or
<code>EGLNativeWindowType</code> defined for.</p>
<code>EGLNativeWindowType</code>.</p>
<p>The available platforms are <code>x11</code>, <code>drm</code>,
<code>wayland</code>, <code>surfaceless</code>, <code>android</code>,

View File

@@ -60,6 +60,8 @@ sometimes be useful for debugging end-user issues.
<li>flush - flush after each drawing command</li>
<li>incomplete_tex - extra debug messages when a texture is incomplete</li>
<li>incomplete_fbo - extra debug messages when a fbo is incomplete</li>
<li>context - create a debug context (see GLX_CONTEXT_DEBUG_BIT_ARB) and
print error and performance messages to stderr (or MESA_LOG_FILE).</li>
</ul>
<li>MESA_LOG_FILE - specifies a file name for logging all errors, warnings,
etc., rather than stderr
@@ -112,6 +114,20 @@ glGetString(GL_VERSION) for OpenGL ES.
glGetString(GL_SHADING_LANGUAGE_VERSION). Valid values are integers, such as
"130". Mesa will not really implement all the features of the given language version
if it's higher than what's normally reported. (for developers only)
<li>MESA_GLSL_CACHE_DISABLE - if set, disables the GLSL shader cache
<li>MESA_GLSL_CACHE_MAX_SIZE - if set, determines the maximum size of
the on-disk cache of compiled GLSL programs. Should be set to a number
optionally followed by 'K', 'M', or 'G' to specify a size in
kilobytes, megabytes, or gigabytes. By default, gigabytes will be
assumed. And if unset, a maximum size of 1GB will be used. Note: A separate
cache might be created for each architecture that Mesa is installed for on
your system. For example under the default settings you may end up with a 1GB
cache for x86_64 and another 1GB cache for i386.
<li>MESA_GLSL_CACHE_DIR - if set, determines the directory to be used
for the on-disk cache of compiled GLSL programs. If this variable is
not set, then the cache will be stored in $XDG_CACHE_HOME/mesa (if
that variable is set), or else within .cache/mesa within the user's
home directory.
<li>MESA_GLSL - <a href="shading.html#envvars">shading language compiler options</a>
<li>MESA_NO_MINMAX_CACHE - when set, the minmax index cache is globally disabled.
</ul>
@@ -144,6 +160,7 @@ See the <a href="xlibdriver.html">Xlib software driver page</a> for details.
This is useful for debugging hangs, etc.</li>
<li>INTEL_DEBUG - a comma-separated list of named flags, which do various things:
<ul>
<li>color - use color in output</li>
<li>tex - emit messages about textures.</li>
<li>state - emit messages about state flag tracking</li>
<li>blit - emit messages about blit operations</li>
@@ -185,6 +202,8 @@ See the <a href="xlibdriver.html">Xlib software driver page</a> for details.
<li>do32 - generate compute shader SIMD32 programs even if workgroup size doesn't exceed the SIMD16 limit</li>
<li>norbc - disable single sampled render buffer compression</li>
</ul>
<li>INTEL_PRECISE_TRIG - if set to 1, true or yes, then the driver prefers
accuracy over performance in trig functions.</li>
</ul>
@@ -217,6 +236,8 @@ Mesa EGL supports different sets of environment variables. See the
disable for unencumbered viewing the rest of the time. For example, set
GALLIUM_HUD_VISIBLE to false and GALLIUM_HUD_TOGGLE_SIGNAL to 10 (SIGUSR1).
Use kill -10 <pid> to toggle the hud as desired.
<li>GALLIUM_HUD_DUMP_DIR - specifies a directory for writing the displayed
hud values into files.
<li>GALLIUM_DRIVER - useful in combination with LIBGL_ALWAYS_SOFTWARE=1 for
choosing one of the software renderers "softpipe", "llvmpipe" or "swr".
<li>GALLIUM_LOG_FILE - specifies a file for logging all errors, warnings, etc.
@@ -235,6 +256,21 @@ Setting to "tgsi", for example, will print all the TGSI shaders.
See src/mesa/state_tracker/st_debug.c for other options.
</ul>
<h3>Clover state tracker environment variables</h3>
<ul>
<li>CLOVER_EXTRA_BUILD_OPTIONS - allows specifying additional compiler and linker
options. Specified options are appended after the options set by the OpenCL
program in clBuildProgram.
<li>CLOVER_EXTRA_COMPILE_OPTIONS - allows specifying additional compiler
options. Specified options are appended after the options set by the OpenCL
program in clCompileProgram.
<li>CLOVER_EXTRA_LINK_OPTIONS - allows specifying additional linker
options. Specified options are appended after the options set by the OpenCL
program in clLinkProgram.
</ul>
<h3>Softpipe driver environment variables</h3>
<ul>
<li>SOFTPIPE_DUMP_FS - if set, the softpipe driver will print fragment shaders

View File

@@ -41,7 +41,7 @@ Last updated: 9 October 2012
<p>
Mesa is an open-source implementation of the OpenGL specification.
OpenGL is a programming library for writing interactive 3D applications.
See the <a href="http://www.opengl.org/">OpenGL website</a> for more
See the <a href="https://www.opengl.org/">OpenGL website</a> for more
information.
</p>
<p>
@@ -55,13 +55,13 @@ Yes. Specifically, Mesa serves as the OpenGL core for the open-source DRI
drivers for X.org.
</p>
<ul>
<li>See the <a href="http://dri.freedesktop.org/">DRI website</a>
<li>See the <a href="https://dri.freedesktop.org/">DRI website</a>
for more information.</li>
<li>See <a href="https://01.org/linuxgraphics">01.org</a>
for more information about Intel drivers.</li>
<li>See <a href="http://nouveau.freedesktop.org">nouveau.freedesktop.org</a>
<li>See <a href="https://nouveau.freedesktop.org">nouveau.freedesktop.org</a>
for more information about Nouveau drivers.</li>
<li>See <a href="http://www.x.org/wiki/RadeonFeature">www.x.org/wiki/RadeonFeature</a>
<li>See <a href="https://www.x.org/wiki/RadeonFeature">www.x.org/wiki/RadeonFeature</a>
for more information about Radeon drivers.</li>
</ul>
@@ -144,7 +144,7 @@ Mesa is much more up to date with modern features and extensions.
</p>
<p>
<a href="http://sourceforge.net/projects/ogl-es/">Vincent</a> is
<a href="https://sourceforge.net/projects/ogl-es/">Vincent</a> is
an open-source implementation of OpenGL ES for mobile devices.
<p>
@@ -157,7 +157,7 @@ is a subset of OpenGL.
</p>
<p>
<a href="http://sourceforge.net/projects/softgl/">SoftGL</a>
<a href="https://sourceforge.net/projects/softgl/">SoftGL</a>
is an OpenGL subset for mobile devices.
</p>
@@ -213,7 +213,7 @@ If you don't already have GLUT installed, you should grab
<h2>2.4 Where is the GLw library?</h2>
<p>
GLw (OpenGL widget library) is now available from a separate <a href="http://cgit.freedesktop.org/mesa/glw/">git repository</a>. Unless you're using very old Xt/Motif applications with OpenGL, you shouldn't need it.
GLw (OpenGL widget library) is now available from a separate <a href="https://cgit.freedesktop.org/mesa/glw/">git repository</a>. Unless you're using very old Xt/Motif applications with OpenGL, you shouldn't need it.
</p>
@@ -276,7 +276,7 @@ If you're using a hardware accelerated driver you want <code>direct rendering: Y
</p>
<p>
If your DRI-based driver isn't working, go to the
<a href="http://dri.freedesktop.org/">DRI website</a> for trouble-shooting information.
<a href="https://dri.freedesktop.org/">DRI website</a> for trouble-shooting information.
</p>
@@ -284,7 +284,7 @@ If your DRI-based driver isn't working, go to the
<p>
Make sure the ratio of the far to near clipping planes isn't too great.
Look
<a href="http://www.opengl.org/resources/faq/technical/depthbuffer.htm#0040">here</a>
<a href="https://www.opengl.org/resources/faq/technical/depthbuffer.htm#0040">here</a>
for details.
</p>
<p>
@@ -339,7 +339,7 @@ First, join the <a href="lists.html">mesa-dev mailing list</a>.
That's where Mesa development is discussed.
</p>
<p>
The <a href="http://www.opengl.org/documentation">
The <a href="https://www.opengl.org/documentation">
OpenGL Specification</a> is the bible for OpenGL implementation work.
You should read it.
</p>
@@ -383,7 +383,7 @@ implement the extension (specifically the compression/decompression
algorithms).
</p>
<p>
In the mean time, a 3rd party <a href="http://dri.freedesktop.org/wiki/S3TC">
In the mean time, a 3rd party <a href="https://dri.freedesktop.org/wiki/S3TC">
plug-in library</a> is available.
</p>

View File

@@ -33,7 +33,7 @@ are exposed in the 3.0 context as extensions.
Feature Status
------------------------------------------------------- ------------------------
GL 3.0, GLSL 1.30 --- all DONE: i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr
GL 3.0, GLSL 1.30 --- all DONE: freedreno, i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr
glBindFragDataLocation, glGetFragDataLocation DONE
GL_NV_conditional_render (Conditional rendering) DONE ()
@@ -60,12 +60,12 @@ GL 3.0, GLSL 1.30 --- all DONE: i965, nv50, nvc0, r600, radeonsi, llvmpipe, soft
glVertexAttribI commands DONE
Depth format cube textures DONE ()
GLX_ARB_create_context (GLX 1.4 is required) DONE
Multisample anti-aliasing DONE (llvmpipe (*), softpipe (*), swr (*))
Multisample anti-aliasing DONE (freedreno (*), llvmpipe (*), softpipe (*), swr (*))
(*) llvmpipe, softpipe, and swr have fake Multisample anti-aliasing support
(*) freedreno, llvmpipe, softpipe, and swr have fake Multisample anti-aliasing support
GL 3.1, GLSL 1.40 --- all DONE: i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr
GL 3.1, GLSL 1.40 --- all DONE: freedreno, i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr
Forward compatible context support/deprecations DONE ()
GL_ARB_draw_instanced (Instanced drawing) DONE ()
@@ -78,38 +78,38 @@ GL 3.1, GLSL 1.40 --- all DONE: i965, nv50, nvc0, r600, radeonsi, llvmpipe, soft
GL_EXT_texture_snorm (Signed normalized textures) DONE ()
GL 3.2, GLSL 1.50 --- all DONE: i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe
GL 3.2, GLSL 1.50 --- all DONE: i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr
Core/compatibility profiles DONE
Geometry shaders DONE ()
GL_ARB_vertex_array_bgra (BGRA vertex order) DONE (swr)
GL_ARB_draw_elements_base_vertex (Base vertex offset) DONE (swr)
GL_ARB_fragment_coord_conventions (Frag shader coord) DONE (swr)
GL_ARB_provoking_vertex (Provoking vertex) DONE (swr)
GL_ARB_seamless_cube_map (Seamless cubemaps) DONE (swr)
GL_ARB_texture_multisample (Multisample textures) DONE (swr)
GL_ARB_depth_clamp (Frag depth clamp) DONE (swr)
GL_ARB_sync (Fence objects) DONE (swr)
GL_ARB_vertex_array_bgra (BGRA vertex order) DONE (freedreno)
GL_ARB_draw_elements_base_vertex (Base vertex offset) DONE (freedreno)
GL_ARB_fragment_coord_conventions (Frag shader coord) DONE (freedreno)
GL_ARB_provoking_vertex (Provoking vertex) DONE (freedreno)
GL_ARB_seamless_cube_map (Seamless cubemaps) DONE (freedreno)
GL_ARB_texture_multisample (Multisample textures) DONE ()
GL_ARB_depth_clamp (Frag depth clamp) DONE (freedreno)
GL_ARB_sync (Fence objects) DONE (freedreno)
GLX_ARB_create_context_profile DONE
GL 3.3, GLSL 3.30 --- all DONE: i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe
GL_ARB_blend_func_extended DONE (swr)
GL_ARB_blend_func_extended DONE (freedreno/a3xx, swr)
GL_ARB_explicit_attrib_location DONE (all drivers that support GLSL)
GL_ARB_occlusion_query2 DONE (swr)
GL_ARB_occlusion_query2 DONE (freedreno, swr)
GL_ARB_sampler_objects DONE (all drivers)
GL_ARB_shader_bit_encoding DONE (swr)
GL_ARB_texture_rgb10_a2ui DONE (swr)
GL_ARB_texture_swizzle DONE (swr)
GL_ARB_shader_bit_encoding DONE (freedreno, swr)
GL_ARB_texture_rgb10_a2ui DONE (freedreno, swr)
GL_ARB_texture_swizzle DONE (freedreno, swr)
GL_ARB_timer_query DONE (swr)
GL_ARB_instanced_arrays DONE (swr)
GL_ARB_vertex_type_2_10_10_10_rev DONE (swr)
GL_ARB_instanced_arrays DONE (freedreno, swr)
GL_ARB_vertex_type_2_10_10_10_rev DONE (freedreno, swr)
GL 4.0, GLSL 4.00 --- all DONE: i965/gen8+, nvc0, r600, radeonsi
GL 4.0, GLSL 4.00 --- all DONE: i965/gen7+, nvc0, r600, radeonsi
GL_ARB_draw_buffers_blend DONE (i965/gen6+, nv50, llvmpipe, softpipe, swr)
GL_ARB_draw_buffers_blend DONE (freedreno, i965/gen6+, nv50, llvmpipe, softpipe, swr)
GL_ARB_draw_indirect DONE (i965/gen7+, llvmpipe, softpipe, swr)
GL_ARB_gpu_shader5 DONE (i965/gen7+)
- 'precise' qualifier DONE
@@ -124,7 +124,7 @@ GL 4.0, GLSL 4.00 --- all DONE: i965/gen8+, nvc0, r600, radeonsi
- Enhanced per-sample shading DONE ()
- Interpolation functions DONE ()
- New overload resolution rules DONE
GL_ARB_gpu_shader_fp64 DONE (llvmpipe, softpipe)
GL_ARB_gpu_shader_fp64 DONE (i965/gen7+, llvmpipe, softpipe)
GL_ARB_sample_shading DONE (i965/gen6+, nv50)
GL_ARB_shader_subroutine DONE (i965/gen6+, nv50, llvmpipe, softpipe, swr)
GL_ARB_tessellation_shader DONE (i965/gen7+)
@@ -132,21 +132,21 @@ GL 4.0, GLSL 4.00 --- all DONE: i965/gen8+, nvc0, r600, radeonsi
GL_ARB_texture_cube_map_array DONE (i965/gen6+, nv50, llvmpipe, softpipe)
GL_ARB_texture_gather DONE (i965/gen6+, nv50, llvmpipe, softpipe, swr)
GL_ARB_texture_query_lod DONE (i965, nv50, softpipe)
GL_ARB_transform_feedback2 DONE (i965/gen7+, nv50, llvmpipe, softpipe, swr)
GL_ARB_transform_feedback3 DONE (i965/gen7+, nv50, llvmpipe, softpipe, swr)
GL_ARB_transform_feedback2 DONE (i965/gen6+, nv50, llvmpipe, softpipe, swr)
GL_ARB_transform_feedback3 DONE (i965/gen7+, llvmpipe, softpipe, swr)
GL 4.1, GLSL 4.10 --- all DONE: i965/gen8+, nvc0, r600, radeonsi
GL 4.1, GLSL 4.10 --- all DONE: i965/gen7+, nvc0, r600, radeonsi
GL_ARB_ES2_compatibility DONE (i965, nv50, llvmpipe, softpipe, swr)
GL_ARB_get_program_binary DONE (0 binary formats)
GL_ARB_separate_shader_objects DONE (all drivers)
GL_ARB_shader_precision DONE (all drivers that support GLSL 4.10)
GL_ARB_vertex_attrib_64bit DONE (llvmpipe, softpipe)
GL_ARB_shader_precision DONE (i965/gen7+, all drivers that support GLSL 4.10)
GL_ARB_vertex_attrib_64bit DONE (i965/gen7+, llvmpipe, softpipe)
GL_ARB_viewport_array DONE (i965, nv50, llvmpipe, softpipe)
GL 4.2, GLSL 4.20 -- all DONE: i965/gen8+, nvc0, radeonsi
GL 4.2, GLSL 4.20 -- all DONE: i965/gen7+, nvc0, radeonsi
GL_ARB_texture_compression_bptc DONE (i965, r600)
GL_ARB_compressed_texture_pixel_storage DONE (all drivers)
@@ -191,8 +191,8 @@ GL 4.3, GLSL 4.30 -- all DONE: i965/gen8+, nvc0, radeonsi
GL 4.4, GLSL 4.40 -- all DONE: i965/gen8+, nvc0, radeonsi
GL_MAX_VERTEX_ATTRIB_STRIDE DONE (all drivers)
GL_ARB_buffer_storage DONE (i965, nv50, r600)
GL_ARB_clear_texture DONE (i965, nv50, r600)
GL_ARB_buffer_storage DONE (i965, nv50, r600, llvmpipe, swr)
GL_ARB_clear_texture DONE (i965, nv50, r600, llvmpipe, softpipe, swr)
GL_ARB_enhanced_layouts DONE (i965, nv50, llvmpipe, softpipe)
- compile-time constant expressions DONE
- explicit byte offsets for blocks DONE
@@ -253,25 +253,25 @@ GLES3.1, GLSL ES 3.1 -- all DONE: i965/hsw+, nvc0, radeonsi
GLES3.2, GLSL ES 3.2 -- all DONE: i965/gen9+
GL_EXT_color_buffer_float DONE (all drivers)
GL_KHR_blend_equation_advanced DONE (i965)
GL_KHR_blend_equation_advanced DONE (i965, nvc0)
GL_KHR_debug DONE (all drivers)
GL_KHR_robustness DONE (i965, nvc0, radeonsi)
GL_KHR_texture_compression_astc_ldr DONE (i965/gen9+)
GL_OES_copy_image DONE (all drivers)
GL_OES_draw_buffers_indexed DONE (all drivers that support GL_ARB_draw_buffers_blend)
GL_OES_draw_elements_base_vertex DONE (all drivers)
GL_OES_geometry_shader DONE (i965/gen8+, nvc0, radeonsi)
GL_OES_geometry_shader DONE (i965/hsw+, nvc0, radeonsi)
GL_OES_gpu_shader5 DONE (all drivers that support GL_ARB_gpu_shader5)
GL_OES_primitive_bounding_box DONE (i965/gen7+, nvc0, radeonsi)
GL_OES_sample_shading DONE (i965, nvc0, r600, radeonsi)
GL_OES_sample_variables DONE (i965, nvc0, r600, radeonsi)
GL_OES_shader_image_atomic DONE (all drivers that support GL_ARB_shader_image_load_store)
GL_OES_shader_io_blocks DONE (i965/gen8+, nvc0, radeonsi)
GL_OES_shader_io_blocks DONE (All drivers that support GLES 3.1)
GL_OES_shader_multisample_interpolation DONE (i965, nvc0, r600, radeonsi)
GL_OES_tessellation_shader DONE (all drivers that support GL_ARB_tessellation_shader)
GL_OES_texture_border_clamp DONE (all drivers)
GL_OES_texture_buffer DONE (i965, nvc0, radeonsi)
GL_OES_texture_cube_map_array DONE (i965/gen8+, nvc0, radeonsi)
GL_OES_texture_cube_map_array DONE (i965/hsw+, nvc0, radeonsi)
GL_OES_texture_stencil8 DONE (all drivers that support GL_ARB_texture_stencil8)
GL_OES_texture_storage_multisample_2d_array DONE (all drivers that support GL_ARB_texture_multisample)
@@ -283,27 +283,27 @@ Khronos, ARB, and OES extensions that are not part of any OpenGL or OpenGL ES ve
GL_ARB_ES3_2_compatibility DONE (i965/gen8+)
GL_ARB_fragment_shader_interlock not started
GL_ARB_gl_spirv not started
GL_ARB_gpu_shader_int64 started (airlied for core and Gallium, idr for i965)
GL_ARB_gpu_shader_int64 DONE (i965/gen8+, nvc0, radeonsi, softpipe, llvmpipe)
GL_ARB_indirect_parameters DONE (nvc0, radeonsi)
GL_ARB_parallel_shader_compile not started, but Chia-I Wu did some related work in 2014
GL_ARB_pipeline_statistics_query DONE (i965, nvc0, radeonsi, softpipe, swr)
GL_ARB_post_depth_coverage not started
GL_ARB_post_depth_coverage DONE (i965)
GL_ARB_robustness_isolation not started
GL_ARB_sample_locations not started
GL_ARB_seamless_cubemap_per_texture DONE (i965, nvc0, radeonsi, r600, softpipe, swr)
GL_ARB_shader_atomic_counter_ops DONE (nvc0, radeonsi, softpipe)
GL_ARB_shader_ballot not started
GL_ARB_shader_clock DONE (i965/gen7+)
GL_ARB_shader_atomic_counter_ops DONE (i965/gen7+, nvc0, radeonsi, softpipe)
GL_ARB_shader_ballot DONE (nvc0, radeonsi)
GL_ARB_shader_clock DONE (i965/gen7+, nv50, nvc0, radeonsi)
GL_ARB_shader_draw_parameters DONE (i965, nvc0, radeonsi)
GL_ARB_shader_group_vote DONE (nvc0)
GL_ARB_shader_group_vote DONE (nvc0, radeonsi)
GL_ARB_shader_stencil_export DONE (i965/gen9+, radeonsi, softpipe, llvmpipe, swr)
GL_ARB_shader_viewport_layer_array DONE (i965/gen6+)
GL_ARB_sparse_buffer not started
GL_ARB_shader_viewport_layer_array DONE (i965/gen6+, radeonsi)
GL_ARB_sparse_buffer DONE (radeonsi/CIK+)
GL_ARB_sparse_texture not started
GL_ARB_sparse_texture2 not started
GL_ARB_sparse_texture_clamp not started
GL_ARB_texture_filter_minmax not started
GL_ARB_transform_feedback_overflow_query not started
GL_ARB_transform_feedback_overflow_query DONE (i965/gen6+)
GL_KHR_blend_equation_advanced_coherent DONE (i965/gen9+)
GL_KHR_no_error not started
GL_KHR_texture_compression_astc_hdr DONE (core only)
@@ -333,5 +333,6 @@ we DO NOT WANT implementations of these extensions for Mesa.
GL_ARB_shadow_ambient Superseded by GL_ARB_fragment_program
GL_ARB_vertex_blend Superseded by GL_ARB_vertex_program
More info about these features and the work involved can be found at
http://dri.freedesktop.org/wiki/MissingFunctionality
A graphical representation of this information can be found at
https://mesamatrix.net/

View File

@@ -24,7 +24,7 @@ Here are some specific ideas and areas where help would be appreciated:
<ol>
<li>
<b>Driver patching and testing.</b>
Patches are often posted to the <a href="http://lists.freedesktop.org/mailman/listinfo/mesa-dev">mesa-dev mailing list</a>, but aren't
Patches are often posted to the <a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev">mesa-dev mailing list</a>, but aren't
immediately checked into git because not enough people are testing them.
Just applying patches, testing and reporting back is helpful.
<li>
@@ -39,7 +39,7 @@ issues in the code.
Fixing MSVC builds.
<li>
<b>Contribute more tests to
<a href="http://piglit.freedesktop.org/">Piglit</a>.</b>
<a href="https://piglit.freedesktop.org/">Piglit</a>.</b>
<li>
<b>Automatic testing.
</b>
@@ -56,9 +56,9 @@ You can find some further To-do lists here:
<b>Common To-Do lists:</b>
</p>
<ul>
<li><a href="http://cgit.freedesktop.org/mesa/mesa/tree/docs/features.txt">
<li><a href="https://cgit.freedesktop.org/mesa/mesa/tree/docs/features.txt">
<b>features.txt</b></a> - Status of OpenGL 3.x / 4.x features in Mesa.</li>
<li><a href="http://dri.freedesktop.org/wiki/MissingFunctionality">
<li><a href="https://dri.freedesktop.org/wiki/MissingFunctionality">
<b>MissingFunctionality</b></a> - Detailed information about missing OpenGL features.</li>
</ul>
@@ -66,15 +66,15 @@ You can find some further To-do lists here:
<b>Driver specific To-Do lists:</b>
</p>
<ul>
<li><a href="http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/docs/llvm-todo.txt">
<li><a href="https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/docs/llvm-todo.txt">
<b>LLVMpipe</b></a> - Software driver using LLVM for runtime code generation.</li>
<li><a href="http://dri.freedesktop.org/wiki/RadeonsiToDo">
<li><a href="https://dri.freedesktop.org/wiki/RadeonsiToDo">
<b>radeonsi</b></a> - Driver for AMD Southern Island.</li>
<li><a href="http://dri.freedesktop.org/wiki/R600ToDo">
<li><a href="https://dri.freedesktop.org/wiki/R600ToDo">
<b>r600g</b></a> - Driver for ATI/AMD R600 - Northern Island.</li>
<li><a href="http://dri.freedesktop.org/wiki/R300ToDo">
<li><a href="https://dri.freedesktop.org/wiki/R300ToDo">
<b>r300g</b></a> - Driver for ATI R300 - R500.</li>
<li><a href="http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/i915/TODO">
<li><a href="https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/i915/TODO">
<b>i915g</b></a> - Driver for Intel i915/i945.</li>
</ul>

View File

@@ -16,6 +16,101 @@
<h1>News</h1>
<h2>April 1, 2017</h2>
<p>
<a href="relnotes/17.0.3.html">Mesa 17.0.3</a> is released.
This is a bug-fix release.
</p>
<h2>March 20, 2017</h2>
<p>
<a href="relnotes/13.0.6.html">Mesa 13.0.6</a> and
<a href="relnotes/17.0.2.html">Mesa 17.0.2</a> are released.
These are bug-fix releases from the 13.0 and 17.0 branches, respectively.
<br>
NOTE: It is anticipated that 13.0.6 will be the final release in the 13.0
series. Users of 13.0 are encouraged to migrate to the 17.0 series in order
to obtain future fixes.
</p>
<h2>March 4, 2017</h2>
<p>
<a href="relnotes/17.0.1.html">Mesa 17.0.1</a> is released.
This is a bug-fix release.
</p>
<h2>February 20, 2017</h2>
<p>
<a href="relnotes/13.0.5.html">Mesa 13.0.5</a> is released.
This is a bug-fix release.
</p>
<h2>February 13, 2017</h2>
<p>
<a href="relnotes/17.0.0.html">Mesa 17.0.0</a> is released. This is a
new development release. See the release notes for more information
about the release.
</p>
<h2>February 1, 2017</h2>
<p>
<a href="relnotes/13.0.4.html">Mesa 13.0.4</a> is released.
This is a bug-fix release.
</p>
<h2>January 23, 2017</h2>
<p>
<a href="relnotes/12.0.6.html">Mesa 12.0.6</a> is released.
This is a bug-fix release.
<br>
NOTE: This is an extra release for the 12.0 stable branch, as per developers'
feedback. It is anticipated that 12.0.6 will be the final release in the 12.0
series. Users of 12.0 are encouraged to migrate to the 13.0 series in order
to obtain future fixes.
</p>
<h2>January 5, 2017</h2>
<p>
<a href="relnotes/13.0.3.html">Mesa 13.0.3</a> is released.
This is a bug-fix release.
</p>
<h2>December 5, 2016</h2>
<p>
<a href="relnotes/12.0.5.html">Mesa 12.0.5</a> is released.
This is a bug-fix release.
<br>
NOTE: It is anticipated that 12.0.5 will be the final release in the 12.0
series. Users of 12.0 are encouraged to migrate to the 13.0 series in order
to obtain future fixes.
</p>
<h2>November 28, 2016</h2>
<p>
<a href="relnotes/13.0.2.html">Mesa 13.0.2</a> is released.
This is a bug-fix release.
</p>
<h2>November 14, 2016</h2>
<p>
<a href="relnotes/13.0.1.html">Mesa 13.0.1</a> is released.
This is a bug-fix release.
</p>
<h2>November 10, 2016</h2>
<p>
<a href="relnotes/12.0.4.html">Mesa 12.0.4</a> is released.
This is a bug-fix release.
</p>
<h2>November 1, 2016</h2>
<p>
<a href="relnotes/13.0.0.html">Mesa 13.0.0</a> is released. This is a
new development release. See the release notes for more information
about the release.
</p>
<h2>September 15, 2016</h2>
<p>
<a href="relnotes/12.0.3.html">Mesa 12.0.3</a> is released.
@@ -110,7 +205,7 @@ This is a bug-fix release.
</p>
<p>
Mesa demos 8.3.0 is also released.
See the <a href="http://lists.freedesktop.org/archives/mesa-announce/2015-December/000191.html">announcement</a> for more information about the release.
See the <a href="https://lists.freedesktop.org/archives/mesa-announce/2015-December/000191.html">announcement</a> for more information about the release.
You can download it from <a href="ftp://ftp.freedesktop.org/pub/mesa/demos/8.3.0/">ftp.freedesktop.org/pub/mesa/demos/8.3.0/</a>.
</p>
@@ -425,7 +520,7 @@ This is a bug-fix release.
<p>
Mesa demos 8.2.0 is released.
See the <a href="http://lists.freedesktop.org/archives/mesa-announce/2014-July/000100.html">announcement</a> for more information about the release.
See the <a href="https://lists.freedesktop.org/archives/mesa-announce/2014-July/000100.html">announcement</a> for more information about the release.
You can download it from <a href="ftp://ftp.freedesktop.org/pub/mesa/demos/8.2.0/">ftp.freedesktop.org/pub/mesa/demos/8.2.0/</a>.
</p>
@@ -604,7 +699,7 @@ This is a bug fix release.
<p>
Mesa demos 8.1.0 is released.
See the <a href="http://lists.freedesktop.org/archives/mesa-dev/2013-February/035180.html">announcement</a> for more information about the release.
See the <a href="https://lists.freedesktop.org/archives/mesa-dev/2013-February/035180.html">announcement</a> for more information about the release.
You can download it from <a href="ftp://ftp.freedesktop.org/pub/mesa/demos/8.1.0/">ftp.freedesktop.org/pub/mesa/demos/8.1.0/</a>.
</p>
@@ -1300,7 +1395,7 @@ and primarily just incorporates bug fixes.
<h2>December 28, 2003</h2>
<p>
The Mesa CVS server has been moved to <a href="http://www.freedesktop.org">
The Mesa CVS server has been moved to <a href="https://www.freedesktop.org">
freedesktop.org</a> because of problems with SourceForge's anonymous
CVS service.
</p>
@@ -1872,7 +1967,7 @@ Here's what's new:</p>
</pre>
<h2>March 23, 2000</h2>
<p>I've just upload the Mesa 3.2 beta 1 files to SourceForge at <a href="http://sourceforge.net/project/showfiles.php?group_id=3">http://sourceforge.net/project/filelist.php?group_id=3</a></p>
<p>I've just upload the Mesa 3.2 beta 1 files to SourceForge at <a href="https://sourceforge.net/project/showfiles.php?group_id=3">https://sourceforge.net/project/filelist.php?group_id=3</a></p>
<p>3.2 (note even number) is a stabilization release of Mesa 3.1 meaning it's mainly
just bug fixes.</p>
<p>Here's what's changed:</p>
@@ -1920,7 +2015,7 @@ After 3.2 is wrapped up I hope to release 3.3 beta 1 soon afterward.</p>
<h2>December 17, 1999</h2>
<p>A Slashdot interview with Brian about Mesa (questions submitted by Slashdot readers)
can be found at <a href="http://slashdot.org/interviews/99/12/17/0927212.shtml">http://slashdot.org/interviews/99/12/17/0927212.shtml</a>.</p>
can be found at <a href="https://slashdot.org/interviews/99/12/17/0927212.shtml">https://slashdot.org/interviews/99/12/17/0927212.shtml</a>.</p>
<h2>December 14, 1999</h2>
<p>Mesa 3.1 is released!</p>
@@ -1954,7 +2049,7 @@ BOF meeting is now available.</p>
<p>-Brian</p>
<h2>August 14, 1999</h2>
<p><a href="http://www.mesa3d.org">www.mesa3d.org</a> is having
<p><a href="https://www.mesa3d.org">www.mesa3d.org</a> is having
technical problems due to hardware failures at VA Linux systems. The Mac pages,
ftp, and CVS services aren't fully restored yet. Please be patient.</p>
<p>-Brian</p>
@@ -1963,9 +2058,9 @@ ftp, and CVS services aren't fully restored yet. Please be patient.</p>
<p>RPMS of the nVidia RIVA server can be found at <code>ftp://ftp.mesa3d.org/mesa/misc/nVidia/</code>.</p>
<h2>June 2, 1999</h2>
<p><a href="http://www.nvidia.com/">nVidia</a> has released some Linux binaries for
<p><a href="https://www.nvidia.com/">nVidia</a> has released some Linux binaries for
xfree86 3.3.3.1, along with the <b>full source</b>, which includes GLX acceleration
based on Mesa 3.0. They can be downloaded from <code>http://www.nvidia.com/Products.nsf/htmlmedia/software_drivers.html</code>.</p>
based on Mesa 3.0. They can be downloaded from <code>https://www.nvidia.com/Products.nsf/htmlmedia/software_drivers.html</code>.</p>
<h2>May 24, 1999</h2>
<p>Beta 2 of Mesa 3.1 has been make available at <code>ftp://ftp.mesa3d.org/mesa/beta/</code>.
@@ -2013,11 +2108,11 @@ grateful.
<p>The new webpages are now online. Enjoy, and let me know if you find any errors.
<h2>February 16, 1999</h2>
<p><a href="http://www.sgi.com/">SGI</a> releases its
<a href="http://www.sgi.com/software/opensource/glx/">GLX source code</a>.</p>
<p><a href="https://www.sgi.com/">SGI</a> releases its
<a href="https://www.sgi.com/software/opensource/glx/">GLX source code</a>.</p>
<h2>January 22, 1999</h2>
<p><a href="http://www.mesa3d.org">www.mesa3d.org</a> established</p>
<p><a href="https://www.mesa3d.org">www.mesa3d.org</a> established</p>
</div>
</body>

View File

@@ -24,7 +24,7 @@
</ul>
<li><a href="#autoconf">Building with autoconf (Linux/Unix/X11)</a>
<li><a href="#scons">Building with SCons (Windows/Linux)</a>
<li><a href="#other">Building for other systems</a>
<li><a href="#android">Building with AOSP (Android)</a>
<li><a href="#libs">Library Information</a>
<li><a href="#pkg-config">Building OpenGL programs with pkg-config</a>
</ol>
@@ -33,62 +33,85 @@
<h1 id="prereq-general">1. Prerequisites for building</h1>
<h2>1.1 General</h2>
<p>
Build system.
</p>
<ul>
<li><a href="http://www.python.org/">Python</a> - Python is required.
Version 2.6.4 or later should work.
</li>
<br>
<li><a href="http://www.makotemplates.org/">Python Mako module</a> -
Python Mako module is required. Version 0.3.4 or later should work.
</li>
</br>
<li>Autoconf is required when building on *nix platforms.
<li><a href="http://www.scons.org/">SCons</a> is required for building on
Windows and optional for Linux (it's an alternative to autoconf/automake.)
</li>
<li>Android Build system when building as native Android component. Autoconf
is used when when building ARC.
</li>
</ul>
<p>
The following compilers are known to work, if you know of others or you're
willing to maintain support for other compiler get in touch.
</p>
<ul>
<li>GCC 4.2.0 or later (some parts of Mesa may require later versions)
<li>clang - exact minimum requirement is currently unknown.
<li>Microsoft Visual Studio 2013 Update 4 or later is required, for building on Windows.
</ul>
<p>
Third party/extra tools.
<br>
<li>lex / yacc - for building the GLSL compiler.
<br>
<br>
On Linux systems, flex and bison are used.
Versions 2.5.35 and 2.4.1, respectively, (or later) should work.
<br>
<br>
<strong>Note</strong>: These should not be required, when building from a release tarball. If
you think you've spotted a bug let developers know by filing a
<a href="bugs.html">bug report</a>.
</p>
<ul>
<li><a href="https://www.python.org/">Python</a> - Python is required.
Version 2.6.4 or later should work.
</li>
<li><a href="http://www.makotemplates.org/">Python Mako module</a> -
Python Mako module is required. Version 0.3.4 or later should work.
</li>
<li>lex / yacc - for building the Mesa IR and GLSL compiler.
<div>
On Linux systems, flex and bison versions 2.5.35 and 2.4.1, respectively,
(or later) should work.
On Windows with MinGW, install flex and bison with:
<pre>mingw-get install msys-flex msys-bison</pre>
For MSVC on Windows, install
<a href="http://winflexbison.sourceforge.net/">Win flex-bison</a>.
</li>
<br>
<li>For building on Windows, Microsoft Visual Studio 2013 or later is required.
</li>
</div>
</ul>
<p><strong>Note</strong>: Some versions can be buggy (eg. flex 2.6.2) so do try others if things fail.</p>
<h3 id="prereq-dri">1.2 For DRI and hardware acceleration</h3>
<h3 id="prereq-dri">1.2 Requirements</h3>
<p>
The following are required for DRI-based hardware acceleration with Mesa:
The requirements depends on the features selected at configure stage.
Check/install the respective -devel package as prompted by the configure error
message.
</p>
<ul>
<li><a href="http://xorg.freedesktop.org/releases/individual/proto/">
dri2proto</a> version 2.6 or later
<li><a href="http://dri.freedesktop.org/libdrm/">libDRM</a> latest version
<li>Xorg server version 1.5 or later
<li>Linux 2.6.28 or later
</ul>
<p>
If you're using a fedora distro the following command should install all
the needed dependencies:
Here are some common ways to retrieve most/all of the dependencies based on
the packaging tool used by your distro.
</p>
<pre>
sudo yum install flex bison imake libtool xorg-x11-proto-devel libdrm-devel \
gcc-c++ xorg-x11-server-devel libXi-devel libXmu-devel libXdamage-devel git \
expat-devel llvm-devel python-mako
zypper source-install --build-deps-only Mesa # openSUSE/SLED/SLES
yum-builddep mesa # yum Fedora, OpenSuse(?)
dnf builddep mesa # dnf Fedora
apt-get build-dep mesa # Debian and derivatives
... # others
</pre>
<h1 id="autoconf">2. Building with autoconf (Linux/Unix/X11)</h1>
<p>
@@ -139,22 +162,30 @@ This will create:
</ul>
<p>
Put them all in the same directory to test them.
Additional information is available in <a href="README.WIN32">README.WIN32</a>.
</p>
<h1 id="other">4. Building for other systems</h1>
<h1 id="android">4. Building with AOSP (Android)</h1>
<p>
Documentation for other environments (some may be very out of date):
Currently one can build Mesa for Android as part of the AOSP project, yet
your experience might vary.
</p>
<ul>
<li><a href="README.VMS">README.VMS</a> - VMS
<li><a href="README.CYGWIN">README.CYGWIN</a> - Cygwin
<li><a href="README.WIN32">README.WIN32</a> - Win32
</ul>
<p>
In order to achieve that one should update their local manifest to point to the
upstream repo, set the appropriate BOARD_GPU_DRIVERS and build the
libGLES_mesa library.
</p>
<p>
FINISHME: Improve on the instructions add references to Rob H repos/Jenkins,
Android-x86 and/or other resources.
</p>
<h1 id="libs">5. Library Information</h1>

View File

@@ -17,22 +17,34 @@
<h1>Introduction</h1>
<p>
Mesa is an open-source implementation of the
<a href="http://www.opengl.org/">OpenGL</a> specification -
The Mesa project began as an open-source implementation of the
<a href="https://www.opengl.org/">OpenGL</a> specification -
a system for rendering interactive 3D graphics.
</p>
<p>
A variety of device drivers allows Mesa to be used in many different
environments ranging from software emulation to complete hardware acceleration
for modern GPUs.
Over the years the project has grown to implement more graphics APIs,
including
<a href="https://www.khronos.org/opengles/">OpenGL ES</a> (versions 1, 2, 3),
<a href="https://www.khronos.org/opencl/">OpenCL</a>,
<a href="https://www.khronos.org/openmax/">OpenMAX</a>,
<a href="https://en.wikipedia.org/wiki/VDPAU">VDPAU</a>,
<a href="https://en.wikipedia.org/wiki/Video_Acceleration_API">VA API</a>,
<a href="https://en.wikipedia.org/wiki/X-Video_Motion_Compensation">XvMC</a> and
<a href="https://www.khronos.org/vulkan/">Vulkan</a>.
</p>
<p>
Mesa ties into several other open-source projects: the
<a href="http://dri.freedesktop.org/">Direct Rendering
Infrastructure</a> and <a href="http://x.org">X.org</a> to
provide OpenGL support to users of X on Linux, FreeBSD and other operating
A variety of device drivers allows the Mesa libraries to be used in many
different environments ranging from software emulation to complete hardware
acceleration for modern GPUs.
</p>
<p>
Mesa ties into several other open-source projects: the
<a href="https://dri.freedesktop.org/">Direct Rendering
Infrastructure</a> and <a href="https://x.org">X.org</a> to
provide OpenGL support on Linux, FreeBSD and other operating
systems.
</p>
@@ -85,7 +97,7 @@ the OpenGL API, so they didn't feel threatened by the project.
1995-1996: I continue working on Mesa both during my spare time and during
my work hours at the Space Science and Engineering Center at the University
of Wisconsin in Madison. My supervisor, Bill Hibbard, lets me do this because
Mesa is now being using for the <a href="http://www.ssec.wisc.edu/%7Ebillh/vis.html">Vis5D</a> project.
Mesa is now being using for the <a href="https://www.ssec.wisc.edu/%7Ebillh/vis.html">Vis5D</a> project.
</p><p>
October 1996: Mesa 2.0 is released. It implements the OpenGL 1.1 specification.
</p>
@@ -142,7 +154,7 @@ and OpenGL Shading Language.
<p>
2008: Keith Whitwell and other Tungsten Graphics employees develop
<a href="http://en.wikipedia.org/wiki/Gallium3D">Gallium</a>
<a href="https://en.wikipedia.org/wiki/Gallium3D">Gallium</a>
- a new GPU abstraction layer. The latest Mesa drivers are based on
Gallium and other APIs such as OpenVG are implemented on top of Gallium.
</p>
@@ -153,13 +165,22 @@ and version 1.30 of the OpenGL Shading Language.
</p>
<p>
Ongoing: Mesa is the OpenGL implementation for several types of hardware
made by Intel, AMD and NVIDIA, plus the VMware virtual GPU.
July 2016: Mesa 12.0 is released, including OpenGL 4.3 support and initial
support for Vulkan for Intel GPUs. Plus, there's another gallium software
driver ("swr") based on LLVM and developed by Intel.
</p>
<p>
Ongoing: Mesa is the OpenGL implementation for devices designed by
Intel, AMD, NVIDIA, Qualcomm, Broadcom, Vivante, plus the VMware and
VirGL virtual GPUs.
There's also several software-based renderers: swrast (the legacy
Mesa rasterizer), softpipe (a gallium reference driver) and llvmpipe
(LLVM/JIT-based high-speed rasterizer).
Mesa rasterizer), softpipe (a gallium reference driver), llvmpipe
(LLVM/JIT-based high-speed rasterizer) and swr (another LLVM-based driver).
</p>
<p>
Work continues on the drivers and core Mesa to implement newer versions
of the OpenGL specification.
of the OpenGL, OpenGL ES and Vulkan specifications.
</p>
@@ -178,6 +199,9 @@ of the OpenGL specification is implemented.
Version 12.x of Mesa implements the OpenGL 4.3 API, but not all drivers
support OpenGL 4.3.
</p>
<p>
Initial support for Vulkan is also included.
</p>
<h2>Version 11.x features</h2>
@@ -259,7 +283,7 @@ GL_SRC2_ALPHA GL_SOURCE2_ALPHA
</pre>
<p>
See the
<a href="http://www.opengl.org/documentation/spec.html">
<a href="https://www.opengl.org/documentation/spec.html">
OpenGL specification</a> for more details.
</p>

View File

@@ -18,10 +18,10 @@
<p>
Mesa is a 3-D graphics library with an API which is very similar to
that of <a href="http://www.opengl.org/">OpenGL</a>.*
that of <a href="https://www.opengl.org/">OpenGL</a>.*
To the extent that Mesa utilizes the OpenGL command syntax or state
machine, it is being used with authorization from <a
href="http://www.sgi.com/">Silicon Graphics,
href="https://www.sgi.com/">Silicon Graphics,
Inc.</a>(SGI). However, the author does not possess an OpenGL license
from SGI, and makes no claim that Mesa is in any way a compatible
replacement for OpenGL or associated with SGI. Those who want a
@@ -36,7 +36,7 @@ library</em>. <br>
</p>
<p>
* OpenGL is a trademark of <a href="http://www.sgi.com/"
* OpenGL is a trademark of <a href="https://www.sgi.com/"
>Silicon Graphics Incorporated</a>.
</p>

View File

@@ -21,23 +21,23 @@
</p>
<ul>
<li><p><a href="http://lists.freedesktop.org/mailman/listinfo/mesa-users">mesa-users</a>
<li><p><a href="https://lists.freedesktop.org/mailman/listinfo/mesa-users">mesa-users</a>
- intended for end-users of Mesa and DRI drivers. Newbie questions are OK,
but please try the general OpenGL resources and Mesa/DRI documentation first.</p>
</li>
<li><p><a href="http://lists.freedesktop.org/mailman/listinfo/mesa-dev">mesa-dev</a>
<li><p><a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev">mesa-dev</a>
- for Mesa, Gallium and DRI development
discussion. Not for beginners.</p>
</li>
<li><p><a href="http://lists.freedesktop.org/mailman/listinfo/mesa-commit">mesa-commit</a>
<li><p><a href="https://lists.freedesktop.org/mailman/listinfo/mesa-commit">mesa-commit</a>
- relays git check-in messages (for developers).
In general, people should not post to this list.</p>
</li>
<li><p><a href="http://lists.freedesktop.org/mailman/listinfo/mesa-announce">mesa-announce</a>
<li><p><a href="https://lists.freedesktop.org/mailman/listinfo/mesa-announce">mesa-announce</a>
- announcements of new Mesa
versions are sent to this list. Very low traffic.</p>
</li>
<li><p><a href="http://lists.freedesktop.org/mailman/listinfo/piglit">piglit</a>
<li><p><a href="https://lists.freedesktop.org/mailman/listinfo/piglit">piglit</a>
- for Piglit (OpenGL driver testing framework) discussion.</p>
</li>
</ul>
@@ -56,22 +56,22 @@ Follow the links above for list archives.
<p>
The old Mesa lists hosted at SourceForge are no longer in use.
The archives are still available, however:
<a href="http://sourceforge.net/mailarchive/forum.php?forum_name=mesa3d-announce">mesa3d-announce</a>,
<a href="http://sourceforge.net/mailarchive/forum.php?forum_name=mesa3d-users">mesa3d-users</a>,
<a href="http://sourceforge.net/mailarchive/forum.php?forum_name=mesa3d-dev">mesa3d-dev</a>.
<a href="https://sourceforge.net/mailarchive/forum.php?forum_name=mesa3d-announce">mesa3d-announce</a>,
<a href="https://sourceforge.net/mailarchive/forum.php?forum_name=mesa3d-users">mesa3d-users</a>,
<a href="https://sourceforge.net/mailarchive/forum.php?forum_name=mesa3d-dev">mesa3d-dev</a>.
</p>
<p>For mailing lists about Direct Rendering Modules (drm) in Linux/BSD
kernels, see the
<a href="http://dri.freedesktop.org/wiki/MailingLists">DRI wiki</a>.
<a href="https://dri.freedesktop.org/wiki/MailingLists">DRI wiki</a>.
</p>
<h1>IRC</h1>
<p>join <a href="irc://chat.freenode.net#dri-devel">#dri-devel channel</a>
on <a href="http://webchat.freenode.net/">irc.freenode.net</a>
on <a href="https://webchat.freenode.net/">irc.freenode.net</a>
</p>
@@ -82,7 +82,7 @@ Here are some other OpenGL-related forums you might find useful:
</p>
<ul>
<li><a href="http://www.opengl.org/cgi-bin/ubb/ultimatebb.cgi">OpenGL discussion forums</a>
<li><a href="https://www.opengl.org/discussion_boards/">OpenGL discussion forums</a>
at www.opengl.org</li>
<li>Usenet newsgroups:
<ul>

View File

@@ -34,7 +34,7 @@ It's the fastest software rasterizer for Mesa.
<li>
<p>An x86 or amd64 processor; 64-bit mode recommended.</p>
<p>
Support for SSE2 is strongly encouraged. Support for SSSE3 and SSE4.1 will
Support for SSE2 is strongly encouraged. Support for SSE3 and SSE4.1 will
yield the most efficient code. The fewer features the CPU has the more
likely is that you run into underperforming, buggy, or incomplete code.
</p>
@@ -165,8 +165,8 @@ any OpenGL drivers):
<li><p>load this registry settings:</p>
<pre>REGEDIT4
; http://technet.microsoft.com/en-us/library/cc749368.aspx
; http://www.msfn.org/board/topic/143241-portable-windows-7-build-from-winpe-30/page-5#entry942596
; https://technet.microsoft.com/en-us/library/cc749368.aspx
; https://www.msfn.org/board/topic/143241-portable-windows-7-build-from-winpe-30/page-5#entry942596
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Windows NT\CurrentVersion\OpenGLDrivers\MSOGL]
"DLL"="mesadrv.dll"
"DriverVersion"=dword:00000001
@@ -195,7 +195,7 @@ that no tail call optimizations are done by gcc.
<h2>Linux perf integration</h2>
<p>
On Linux, it is possible to have symbol resolution of JIT code with <a href="http://perf.wiki.kernel.org/">Linux perf</a>:
On Linux, it is possible to have symbol resolution of JIT code with <a href="https://perf.wiki.kernel.org/">Linux perf</a>:
</p>
<pre>
@@ -206,12 +206,12 @@ On Linux, it is possible to have symbol resolution of JIT code with <a href="htt
<p>
When run inside Linux perf, llvmpipe will create a /tmp/perf-XXXXX.map file with
symbol address table. It also dumps assembly code to /tmp/perf-XXXXX.map.asm,
which can be used by the bin/perf-annotate-jit script to produce disassembly of
which can be used by the bin/perf-annotate-jit.py script to produce disassembly of
the generated code annotated with the samples.
</p>
<p>You can obtain a call graph via
<a href="http://code.google.com/p/jrfonseca/wiki/Gprof2Dot#linux_perf">Gprof2Dot</a>.</p>
<a href="https://github.com/jrfonseca/gprof2dot#linux-perf">Gprof2Dot</a>.</p>
<h1>Unit testing</h1>
@@ -253,7 +253,7 @@ for posterior analysis, e.g.:
We use LLVM-C bindings for now. They are not documented, but follow the C++
interfaces very closely, and appear to be complete enough for code
generation. See
<a href="http://npcontemplation.blogspot.com/2008/06/secret-of-llvm-c-bindings.html">
<a href="https://npcontemplation.blogspot.com/2008/06/secret-of-llvm-c-bindings.html">
this stand-alone example</a>. See the llvm-c/Core.h file for reference.
</li>
</ul>
@@ -264,18 +264,18 @@ for posterior analysis, e.g.:
<li>
<p>Rasterization</p>
<ul>
<li><a href="http://www.cs.unc.edu/~olano/papers/2dh-tri/">Triangle Scan Conversion using 2D Homogeneous Coordinates</a></li>
<li><a href="https://www.cs.unc.edu/~olano/papers/2dh-tri/">Triangle Scan Conversion using 2D Homogeneous Coordinates</a></li>
<li><a href="http://www.drdobbs.com/parallel/rasterization-on-larrabee/217200602">Rasterization on Larrabee</a> (<a href="http://devmaster.net/posts/2887/rasterization-on-larrabee">DevMaster copy</a>)</li>
<li><a href="http://devmaster.net/posts/6133/rasterization-using-half-space-functions">Rasterization using half-space functions</a></li>
<li><a href="http://devmaster.net/posts/6145/advanced-rasterization">Advanced Rasterization</a></li>
<li><a href="http://fgiesen.wordpress.com/2013/02/17/optimizing-sw-occlusion-culling-index/">Optimizing Software Occlusion Culling</a></li>
<li><a href="https://fgiesen.wordpress.com/2013/02/17/optimizing-sw-occlusion-culling-index/">Optimizing Software Occlusion Culling</a></li>
</ul>
</li>
<li>
<p>Texture sampling</p>
<ul>
<li><a href="http://chrishecker.com/Miscellaneous_Technical_Articles#Perspective_Texture_Mapping">Perspective Texture Mapping</a></li>
<li><a href="http://www.flipcode.com/archives/Texturing_As_In_Unreal.shtml">Texturing As In Unreal</a></li>
<li><a href="https://www.flipcode.com/archives/Texturing_As_In_Unreal.shtml">Texturing As In Unreal</a></li>
<li><a href="http://www.gamasutra.com/view/feature/3301/runtime_mipmap_filtering.php">Run-Time MIP-Map Filtering</a></li>
<li><a href="http://alt.3dcenter.org/artikel/2003/10-26_a_english.php">Will "brilinear" filtering persist?</a></li>
<li><a href="http://ixbtlabs.com/articles2/gffx/nv40-rx800-3.html">Trilinear filtering</a></li>
@@ -294,21 +294,21 @@ for posterior analysis, e.g.:
<li><a href="http://www.drdobbs.com/optimizing-pixomatic-for-modern-x86-proc/184405807">Optimizing Pixomatic For Modern x86 Processors</a></li>
<li><a href="http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html">Intel 64 and IA-32 Architectures Optimization Reference Manual</a></li>
<li><a href="http://www.agner.org/optimize/">Software optimization resources</a></li>
<li><a href="http://software.intel.com/en-us/articles/intel-intrinsics-guide">Intel Intrinsics Guide</a><li>
<li><a href="https://software.intel.com/en-us/articles/intel-intrinsics-guide">Intel Intrinsics Guide</a><li>
</ul>
</li>
<li>
<p>LLVM</p>
<ul>
<li><a href="http://llvm.org/docs/LangRef.html">LLVM Language Reference Manual</a></li>
<li><a href="http://npcontemplation.blogspot.co.uk/2008/06/secret-of-llvm-c-bindings.html">The secret of LLVM C bindings</a></li>
<li><a href="https://npcontemplation.blogspot.co.uk/2008/06/secret-of-llvm-c-bindings.html">The secret of LLVM C bindings</a></li>
</ul>
</li>
<li>
<p>General</p>
<ul>
<li><a href="http://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-graphics-pipeline-2011-index/">A trip through the Graphics Pipeline</a></li>
<li><a href="http://msdn.microsoft.com/en-us/library/gg615082.aspx#architecture">WARP Architecture and Performance</a></li>
<li><a href="https://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-graphics-pipeline-2011-index/">A trip through the Graphics Pipeline</a></li>
<li><a href="https://msdn.microsoft.com/en-us/library/gg615082.aspx#architecture">WARP Architecture and Performance</a></li>
</ul>
</li>
</ul>

View File

@@ -2,7 +2,7 @@
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Function Name Mangling</title>
<title>GL Function Name Mangling</title>
<link rel="stylesheet" type="text/css" href="mesa.css">
</head>
<body>
@@ -14,7 +14,7 @@
<iframe src="contents.html"></iframe>
<div class="content">
<h1>Function Name Mangling</h1>
<h1>GL Function Name Mangling</h1>
<p>
If you want to use both Mesa and another OpenGL library in the same
@@ -25,12 +25,11 @@ This results in all the Mesa functions being prefixed with
</p>
<p>
To do this, recompile Mesa with the compiler flag -DUSE_MGL_NAMESPACE.
Add the flag to CFLAGS in the configuration file which you want to use.
For example:
This option is supported only with the autoconf build. To use it add
--enable-mangling to your configure line.
</p>
<pre>
CFLAGS += -DUSE_MGL_NAMESPACE
<code>./configure --enable-mangling ...</code>
</pre>
</div>

View File

@@ -17,8 +17,8 @@
<h1>OpenGL ES</h1>
<p>Mesa implements OpenGL ES 1.1 and OpenGL ES 2.0. More information about
OpenGL ES can be found at <a href="http://www.khronos.org/opengles/">
http://www.khronos.org/opengles/</a>.</p>
OpenGL ES can be found at <a href="https://www.khronos.org/opengles/">
https://www.khronos.org/opengles/</a>.</p>
<p>OpenGL ES depends on a working EGL implementation. Please refer to
<a href="egl.html">Mesa EGL</a> for more information about EGL.</p>

View File

@@ -27,5 +27,5 @@ ARB_texture_float:
enable this extension.
[1] http://www.google.com/patents/about?id=mIIOAAAAEBAJ&dq=6650327
[2] http://www.opengl.org/registry/specs/ARB/texture_float.txt
[1] https://www.google.com/patents/about?id=mIIOAAAAEBAJ&dq=6650327
[2] https://www.opengl.org/registry/specs/ARB/texture_float.txt

View File

@@ -45,7 +45,7 @@ Multiple filters can be used together.
<li>pp_nored, pp_nogreen, pp_noblue - set to 1 to remove the corresponding color channel.
These are basic filters for easy testing of the PP queue.
<li>pp_jimenezmlaa, pp_jimenezmlaa_color -
<a href="http://www.iryokufx.com/mlaa/" target=_blank>Jimenez's MLAA</a>
<a href="https://www.iryokufx.com/mlaa/" target=_blank>Jimenez's MLAA</a>
is a morphological antialiasing filter.
The two versions use depth and color data, respectively.
Which works better depends on the app - depth will not blur text, but it will

View File

@@ -20,8 +20,14 @@
In general, precompiled Mesa libraries are not available.
</p>
<p>
However, some Linux distros (such as Ubuntu) seem to closely track
Mesa and often have the latest Mesa release available as an update.
Some Linux distributions closely follow the latest Mesa releases. On others one
has to use unofficial channels.
<br>
There are some general directions:
<li>Debian/Ubuntu based distros - PPA: xorg-edgers, oibaf and padoka</li>
<li>Fedora - Corp: erp and che</li>
<li>OpenSuse/SLES - OBS: X11:XOrg and pontostroy:X11</li>
<li>Gentoo/Archlinux - officially provided/supported</li>
</p>
</div>

548
docs/releasing.html Normal file
View File

@@ -0,0 +1,548 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Releasing process</title>
<link rel="stylesheet" type="text/css" href="mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="contents.html"></iframe>
<div class="content">
<h1>Releasing process</h1>
<ul>
<li><a href="#overview">Overview</a>
<li><a href="#schedule">Release schedule</a>
<li><a href="#pickntest">Cherry-pick and test</a>
<li><a href="#branch">Making a branchpoint</a>
<li><a href="#prerelease">Pre-release announcement</a>
<li><a href="#release">Making a new release</a>
<li><a href="#announce">Announce the release</a>
<li><a href="#website">Update the mesa3d.org website</a>
<li><a href="#bugzilla">Update Bugzilla</a>
</ul>
<h1 id="overview">Overview</h1>
<p>
This document uses the convention X.Y.Z for the release number with X.Y being
the stable branch name.
<br>
Mesa provides feature and bugfix releases. Former use zero as patch version (Z),
while the latter have a non-zero one.
</p>
<p>
For example:
</p>
<pre>
Mesa 10.1.0 - 10.1 branch, feature
Mesa 10.1.4 - 10.1 branch, bugfix
Mesa 12.0.0 - 12.0 branch, feature
Mesa 12.0.2 - 12.0 branch, bugfix
</pre>
<h1 id="schedule">Release schedule</h1>
<p>
Releases should happen on Fridays. Delays can occur although those should be keep
to a minimum.
</p>
<h2>Feature releases</h2>
<ul>
<li>Available approximately every three months.
<li>Initial timeplan available 2-4 weeks before the planned branchpoint (rc1)
on the mesa-announce@ mailing list.
<li>A <a href="#prerelease">pre-release</a> announcement should be available
approximately 24 hours before the final (non-rc) release.
</ul>
<h2>Stable releases</h2>
<ul>
<li>Normally available once every two weeks.
<li>Only the latest branch has releases. See note below.
<li>A <a href="#prerelease">pre-release</a> announcement should be available
approximately 48 hours before the actual release.
</ul>
<p>
Note: There is one or two releases overlap when changing branches. For example:
<br>
The final release from the 12.0 series Mesa 12.0.5 will be out around the same
time (or shortly after) 13.0.1 is out.
</p>
<h1 id="pickntest">Cherry-picking and testing</h1>
<p>
Commits nominated for the active branch are picked as based on the
<a href="submittingpatches.html#criteria" target="_parent">criteria</a> as
described in the same section.
<p>
Maintainer is responsible for testing in various possible permutations of
the autoconf and scons build.
</p>
<h2>Cherry-picking and build/check testing</h2>
<p>Done continuously up-to the <a href="#prerelease">pre-release</a> announcement.</p>
<p>
As an exception, patches can be applied up-to the last ~1h before the actual
release. This is made <strong>only</strong> with explicit permission/request,
and the patch <strong>must</strong> be very well contained. Thus it cannot
affect more than one driver/subsystem.
</p>
<p>
Currently Ilia Mirkin and AMD devs have requested "permanent" exception.
</p>
<ul>
<li>make distcheck, scons and scons check must pass
<li>Testing with different version of system components - LLVM and others is also
performed where possible.
</ul>
<p>
Achieved by combination of local ad-hoc scripts and AppVeyor plus Travis-CI,
the latter as part of their Github integration.
</p>
<p>
<strong>Note:</strong> If a patch in the current queue needs any additional
fix(es), then they should be squashed together.
<br>
The commit messages and the <code>cherry picked from</code> tags must be preserved.
</p>
<p>
This should be noted in the <a href="#prerelease">pre-announce</a> email.
<pre>
git show b10859ec41d09c57663a258f43fe57c12332698e
commit b10859ec41d09c57663a258f43fe57c12332698e
Author: Jonas Pfeil &ltpfeiljonas@gmx.de&gt
Date: Wed Mar 1 18:11:10 2017 +0100
ralloc: Make sure ralloc() allocations match malloc()'s alignment.
The header of ralloc needs to be aligned, because the compiler assumes
...
(cherry picked from commit cd2b55e536dc806f9358f71db438dd9c246cdb14)
Squashed with commit:
ralloc: don't leave out the alignment factor
Experimentation shows that without alignment factor gcc and clang choose
...
(cherry picked from commit ff494fe999510ea40e3ed5827e7818550b6de126)
</pre>
</p>
<h2>Regression/functionality testing</h2>
<p>
Less often (once or twice), shortly before the pre-release announcement.
Ensure that testing is redone if Intel devs have requested an exception, as per above.
</p>
<ul>
<li><em>no regressions should be observed for Piglit/dEQP/CTS/Vulkan on Intel platforms</em>
<li><em>no regressions should be observed for Piglit using the swrast, softpipe
and llvmpipe drivers</em>
</ul>
<p>
Currently testing is performed courtesy of the Intel OTC team and their Jenkins CI setup. Check with the Intel team over IRC how to get things setup.
</p>
<h1 id="branch">Making a branchpoint</h1>
<p>
A branchpoint is made such that new development can continue in parallel to
stabilisation and bugfixing.
</p>
<p>
Note: Before doing a branch ensure that basic build and <code>make check</code>
testing is done and there are little to-no issues.
<br>
Ideally all of those should be tackled already.
</p>
<p>
Check if the version number is going to remain as, alternatively
<code> git mv docs/relnotes/{current,new}.html </code> as appropriate.
</p>
<p>
To setup the branchpoint:
</p>
<pre>
git checkout master # make sure we're in master first
git tag -s X.Y-branchpoint -m "Mesa X.Y branchpoint"
git checkout -b X.Y
git checkout master
$EDITOR VERSION # bump the version number
git commit -as
cp docs/relnotes/{X.Y,X.Y+1}.html # copy/create relnotes template
git commit -as
git push origin X.Y-branchpoint X.Y
</pre>
<p>
Now go to
<a href="https://bugs.freedesktop.org/editversions.cgi?action=add&amp;product=Mesa" target="_parent">Bugzilla</a> and add the new Mesa version X.Y.
</p>
<p>
Check that there are no distribution breaking changes and revert them if needed.
For example: files being overwritten on install, etc. Happens extremely rarely -
we had only one case so far (see commit 2ced8eb136528914e1bf4e000dea06a9d53c7e04).
</p>
<p>
Proceed to <a href="#release">release</a> -rc1.
</p>
<h1 id="prerelease">Pre-release announcement</h1>
<p>
It comes shortly after outstanding patches in the respective branch are pushed.
Developers can check, in brief, what's the status of their patches. They,
alongside very early testers, are strongly encouraged to test the branch and
report any regressions.
<br>
It is followed by a brief period (normally 24 or 48 hours) before the actual
release is made.
</p>
<h2>Terminology used</h2>
<ul><li>Nominated</ul>
<p>
Patch that is nominated but yet to to merged in the patch queue/branch.
</p>
<ul><li>Queued</ul>
<p>
Patch is in the queue/branch and will feature in the next release.
Barring reported regressions or objections from developers.
</p>
<ul><li>Rejected</ul>
<p>
Patch does not fit the
<a href="submittingpatches.html#criteria" target="_parent">criteria</a> and
is followed by a brief information.
<br>
The release maintainer is human so if you believe you've spotted a mistake do
let them know.
</p>
<h2>Format/template</h2>
<pre>
Subject: [ANNOUNCE] Mesa X.Y.Z release candidate
To: mesa-announce@...
Cc: mesa-dev@...
Hello list,
The candidate for the Mesa X.Y.Z is now available. Currently we have:
- NUMBER queued
- NUMBER nominated (outstanding)
- and NUMBER rejected patches
BRIEF SUMMARY OF CHANGES
Take a look at section "Mesa stable queue" for more information.
Testing reports/general approval
--------------------------------
Any testing reports (or general approval of the state of the branch) will be
greatly appreciated.
The plan is to have X.Y.Z this DAY (DATE), around or shortly after TIME.
If you have any questions or suggestions - be that about the current patch
queue or otherwise, please go ahead.
Trivial merge conflicts
-----------------------
List of commits where manual intervention was required.
Keep the authors in the CC list.
commit SHA
Author: AUTHOR
COMMIT SUMMARY
CHERRY PICKED FROM
For example:
commit 990f395e007c3204639daa34efc3049f350ee819
Author: Emil Velikov &lt;emil.velikov@collabora.com&gt;
anv: automake: cleanup the generated json file during make clean
(cherry picked from commit 8df581520a823564be0ab5af7dbb7d501b1c9670)
Cheers,
Emil
Mesa stable queue
-----------------
Nominated (NUMBER)
==================
AUTHOR (NUMBER):
SHA COMMIT SUMMARY
For example:
Dave Airlie (1):
2de85eb radv: fix texturesamples to handle single sample case
Queued (NUMBER)
===============
AUTHOR (NUMBER):
COMMIT SUMMARY
For example:
Jonas Pfeil (1):
ralloc: Make sure ralloc() allocations match malloc()'s alignment.
Squashed with
ralloc: don't leave out the alignment factor
Rejected (NUMBER)
=================
Rejected (11)
=============
AUTHOR (NUMBER):
SHA COMMIT SUMMARY
Reason: ...
</pre>
<h1 id="release">Making a new release</h1>
<p>
These are the instructions for making a new Mesa release.
</p>
<h3>Get latest source files</h3>
<p>
Ensure the latest code is available - both in your local master and the
relevant branch.
</p>
<h3>Perform basic testing</h3>
<p>
Most of the testing should already be done during the
<a href="#pickntest">cherry-pick</a> and
<a href="#prerelease">pre-announce</a> stages.
So we do a quick 'touch test'
<ul>
<li>make distcheck (you can omit this if you're not using --dist below)
<li>scons (from release tarball)
<li>the produced binaries work
</ul>
<p>
Here is one solution that I've been using.
</p>
<pre>
git clean -fXd; git clean -nxd
read # quick cross check any outstanding files
export __version=`cat VERSION`
export __mesa_root=../
export __build_root=./foo
chmod 755 -fR $__build_root; rm -rf $__build_root
mkdir -p $__build_root &amp;&amp; cd $__build_root
$__mesa_root/autogen.sh &amp;&amp; make -j2 distcheck
# Build check the tarballs (scons, linux)
tar -xaf mesa-$__version.tar.xz &amp;&amp; cd mesa-$__version
scons
cd .. &amp;&amp; rm -rf mesa-$__version
# Build check the tarballs (scons, windows/mingw)
tar -xaf mesa-$__version.tar.xz &amp;&amp; cd mesa-$__version
scons platform=windows toolchain=crossmingw
cd .. &amp;&amp; rm -rf mesa-$__version
# Test the automake binaries
tar -xaf mesa-$__version.tar.xz &amp;&amp; cd mesa-$__version
./configure \
--with-dri-drivers=i965,swrast \
--with-gallium-drivers=swrast \
--with-vulkan-drivers=intel \
--enable-llvm-shared-libs \
--enable-llvm \
--enable-glx-tls \
--enable-gbm \
--enable-egl \
--with-egl-platforms=x11,drm,wayland
make -j2 &amp;&amp; DESTDIR=`pwd`/test make -j6 install
__glxinfo_cmd='glxinfo 2>&amp;1 | egrep -o "Mesa.*|Gallium.*|.*dri\.so"'
__glxgears_cmd='glxgears 2>&amp;1 | grep -v "configuration file"'
__es2info_cmd='es2_info 2>&amp;1 | egrep "GL_VERSION|GL_RENDERER|.*dri\.so"'
__es2gears_cmd='es2gears_x11 2>&amp;1 | grep -v "configuration file"'
export LD_LIBRARY_PATH=`pwd`/test/usr/local/lib/
export LIBGL_DRIVERS_PATH=`pwd`/test/usr/local/lib/dri/
export LIBGL_DEBUG=verbose
eval $__glxinfo_cmd
eval $__glxgears_cmd
eval $__es2info_cmd
eval $__es2gears_cmd
export LIBGL_ALWAYS_SOFTWARE=1
eval $__glxinfo_cmd
eval $__glxgears_cmd
eval $__es2info_cmd
eval $__es2gears_cmd
export LIBGL_ALWAYS_SOFTWARE=1
export GALLIUM_DRIVER=softpipe
eval $__glxinfo_cmd
eval $__glxgears_cmd
eval $__es2info_cmd
eval $__es2gears_cmd
# Smoke test DOTA2
unset LD_LIBRARY_PATH
unset LIBGL_DRIVERS_PATH
unset LIBGL_DEBUG
unset LIBGL_ALWAYS_SOFTWARE
export VK_ICD_FILENAMES=`pwd`/src/intel/vulkan/dev_icd.json
steam steam://rungameid/570 -vconsole -vulkan
</pre>
<h3>Update version in file VERSION</h3>
<p>
Increment the version contained in the file VERSION at Mesa's top-level, then
commit this change.
</p>
<h3>Create release notes for the new release</h3>
<p>
Create a new file docs/relnotes/X.Y.Z.html, (follow the style of the previous
release notes). Note that the sha256sums section of the release notes should
be empty (TBD) at this point.
</p>
<p>
Two scripts are available to help generate portions of the release notes:
<pre>
./bin/bugzilla_mesa.sh
./bin/shortlog_mesa.sh
</pre>
<p>
The first script identifies commits that reference bugzilla bugs and obtains
the descriptions of those bugs from bugzilla. The second script generates a
log of all commits. In both cases, HTML-formatted lists are printed to stdout
to be included in the release notes.
</p>
<p>
Commit these changes and push the branch.
</p>
<pre>
git push origin HEAD
</pre>
<h3>Use the release.sh script from xorg <a href="https://cgit.freedesktop.org/xorg/util/modular/">util-modular</a></h3>
<p>
Start the release process.
</p>
<pre>
../relative/path/to/release.sh . # append --dist if you've already done distcheck above
</pre>
<p>
Pay close attention to the prompts as you might be required to enter your GPG
and SSH passphrase(s) to sign and upload the files, respectively.
</p>
<h3>Add the sha256sums to the release notes</h3>
<p>
Edit docs/relnotes/X.Y.Z.html to add the sha256sums as available in the mesa-X.Y.Z.announce template. Commit this change.
</p>
<h3>Back on mesa master, add the new release notes into the tree</h3>
<p>
Something like the following steps will do the trick:
</p>
<pre>
git cherry-pick -x X.Y~1
git cherry-pick -x X.Y
</pre>
<p>
Also, edit docs/relnotes.html to add a link to the new release notes, and edit
docs/index.html to add a news entry. Then commit and push:
</p>
<pre>
git commit -as -m "docs: add news item and link release notes for X.Y.Z"
git push origin master X.Y
</pre>
<h1 id="announce">Announce the release</h1>
<p>
Use the generated template during the releasing process.
</p>
<h1 id="website">Update the mesa3d.org website</h1>
<p>
As the hosting was moved to freedesktop, git hooks are deployed to update the
website. Manually check that it is updated 5-10 minutes after the final <code>git push</code>
</p>
<h1 id="bugzilla">Update Bugzilla</h1>
<p>
Parse through the bugreports as listed in the docs/relnotes/X.Y.Z.html
document.
<br>
If there's outstanding action, close the bug referencing the commit ID which
addresses the bug and mention the Mesa version that has the fix.
</p>
<p>
Note: the above is not applicable to all the reports, so use common sense.
</p>
</div>
</body>
</html>

View File

@@ -21,6 +21,20 @@ The release notes summarize what's new or changed in each Mesa release.
</p>
<ul>
<li><a href="relnotes/17.0.3.html">17.0.3 release notes</a>
<li><a href="relnotes/17.0.2.html">17.0.2 release notes</a>
<li><a href="relnotes/13.0.6.html">13.0.6 release notes</a>
<li><a href="relnotes/17.0.1.html">17.0.1 release notes</a>
<li><a href="relnotes/13.0.5.html">13.0.5 release notes</a>
<li><a href="relnotes/17.0.0.html">17.0.0 release notes</a>
<li><a href="relnotes/13.0.4.html">13.0.4 release notes</a>
<li><a href="relnotes/12.0.6.html">12.0.6 release notes</a>
<li><a href="relnotes/13.0.3.html">13.0.3 release notes</a>
<li><a href="relnotes/12.0.5.html">12.0.5 release notes</a>
<li><a href="relnotes/13.0.2.html">13.0.2 release notes</a>
<li><a href="relnotes/13.0.1.html">13.0.1 release notes</a>
<li><a href="relnotes/12.0.4.html">12.0.4 release notes</a>
<li><a href="relnotes/13.0.0.html">13.0.0 release notes</a>
<li><a href="relnotes/12.0.3.html">12.0.3 release notes</a>
<li><a href="relnotes/12.0.2.html">12.0.2 release notes</a>
<li><a href="relnotes/12.0.1.html">12.0.1 release notes</a>

321
docs/relnotes/12.0.4.html Normal file
View File

@@ -0,0 +1,321 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 12.0.4 Release Notes / November 10, 2016</h1>
<p>
Mesa 12.0.4 is a bug fix release which fixes bugs found since the 12.0.4 release.
</p>
<p>
Mesa 12.0.4 implements the OpenGL 4.3 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.3. OpenGL
4.3 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
22026ce4f1c6a7908b0d10ff057decec0a5633afe7f38a0cef5c08d0689f02a6 mesa-12.0.4.tar.gz
5d6003da867d3f54e5000b4acdfc37e6cce5b6a4459274fdad73e24bd2f0065e mesa-12.0.4.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=71759">Bug 71759</a> - Intel driver fails with &quot;intel_do_flush_locked failed: No such file or directory&quot; if buffer imported with EGL_NATIVE_PIXMAP_KHR</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94354">Bug 94354</a> - R9285 Unigine Valley perf regression since radeonsi: use re-Z</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96770">Bug 96770</a> - include/GL/mesa_glinterop.h:62: error: redefinition of typedef GLXContext</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97231">Bug 97231</a> - GL_DEPTH_CLAMP doesn't clamp to the far plane</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97233">Bug 97233</a> - vkQuake VkSpecializationMapEntry related bug</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97260">Bug 97260</a> - R9 290 low performance in Linux 4.7</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97549">Bug 97549</a> - [SNB, BXT] up to 40% perf drop from &quot;loader/dri3: Overhaul dri3_update_num_back&quot; commit</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97887">Bug 97887</a> - llvm segfault in janusvr -render vive</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98025">Bug 98025</a> - [radeonsi] incorrect primitive restart index used</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98134">Bug 98134</a> - dEQP-GLES31.functional.debug.negative_coverage.get_error.buffer.draw_buffers wants a different GL error code</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98326">Bug 98326</a> - [dEQP, EGL] pbuffer depth/stencil tests fail</li>
</ul>
<h2>Changes</h2>
<p>Axel Davy (4):</p>
<ul>
<li>gallium/util: Really allow aliasing of dst for u_box_union_*</li>
<li>st/nine: Fix the calculation of the number of vs inputs</li>
<li>st/nine: Fix mistake in Volume9 UnlockBox</li>
<li>st/nine: Fix locking CubeTexture surfaces.</li>
</ul>
<p>Brendan King (1):</p>
<ul>
<li>configure.ac: fix the name of the Wayland Scanner pc file</li>
</ul>
<p>Brian Paul (1):</p>
<ul>
<li>st/mesa: fix swizzle issue in st_create_sampler_view_from_stobj()</li>
</ul>
<p>Chad Versace (3):</p>
<ul>
<li>egl: Fix truncation error in _eglParseSyncAttribList64</li>
<li>i965/sync: Fix uninitalized usage and leak of mutex</li>
<li>egl: Don't advertise unsupported platform extensions</li>
</ul>
<p>Chuanbo Weng (1):</p>
<ul>
<li>gbm: fix potential NULL deref of mapImage/unmapImage.</li>
</ul>
<p>Chuck Atkins (1):</p>
<ul>
<li>autoconf: Make header install distinct for various APIs (v2)</li>
</ul>
<p>Dave Airlie (3):</p>
<ul>
<li>anv: initialise and increment send_sbc</li>
<li>anv/wsi: fix apps that acquire multiple images up front</li>
<li>Revert "st/vdpau: use linear layout for output surfaces"</li>
</ul>
<p>Emil Velikov (12):</p>
<ul>
<li>docs: add sha256 checksums for 12.0.3</li>
<li>cherry-ignore: add non-applicable i965 commit</li>
<li>cherry-ignore: add vaapi encode fix</li>
<li>cherry-ignore: add EGL_KHR_debug fix</li>
<li>cherry-ignore: add update_renderbuffer_read_surfaces()</li>
<li>isl/gen6: correctly check msaa layout samples count</li>
<li>egl/x11: don't crash if dri2_dpy-&gt;conn is NULL</li>
<li>get-pick-list.sh: Require explicit "12.0" for nominating stable patches</li>
<li>automake: don't forget to pick wglext.h in the tarball</li>
<li>cherry-ignore: add N/A EGL revert</li>
<li>cherry-ignore: add ClientWaitSync fixes</li>
<li>Update version to 12.0.4</li>
</ul>
<p>Eric Anholt (5):</p>
<ul>
<li>travis: Parse configure.ac to pick an updated LIBDRM_VERSION.</li>
<li>travis: Update to the Ubuntu Trusty image.</li>
<li>travis: Enable vc4 in libdrm to satisfy vc4 test build dependency.</li>
<li>travis: Upgrade LLVM dependency to 3.5 and enable LLVM drivers.</li>
<li>gallium: Fix install-gallium-links.mk on non-bash /bin/sh</li>
</ul>
<p>Hans de Goede (1):</p>
<ul>
<li>pipe_loader_sw: Fix fd leak when instantiated via pipe_loader_sw_probe_kms</li>
</ul>
<p>Ian Romanick (1):</p>
<ul>
<li>glsl: Fix cut-and-paste bug in hierarchical visitor ir_expression::accept</li>
</ul>
<p>Ilia Mirkin (16):</p>
<ul>
<li>nv30: set usage to staging so that the buffer is allocated in GART</li>
<li>a3xx: make sure to actually clamp depth as requested</li>
<li>a3xx: make use of software clipping when hw can't handle it</li>
<li>a3xx: use window scissor to simulate viewport xy clip</li>
<li>main: GL_RGB10_A2UI does not come with GL 3.0/EXT_texture_integer</li>
<li>mesa/formatquery: limit ES target support, fix core context support</li>
<li>nir: fix definition of pack_uvec2_to_uint</li>
<li>gm107/ir: AL2P writes to a predicate register</li>
<li>st/mesa: fix is_scissor_enabled when X/Y are negative</li>
<li>nvc0/ir: fix overwriting of value backing non-constant gather offset</li>
<li>nv50/ir: copy over value's register id when resolving merge of a phi</li>
<li>nvc0/ir: fix textureGather with a single offset</li>
<li>gm107/ir: fix texturing with indirect samplers</li>
<li>gm107/ir: fix bit offset of tex lod setting for indirect texturing</li>
<li>nv50,nvc0: avoid reading out of bounds when getting bogus so info</li>
<li>nv50/ir: process texture offset sources as regular sources</li>
</ul>
<p>James Legg (1):</p>
<ul>
<li>radeonsi: Fix primitive restart when index changes</li>
</ul>
<p>Jason Ekstrand (9):</p>
<ul>
<li>nir/spirv: Swap the argument order for AtomicCompareExchange</li>
<li>nir/spirv: Use the correct sources for CompareExchange on images</li>
<li>nir/spirv: Break variable decoration handling into a helper</li>
<li>nir/spirv: Refactor variable deocration handling</li>
<li>nir/spirv/cfg: Handle switches whose break block is a loop continue</li>
<li>nir/spirv/cfg: Detect switch_break after loop_break/continue</li>
<li>nir: Add a nop intrinsic</li>
<li>nir/spirv/cfg: Use a nop intrinsic for tagging the ends of blocks</li>
<li>intel/blorp: Rework our usage of ralloc when compiling shaders</li>
</ul>
<p>Jonathan Gray (3):</p>
<ul>
<li>genxml: add generated headers to EXTRA_DIST</li>
<li>mapi: automake: set VISIBILITY_CFLAGS for shared glapi</li>
<li>mesa: automake: include mesa_glinterop.h in distfile</li>
</ul>
<p>Julien Isorce (1):</p>
<ul>
<li>st/va: also honors interlaced preference when providing a video format</li>
</ul>
<p>Kenneth Graunke (8):</p>
<ul>
<li>nir: Call nir_metadata_preserve from nir_lower_alu_to_scalar().</li>
<li>mesa: Expose RESET_NOTIFICATION_STRATEGY with KHR_robustness.</li>
<li>i965: Fix missing _NEW_TRANSFORM in Gen8+ 3DSTATE_DS atom.</li>
<li>i965: Add missing BRW_NEW_VS_PROG_DATA to 3DSTATE_CLIP.</li>
<li>i965: Move BRW_NEW_FRAGMENT_PROGRAM from 3DSTATE_PS to PS_EXTRA.</li>
<li>i965: Add missing BRW_NEW_CS_PROG_DATA to compute constant atom.</li>
<li>i965: Add missing BRW_CS_PROG_DATA to CS work group surface atom.</li>
<li>i965: Fix gl_InvocationID in dual object GS where invocations == 1.</li>
</ul>
<p>Marek Olšák (12):</p>
<ul>
<li>radeonsi: fix cubemaps viewed as 2D</li>
<li>radeonsi: take compute shader and dispatch indirect memory usage into account</li>
<li>radeonsi: fix FP64 UBO loads with indirect uniform block indexing</li>
<li>mesa: fix glGetFramebufferAttachmentParameteriv w/ on-demand FRONT_BACK alloc</li>
<li>radeonsi: fix interpolateAt opcodes for .zw components</li>
<li>radeonsi: fix texture border colors for compute shaders</li>
<li>radeonsi: disable ReZ</li>
<li>gallium/radeon: make sure the address of separate CMASK is aligned properly</li>
<li>winsys/amdgpu: fix radeon_surf::macro_tile_index for imported textures</li>
<li>egl: use util/macros.h</li>
<li>egl: make interop ABI visible again</li>
<li>glx: make interop ABI visible again</li>
</ul>
<p>Mario Kleiner (1):</p>
<ul>
<li>glx: Perform check for valid fbconfig against proper X-Screen.</li>
</ul>
<p>Martin Peres (2):</p>
<ul>
<li>loader/dri3: add get_dri_screen() to the vtable</li>
<li>loader/dri3: import prime buffers in the currently-bound screen</li>
</ul>
<p>Matt Whitlock (5):</p>
<ul>
<li>egl/android: replace call to dup(2) with fcntl(F_DUPFD_CLOEXEC)</li>
<li>gallium/auxiliary: replace call to dup(2) with fcntl(F_DUPFD_CLOEXEC)</li>
<li>st/dri: replace calls to dup(2) with fcntl(F_DUPFD_CLOEXEC)</li>
<li>st/xa: replace call to dup(2) with fcntl(F_DUPFD_CLOEXEC)</li>
<li>gallium/winsys: replace calls to dup(2) with fcntl(F_DUPFD_CLOEXEC)</li>
</ul>
<p>Max Staudt (1):</p>
<ul>
<li>r300g: Set R300_VAP_CNTL on RSxxx to avoid triangle flickering</li>
</ul>
<p>Michel Dänzer (1):</p>
<ul>
<li>loader/dri3: Overhaul dri3_update_num_back</li>
</ul>
<p>Nicholas Bishop (2):</p>
<ul>
<li>gbm: return appropriate error when queryImage() fails</li>
<li>st/dri: check pipe_screen-&gt;resource_get_handle() return value</li>
</ul>
<p>Nicolai Hähnle (10):</p>
<ul>
<li>gallium/radeon: cleanup and fix branch emits</li>
<li>st/glsl_to_tgsi: disable on-the-fly peephole for 64-bit operations</li>
<li>st/glsl_to_tgsi: simplify translate_tex_offset</li>
<li>st/glsl_to_tgsi: fix textureGatherOffset with indirectly loaded offsets</li>
<li>st/mesa: fix vertex elements setup for doubles</li>
<li>radeonsi: fix indirect loads of 64 bit constants</li>
<li>st/glsl_to_tgsi: fix atomic counter addressing</li>
<li>st/glsl_to_tgsi: fix block copies of arrays of doubles</li>
<li>st/mesa: only set primitive_restart when the restart index is in range</li>
<li>radeonsi: fix 64-bit loads from LDS</li>
</ul>
<p>Samuel Pitoiset (4):</p>
<ul>
<li>nvc0/ir: fix subops for IMAD</li>
<li>gk110/ir: fix wrong emission of OP_NOT</li>
<li>nvc0: use correct bufctx when invalidating CP textures</li>
<li>nvc0/ir: fix emission of IMAD with NEG modifiers</li>
</ul>
<p>Stencel, Joanna (1):</p>
<ul>
<li>egl/wayland: add missing destroy_window callback</li>
</ul>
<p>Tapani Pälli (5):</p>
<ul>
<li>egl: stop claiming support for pbuffer + msaa</li>
<li>egl/dri2: set max values for pbuffer width and height</li>
<li>egl: add check that eglCreateContext gets a valid config</li>
<li>mesa: fix error handling in DrawBuffers</li>
<li>egl: set preserved behavior for surface only if config supports it</li>
</ul>
<p>Tim Rowley (1):</p>
<ul>
<li>configure.ac: add llvm inteljitevents component if enabled</li>
</ul>
<p>Vedran Miletić (1):</p>
<ul>
<li>clover: Fix build against clang SVN &gt;= r273191</li>
</ul>
<p>Vinson Lee (1):</p>
<ul>
<li>Revert "mesa_glinterop: remove inclusion of GLX header"</li>
</ul>
</div>
</body>
</html>

138
docs/relnotes/12.0.5.html Normal file
View File

@@ -0,0 +1,138 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 12.0.5 Release Notes / December 5, 2016</h1>
<p>
Mesa 12.0.5 is a bug fix release which fixes bugs found since the 12.0.5 release.
</p>
<p>
Mesa 12.0.5 implements the OpenGL 4.3 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.3. OpenGL
4.3 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
44d08a27d98bfeacd864381189e434d98afbf451689d01f80380dc1d66450e5b mesa-12.0.5.tar.gz
2b0a972d8282860a11291c09c3ef01ac45171405951eb21a83c45ed2b4321924 mesa-12.0.5.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=77662">Bug 77662</a> - Fail to render to different faces of depth-stencil cube map</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97779">Bug 97779</a> - [regression, bisected][BDW, GPU hang] stuck on render ring, always reproducible</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98415">Bug 98415</a> - Vulkan Driver JSON file contains incorrect field</li>
</ul>
<h2>Changes</h2>
<p>Adam Jackson (2):</p>
<ul>
<li>glx/glvnd: Don't modify the dummy slot in the dispatch table</li>
<li>glx/glvnd: Fix dispatch function names and indices</li>
</ul>
<p>Anuj Phogat (1):</p>
<ul>
<li>i965: Fix GPU hang related to multiple render targets and alpha testing</li>
</ul>
<p>Emil Velikov (4):</p>
<ul>
<li>docs: add release notes for 12.0.4</li>
<li>docs: add sha256 checksums for 12.0.4</li>
<li>cherry-ignore: add reverted LLVM_LIBDIR patch</li>
<li>Update version to 12.0.5</li>
</ul>
<p>Haixia Shi (1):</p>
<ul>
<li>mesa: change state query return value for RGB565</li>
</ul>
<p>Jason Ekstrand (3):</p>
<ul>
<li>i965/fs/generator: Don't use the address immediate for MOV_INDIRECT</li>
<li>anv/cmd_buffer: Take a command buffer instead of a batch in two helpers</li>
<li>anv/cmd_buffer: Enable a CS stall workaround for Sky Lake gt4</li>
</ul>
<p>Kenneth Graunke (1):</p>
<ul>
<li>intel: Fix pixel shader scratch space allocation on Gen9+ platforms.</li>
</ul>
<p>Marek Olšák (13):</p>
<ul>
<li>gallium/radeon: fix behavior of GLSL findLSB(0)</li>
<li>gallium/radeon: make sure HTILE address is aligned properly</li>
<li>radeonsi: fix an assertion failure in si_decompress_sampler_color_textures</li>
<li>gallium/radeon: unify viewport emission code</li>
<li>gallium/radeon: set VPORT_ZMIN/MAX registers correctly</li>
<li>radeonsi: fix gl_PatchVerticesIn for tessellation evaluation shader</li>
<li>radeonsi: fix a crash in imageSize for cubemap arrays</li>
<li>radeonsi: emit TA_CS_BC_BASE_ADDR on SI only if the kernel allows it</li>
<li>gallium/radeon: add support for sharing textures with DCC between processes</li>
<li>radeonsi: always set all blend registers</li>
<li>radeonsi: set CB_BLEND1_CONTROL.ENABLE for dual source blending</li>
<li>radeonsi: disable RB+ blend optimizations for dual source blending</li>
<li>radeonsi: silence runtime warnings with LLVM 3.9</li>
</ul>
<p>Matt Turner (1):</p>
<ul>
<li>anv: Replace "abi_versions" with correct "api_version".</li>
</ul>
<p>Nanley Chery (1):</p>
<ul>
<li>mesa/fbobject: Update CubeMapFace when reusing textures</li>
</ul>
<p>Steinar H. Gunderson (1):</p>
<ul>
<li>Fix races during _mesa_HashWalk().</li>
</ul>
<p>Tim Rowley (3):</p>
<ul>
<li>swr: [rasterizer jitter] cleanup supporting different llvm versions</li>
<li>swr: [rasterizer jitter] fix llvm-3.7 compile</li>
<li>swr: [rasterizer] add support for llvm-3.9</li>
</ul>
</div>
</body>
</html>

148
docs/relnotes/12.0.6.html Normal file
View File

@@ -0,0 +1,148 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 12.0.6 Release Notes / January 23, 2017</h1>
<p>
Mesa 12.0.6 is a bug fix release which fixes bugs found since the 12.0.5 release.
</p>
<p>
Mesa 12.0.6 implements the OpenGL 4.3 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.3. OpenGL
4.3 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
65339ba5d76a45225b8b56f9a1da9db15c569e1d163760faa2921da0a8461741 mesa-12.0.6.tar.gz
7d6da9744c1022a4c2ab6ad01a206984d00443fb691568011d01b3dd97e36448 mesa-12.0.6.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92234">Bug 92234</a> - [BDW] GPU hang in Shogun2</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95130">Bug 95130</a> - Derivatives of gl_Color wrong when helper pixels used</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98329">Bug 98329</a> - [dEQP, EGL, SKL, BDW, BSW] dEQP-EGL.functional.image.render_multiple_contexts.gles2_renderbuffer_depth16_depth_buffer</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99030">Bug 99030</a> - [HSW, regression] transform feedback fails on Linux 4.8</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99354">Bug 99354</a> - [G71] &quot;Assertion `bkref' failed&quot; reproducible with glmark2</li>
</ul>
<h2>Changes</h2>
<p>Chad Versace (3):</p>
<ul>
<li>i965/mt: Disable aux surfaces after making miptree shareable</li>
<li>i965/mt: Disable HiZ when sharing depth buffer externally (v2)</li>
<li>anv: Handle vkGetPhysicalDeviceQueueFamilyProperties with count == 0</li>
</ul>
<p>Emil Velikov (5):</p>
<ul>
<li>docs: add sha256 checksums for 12.0.5</li>
<li>get-typod-pick-list.sh: add new script</li>
<li>automake: use shared llvm libs for make distcheck</li>
<li>egl/wayland: use the destroy_window_callback for swrast</li>
<li>Update version to 12.0.6</li>
</ul>
<p>Fredrik Höglund (1):</p>
<ul>
<li>dri3: Fix MakeCurrent without a default framebuffer</li>
</ul>
<p>Ilia Mirkin (1):</p>
<ul>
<li>nouveau: take extra push space into account for pushbuf_space calls</li>
</ul>
<p>Jason Ekstrand (19):</p>
<ul>
<li>spirv/nir: Fix some texture opcode asserts</li>
<li>spirv/nir: Add support for shadow samplers that return vec4</li>
<li>spirv/nir: Properly handle gather components</li>
<li>anv/pipeline: Set binding_table.gather_texture_start</li>
<li>nir: Add a helper for determining the type of a texture source</li>
<li>nir/lower_tex: Add some helpers for working with tex sources</li>
<li>nir/lower_tex: Add support for lowering coordinate offsets</li>
<li>i965/nir: Enable NIR lowering of txf and rect offsets</li>
<li>i965: Get rid of the do_lower_unnormalized_offsets pass</li>
<li>spirv/nir: Don't increment coord_components for array lod queries</li>
<li>anv/image: Assert that the image format is actually supported</li>
<li>spirv/nir: Move opcode selection higher up in handle_texture</li>
<li>spirv/nir: Refactor type handling in handle_texture</li>
<li>nir/spirv: Refactor coordinate handling in handle_texture</li>
<li>spirv/nir: Handle texture projectors</li>
<li>spirv/nir: Add support for ImageQuerySamples</li>
<li>anv/device: Return the right error for failed maps</li>
<li>anv/device: Implicitly unmap memory objects in FreeMemory</li>
<li>anv/descriptor_set: Write the state offset in the surface state free list.</li>
</ul>
<p>Kenneth Graunke (2):</p>
<ul>
<li>spirv: Move cursor before calling vtn_ssa_value() in phi 2nd pass.</li>
<li>i965: Properly flush in hsw_pause_transform_feedback().</li>
</ul>
<p>Marek Olšák (6):</p>
<ul>
<li>cso: don't release sampler states that are bound</li>
<li>radeonsi: always restore sampler states when unbinding sampler views</li>
<li>radeonsi: fix incorrect FMASK checking in bind_sampler_states</li>
<li>radeonsi: disable CE on SI + AMDGPU</li>
<li>radeonsi: disable the constant engine (CE) on Carrizo and Stoney</li>
<li>gallium/radeon: fix the draw-calls HUD query</li>
</ul>
<p>Matt Turner (3):</p>
<ul>
<li>i965/fs: Rename opt_copy_propagate -&gt; opt_copy_propagation.</li>
<li>i965/fs: Add unit tests for copy propagation pass.</li>
<li>i965/fs: Reject copy propagation into SEL if not min/max.</li>
</ul>
<p>Michel Dänzer (1):</p>
<ul>
<li>cso: Don't restore nr_samplers in cso_restore_fragment_samplers</li>
</ul>
<p>Nicolai Hähnle (1):</p>
<ul>
<li>radeonsi: enable WQM in PS prolog when needed</li>
</ul>
</div>
</body>
</html>

View File

@@ -14,7 +14,7 @@
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 13.0.0 Release Notes / TBD</h1>
<h1>Mesa 13.0.0 Release Notes / November 1, 2016</h1>
<p>
Mesa 13.0.0 is a new development release.
@@ -33,7 +33,8 @@ because compatibility contexts are not supported.
<h2>SHA256 checksums</h2>
<pre>
TBD.
4a54d7cdc1a94a8dae05a75ccff48356406d51b0d6a64cbdc641c266e3e008eb mesa-13.0.0.tar.gz
94edb4ebff82066a68be79d9c2627f15995e1fe10f67ab3fc63deb842027d727 mesa-13.0.0.tar.xz
</pre>
@@ -74,11 +75,236 @@ Note: some of the new features are only available with certain drivers.
<h2>Bug fixes</h2>
TBD.
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=61907">Bug 61907</a> - Indirect rendering of multi-texture vertex arrays broken</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=69622">Bug 69622</a> - eglTerminate then eglMakeCurrent crahes</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=71759">Bug 71759</a> - Intel driver fails with &quot;intel_do_flush_locked failed: No such file or directory&quot; if buffer imported with EGL_NATIVE_PIXMAP_KHR</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=83036">Bug 83036</a> - [ILK]Piglit spec_ARB_copy_image_arb_copy_image-formats fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=89599">Bug 89599</a> - symbol 'x86_64_entry_start' is already defined when building with LLVM/clang</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90513">Bug 90513</a> - Odd gray and red flicker in The Talos Principle on GK104</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91342">Bug 91342</a> - Very dark textures on some objects in indoors environments in Postal 2</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92306">Bug 92306</a> - GL Excess demo renders incorrectly on nv43</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94148">Bug 94148</a> - Framebuffer considered invalid when a draw call is done before glCheckFramebufferStatus</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94354">Bug 94354</a> - R9285 Unigine Valley perf regression since radeonsi: use re-Z</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94561">Bug 94561</a> - [llvmpipe] PIPE_CAP_VIDEO_MEMORY reports negative value on 32 bits (with 16GB ram)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94627">Bug 94627</a> - Game Risen on wine black grass</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94681">Bug 94681</a> - dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.23 takes 25 minutes to compile</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95000">Bug 95000</a> - deqp: assert in dEQP-GLES3.functional.vertex_arrays.single_attribute.strides.fixed.user_ptr_stride17_components2_quads1</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95130">Bug 95130</a> - Derivatives of gl_Color wrong when helper pixels used</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95246">Bug 95246</a> - Segfault in glBindFramebuffer()</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95419">Bug 95419</a> - [HSW][regression][bisect] RPG Maker game gives &quot;invalid floating point operation&quot; at startup</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95462">Bug 95462</a> - [BXT,BSW] arb_gpu_shader_fp64 causes gpu hang</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95529">Bug 95529</a> - [regression, bisected] Image corruption in Chrome</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96235">Bug 96235</a> - st_nir.h:34: error: redefinition of typedef nir_shader</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96274">Bug 96274</a> - [NVC0] Failure when compiling compute shader: Assertion `bb-&gt;getFirst()-&gt;serial &lt;= bb-&gt;getExit()-&gt;serial' failed</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96285">Bug 96285</a> - Mesa build broken</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96299">Bug 96299</a> - [vulkan] 64 regressions due to mesa d5f2f32</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96343">Bug 96343</a> - oom since st/mesa: implement PBO downloads for ReadPixels</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96346">Bug 96346</a> - [SNB,CTS] es2-cts.gtf.gl.atan regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96349">Bug 96349</a> - [CTS,SKL,BSW,BDW,KBL,BXT] es31-cts.arrays_of_arrays.interactionuniformbuffers3</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96351">Bug 96351</a> - [CTS,SKL,KBL,BXT] es2-cts.gtf.gl2extensiontests.egl_image.egl_image</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96358">Bug 96358</a> - SSO: wrong interface validation between GS and VS (regresion due to latest gles 3.1)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96425">Bug 96425</a> - [bisected] occasional dark render in The Talos Principle</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96484">Bug 96484</a> - [vulkan] deqp-vk.glsl.builtin.precision.sin / cos regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96504">Bug 96504</a> - [vulkancts] compute tests crash</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96516">Bug 96516</a> - [bisected: 482526] &quot;clover: Update OpenCL version string to match OpenGL&quot;: clover's build fails because of missing git_sha1.h</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96528">Bug 96528</a> - Location qualifier segfaults during shader compilation</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96541">Bug 96541</a> - Tonga Unreal elemental bad rendering since radeonsi: Decompress DCC textures in a render feedback loop</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96565">Bug 96565</a> - Clive Barker's Jericho displays strange,vivid colors when motion blur enabled</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96607">Bug 96607</a> - [bisected] texture misrender / flicker in The Talos Principle on SKL</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96617">Bug 96617</a> - gl_SecondaryFragDataEXT doesn't work for extended blend func</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96629">Bug 96629</a> - dEQP-GLES2.functional.texture.completeness.cube.not_positive_level_0: Assertion `width &gt;= 1' failed.</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96639">Bug 96639</a> - st/mesa: transfer_map with too-high level with dEQP-GLES2.functional.texture.completeness.cube.extra_level</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96674">Bug 96674</a> - [SNB, ILK] spec.ext_image_dma_buf_import.ext_image_dma_buf_import-sample_nv1</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96729">Bug 96729</a> - Wrong shader compilation error message</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96762">Bug 96762</a> - [radeonsi,apitrace] Firewatch: nothing rendered in scrollable (text) areas</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96765">Bug 96765</a> - BindFragDataLocationIndexed on array fragment shader output.</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96770">Bug 96770</a> - include/GL/mesa_glinterop.h:62: error: redefinition of typedef GLXContext</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96782">Bug 96782</a> - [regression bisected] R600 fp64 and glsl-4.00 piglit failures</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96791">Bug 96791</a> - Cannot use image from swapchains for sampling</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96825">Bug 96825</a> - anv_device.c:31:27: fatal error: anv_timestamp.h: No such file or directory</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96835">Bug 96835</a> - &quot;gallium: Force blend color to 16-byte alignment&quot; crash with &quot;-march=native -O3&quot; causes some 32bit games to crash</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96850">Bug 96850</a> - Crucible tests fail for 32bit mesa</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96878">Bug 96878</a> - [Bisected: cc2d0e6][HSW] &quot;GPU HANG&quot; msg after autologin to gnome-session</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96908">Bug 96908</a> - [radeonsi] MSAA causes graphical artifacts</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96911">Bug 96911</a> - webgl2 conformance2/textures/misc/tex-mipmap-levels.html crashes 12.1 Intel driver</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96949">Bug 96949</a> - [regression] Piglit numSamples assertion failures with 9a23a177b90</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96950">Bug 96950</a> - Another regression from bc4e0c486: vbo: Use a bitmask to track the active arrays in vbo_exec*.</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96971">Bug 96971</a> - invariant qualifier is not valid for shader inputs</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97019">Bug 97019</a> - [clover] build failure in llvm/codegen/native.cpp:129:52</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97032">Bug 97032</a> - [BDW,SKL] piglit.spec.arb_gpu_shader5.arb_gpu_shader5-interpolateatcentroid-flat</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97033">Bug 97033</a> - [BDW,SKL] piglit.spec.arb_gpu_shader_fp64.varying-packing.simple regressions</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97039">Bug 97039</a> - The Talos Principle and Serious Sam 3 GPU faults</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97083">Bug 97083</a> - [IVB,BYT] GPU hang on deqp-gles31.functional.separate.shader.random</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97140">Bug 97140</a> - dd_draw.c:949:11: error: implicit declaration of function 'fmemopen' is invalid in C99 [-Werror,-Wimplicit-function-declaration]</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97207">Bug 97207</a> - [IVY BRIDGE] Fragment shader discard writing to depth</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97214">Bug 97214</a> - X not running with error &quot;Failed to make EGL context current&quot;</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97225">Bug 97225</a> - [i965 on HD4600 Haswell] xcom switch to ingame cinematics cause segmentation fault</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97231">Bug 97231</a> - GL_DEPTH_CLAMP doesn't clamp to the far plane</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97233">Bug 97233</a> - vkQuake VkSpecializationMapEntry related bug</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97260">Bug 97260</a> - R9 290 low performance in Linux 4.7</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97267">Bug 97267</a> - [BDW] GL45-CTS.texture_cube_map_array.sampling asserts inside brw_fs.cpp</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97278">Bug 97278</a> - [vulkancts,HSW] all vulkancts tests assert on HSW</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97285">Bug 97285</a> - Darkness in Dota 2 after Patch &quot;Make Gallium's BlitFramebuffer follow the GL 4.4 sRGB rules&quot;</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97286">Bug 97286</a> - `make check` fails uniform-initializer-test</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97305">Bug 97305</a> - Gallium: TBOs and images set the offset in elements, not bytes</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97307">Bug 97307</a> - glsl/glcpp/tests/glcpp-test regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97309">Bug 97309</a> - piglit.spec.glsl-1_30.compiler.switch-statement.switch-case-duplicated.vert regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97322">Bug 97322</a> - GenerateMipmap creates wrong mipmap for sRGB texture</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97331">Bug 97331</a> - glDrawElementsBaseVertex doesn't work in display list on i915</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97351">Bug 97351</a> - DrawElementsBaseVertex with VBO ignores base vertex on Intel GMA 9xx in some cases</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97413">Bug 97413</a> - BioShock Infinite crashes on startup with Mesa Git version, R7 370</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97426">Bug 97426</a> - glScissor gives vertically inverted result</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97448">Bug 97448</a> - [HSW] deqp-vk.api_.copy_and_blit.image_to_image_stencil regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97476">Bug 97476</a> - Shader binaries should not be stored in the PipelineCache</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97477">Bug 97477</a> - i915g: gl_FragCoord is always (0.0, max_y)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97513">Bug 97513</a> - clover reports wrong device pointer size</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97549">Bug 97549</a> - [SNB, BXT] up to 40% perf drop from &quot;loader/dri3: Overhaul dri3_update_num_back&quot; commit</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97587">Bug 97587</a> - make check nir/tests/control_flow_tests regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97761">Bug 97761</a> - es2-cts.gtf.gl2extensiontests.egl_image_external.testsimpleunassociated crashes</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97773">Bug 97773</a> - New Mesa master now results in warnings in glrender (and subsurfaces and simple-egl), black screen</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97779">Bug 97779</a> - [regression, bisected][BDW, GPU hang] stuck on render ring, always reproducible</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97790">Bug 97790</a> - Vulkan cts regressions due to 24be63066</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97804">Bug 97804</a> - Later precision statement isn't overriding earlier one</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97808">Bug 97808</a> - &quot;tgsi/scan: don't set interp flags for inputs only used by INTERP instructions&quot; causes glitches in wine with gallium nine</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97887">Bug 97887</a> - llvm segfault in janusvr -render vive</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97894">Bug 97894</a> - Crash in u_transfer_unmap_vtbl when unmapping a buffer mapped in different context</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97952">Bug 97952</a> - /usr/include/string.h:518:12: error: exception specification in declaration does not match previous declaration</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97969">Bug 97969</a> - [radeonsi, bisected: fb827c0] Video decoding shows green artifacts</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97976">Bug 97976</a> - VCE regression BO to small for addr since winsys/amdgpu: enable buffer allocation from slabs</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98005">Bug 98005</a> - VCE dual instance encoding inconsistent since st/va: enable dual instances encode by sync surface</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98025">Bug 98025</a> - [radeonsi] incorrect primitive restart index used</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98128">Bug 98128</a> - nir/tests/control_flow_tests.cpp:79:73: error: nir_loop_first_cf_node was not declared in this scope</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98131">Bug 98131</a> - Compiler should reject lowp/mediump qualifiers on atomic_uints</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98133">Bug 98133</a> - GetSynciv should raise an error if bufSize &lt; 0</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98134">Bug 98134</a> - dEQP-GLES31.functional.debug.negative_coverage.get_error.buffer.draw_buffers wants a different GL error code</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98135">Bug 98135</a> - dEQP-GLES31.functional.debug.negative_coverage.get_error.shader.transform_feedback_varyings wants a different GL error code</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98167">Bug 98167</a> - [vulkan, radv] missing libgcrypt and openssl devel results in linker error in libvulkan_common</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98172">Bug 98172</a> - Concurrent call to glClientWaitSync results in segfault in one of the waiters.</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98244">Bug 98244</a> - dEQP: textureOffset(sampler2DArrayShadow, ...) should not exist.</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98264">Bug 98264</a> - Build broken for i965 due to multiple deifnitions of intelFenceExtension</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98307">Bug 98307</a> - &quot;st/glsl_to_tgsi: explicitly track all input and output declaration&quot; broke flightgear colors on rs780</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98326">Bug 98326</a> - [dEQP, EGL] pbuffer depth/stencil tests fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98415">Bug 98415</a> - Vulkan Driver JSON file contains incorrect field</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98431">Bug 98431</a> - UnrealEngine v4 demos startup fails to blorp blit assert</li>
</ul>
<h2>Changes</h2>
TBD.
Mesa no longer depends on libudev.
</div>
</body>

188
docs/relnotes/13.0.1.html Normal file
View File

@@ -0,0 +1,188 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 13.0.1 Release Notes / November 14, 2016</h1>
<p>
Mesa 13.0.1 is a bug fix release which fixes bugs found since the 13.0.0 release.
</p>
<p>
Mesa 13.0.1 implements the OpenGL 4.4 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.4. OpenGL
4.4 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
7cbb91dead05cde279ee95f86e8321c8e1c8fc9deb88f12e0f587672a10d88c5 mesa-13.0.1.tar.gz
71962fb2bf77d33b0ad4a565b490dbbeaf4619099c6d9722f04a73187957a731 mesa-13.0.1.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97715">Bug 97715</a> - [ILK,G45,G965] piglit.spec.arb_separate_shader_objects.misc api error checks</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98012">Bug 98012</a> - [IVB] Segfault when running Dolphin twice with Vulkan</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98512">Bug 98512</a> - radeon r600 vdpau: Invalid command stream: texture bo too small</li>
</ul>
<h2>Changes</h2>
<p>Adam Jackson (2):</p>
<ul>
<li>glx/glvnd: Don't modify the dummy slot in the dispatch table</li>
<li>glx/glvnd: Fix dispatch function names and indices</li>
</ul>
<p>Andreas Boll (1):</p>
<ul>
<li>glx/windows: Add wgl.h to the sources list</li>
</ul>
<p>Anuj Phogat (1):</p>
<ul>
<li>i965: Fix GPU hang related to multiple render targets and alpha testing</li>
</ul>
<p>Chih-Wei Huang (1):</p>
<ul>
<li>android: avoid using libdrm with host modules</li>
</ul>
<p>Darren Salt (1):</p>
<ul>
<li>radv/pipeline: Don't dereference NULL dynamic state pointers</li>
</ul>
<p>Dave Airlie (8):</p>
<ul>
<li>radv: expose xlib platform extension</li>
<li>radv: fix dual source blending</li>
<li>Revert "st/vdpau: use linear layout for output surfaces"</li>
<li>radv: emit correct last export when Z/stencil export is enabled</li>
<li>ac/nir: add support for discard_if intrinsic (v2)</li>
<li>nir: add conditional discard optimisation (v4)</li>
<li>radv: enable conditional discard optimisation on radv.</li>
<li>radv: fix GetFenceStatus for signaled fences</li>
</ul>
<p>Emil Velikov (6):</p>
<ul>
<li>docs: add sha256 checksums for 13.0.0</li>
<li>amd/addrlib: limit fastcall/regparm to GCC i386</li>
<li>anv: use correct .specVersion for extensions</li>
<li>radv: use correct .specVersion for extensions</li>
<li>radv: Suffix the radeon_icd file with the host CPU</li>
<li>Update version to 13.0.1</li>
</ul>
<p>Eric Anholt (1):</p>
<ul>
<li>vc4: Use Newton-Raphson on the 1/W write to fix glmark2 terrain.</li>
</ul>
<p>Francisco Jerez (1):</p>
<ul>
<li>nir: Flip gl_SamplePosition in nir_lower_wpos_ytransform().</li>
</ul>
<p>Fredrik Höglund (1):</p>
<ul>
<li>radv: add support for anisotropic filtering on VI+</li>
</ul>
<p>Jason Ekstrand (21):</p>
<ul>
<li>anv/device: Return DEVICE_LOST if execbuf2 fails</li>
<li>vulkan/wsi/x11: Better handle wsi_x11_connection_create failure</li>
<li>vulkan/wsi/x11: Clean up connections in finish_wsi</li>
<li>anv: Better handle return codes from anv_physical_device_init</li>
<li>intel/blorp: Use wm_prog_data instead of hand-rolling our own</li>
<li>intel/blorp: Pass a brw_stage_prog_data to upload_shader</li>
<li>anv/pipeline: Put actual pointers in anv_shader_bin</li>
<li>anv/pipeline: Properly cache prog_data::param</li>
<li>intel/blorp: Emit all the binding tables</li>
<li>anv/device: Add an execbuf wrapper</li>
<li>anv: Add a cmd_buffer_execbuf helper</li>
<li>anv: Don't presume to know what address is in a surface relocation</li>
<li>anv: Add a new bo_pool_init helper</li>
<li>anv/allocator: Simplify anv_scratch_pool</li>
<li>anv: Initialize anv_bo::offset to -1</li>
<li>anv/batch_chain: Improve write_reloc</li>
<li>anv: Add an anv_execbuf helper struct</li>
<li>anv/batch: Move last_ss_pool_bo_offset to the command buffer</li>
<li>anv: Move relocation handling from EndCommandBuffer to QueueSubmit</li>
<li>anv/cmd_buffer: Take a command buffer instead of a batch in two helpers</li>
<li>anv/cmd_buffer: Enable a CS stall workaround for Sky Lake gt4</li>
</ul>
<p>Kenneth Graunke (2):</p>
<ul>
<li>glsl: Update deref types when resizing implicitly sized arrays.</li>
<li>mesa: Fix pixel shader scratch space allocation on Gen9+ platforms.</li>
</ul>
<p>Kristian Høgsberg (1):</p>
<ul>
<li>anv: Do relocations in userspace before execbuf ioctl</li>
</ul>
<p>Marek Olšák (4):</p>
<ul>
<li>egl: use util/macros.h</li>
<li>egl: make interop ABI visible again</li>
<li>glx: make interop ABI visible again</li>
<li>radeonsi: fix an assertion failure in si_decompress_sampler_color_textures</li>
</ul>
<p>Nicolai Hähnle (4):</p>
<ul>
<li>radeonsi: fix BFE/BFI lowering for GLSL semantics</li>
<li>glsl: fix lowering of UBO references of named blocks</li>
<li>st/glsl_to_tgsi: fix dvec[34] loads from SSBO</li>
<li>st/mesa: fix the layer of VDPAU surface samplers</li>
</ul>
<p>Steven Toth (3):</p>
<ul>
<li>gallium/hud: fix a problem where objects are free'd while in use.</li>
<li>gallium/hud: close a previously opened handle</li>
<li>gallium/hud: protect against and initialization race</li>
</ul>
<p>Timothy Arceri (1):</p>
<ul>
<li>mesa/glsl: delete previously linked shaders earlier when linking</li>
</ul>
</div>
</body>
</html>

189
docs/relnotes/13.0.2.html Normal file
View File

@@ -0,0 +1,189 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 13.0.2 Release Notes / November 28, 2016</h1>
<p>
Mesa 13.0.2 is a bug fix release which fixes bugs found since the 13.0.1 release.
</p>
<p>
Mesa 13.0.2 implements the OpenGL 4.4 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.4. OpenGL
4.4 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
6014233a5db6032ab8de4881384871bbe029de684502707794ce7b3e6beec308 mesa-13.0.2.tar.gz
a6ed622645f4ed61da418bf65adde5bcc4bb79023c36ba7d6b45b389da4416d5 mesa-13.0.2.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97321">Bug 97321</a> - Query INFO_LOG_LENGTH for empty info log should return 0</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97420">Bug 97420</a> - &quot;#version 0&quot; crashes glsl_compiler</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98632">Bug 98632</a> - Fix build on Hurd without PATH_MAX</li>
</ul>
<h2>Changes</h2>
<p>Ben Widawsky (3):</p>
<ul>
<li>i965: Add some APL and KBL SKU strings</li>
<li>i965: Reorder PCI ID list to match release order</li>
<li>i965/glk: Add basic Geminilake support</li>
</ul>
<p>Dave Airlie (14):</p>
<ul>
<li>radv: fix texturesamples to handle single sample case</li>
<li>wsi: fix VK_INCOMPLETE for vkGetSwapchainImagesKHR</li>
<li>radv: don't crash on null swapchain destroy.</li>
<li>ac/nir/llvm: fix channel in texture gather lowering code.</li>
<li>radv: make sure to flush input attachments correctly.</li>
<li>radv: fix image view creation for depth and stencil only</li>
<li>radv: spir-v allows texture size query with and without lod.</li>
<li>vulkan/wsi/x11: handle timeouts properly in next image acquire (v1.1)</li>
<li>vulkan/wsi: store present mode in swapchain base class</li>
<li>vulkan/wsi/x11: add support for IMMEDIATE present mode</li>
<li>radv: fix texel fetch offset with 2d arrays.</li>
<li>radv/si: fix optimal micro tile selection</li>
<li>radv/ac/llvm: shadow samplers only return one value.</li>
<li>radv: fix 3D clears with baseMiplevel</li>
</ul>
<p>Eduardo Lima Mitev (2):</p>
<ul>
<li>vulkan/wsi/x11: Fix behavior of vkGetPhysicalDeviceSurfaceFormatsKHR</li>
<li>vulkan/wsi/x11: Fix behavior of vkGetPhysicalDeviceSurfacePresentModesKHR</li>
</ul>
<p>Emil Velikov (5):</p>
<ul>
<li>docs: add sha256 checksums for 13.0.1</li>
<li>cherry-ignore: add reverted LLVM_LIBDIR patch</li>
<li>anv: fix enumeration of properties</li>
<li>radv: honour the number of properties available</li>
<li>Update version to 13.0.2</li>
</ul>
<p>Eric Anholt (3):</p>
<ul>
<li>vc4: Don't abort when a shader compile fails.</li>
<li>vc4: Clamp the shadow comparison value.</li>
<li>vc4: Fix register class handling of DDX/DDY arguments.</li>
</ul>
<p>Gwan-gyeong Mun (2):</p>
<ul>
<li>util/disk_cache: close a previously opened handle in disk_cache_put (v2)</li>
<li>anv: Fix unintentional integer overflow in anv_CreateDmaBufImageINTEL</li>
</ul>
<p>Iago Toral Quiroga (1):</p>
<ul>
<li>anv/format: handle unsupported formats properly</li>
</ul>
<p>Ian Romanick (2):</p>
<ul>
<li>glcpp: Handle '#version 0' and other invalid values</li>
<li>glsl: Parse 0 as a preprocessor INTCONSTANT</li>
</ul>
<p>Jason Ekstrand (15):</p>
<ul>
<li>anv/gen8: Stall when needed in Cmd(Set|Reset)Event</li>
<li>anv/wsi: Set the fence to signaled in AcquireNextImageKHR</li>
<li>anv: Rework fences</li>
<li>vulkan/wsi/wayland: Include pthread.h</li>
<li>vulkan/wsi/wayland: Clean up some error handling paths</li>
<li>vulkan/wsi: Report the correct min/maxImageCount</li>
<li>i965/gs: Allow primitive id to be a system value</li>
<li>anv: Handle null in all destructors</li>
<li>anv/fence: Handle ANV_FENCE_CREATE_SIGNALED_BIT</li>
<li>nir/spirv: Fix handling of gl_PrimitiveId</li>
<li>anv/blorp: Ignore clears for attachments first used as resolve destinations</li>
<li>anv: Implement a depth stall restriction on gen7</li>
<li>anv/cmd_buffer: Handle running out of binding tables in compute shaders</li>
<li>anv/cmd_buffer: Emit a CS stall before setting a CS pipeline</li>
<li>vulkan/wsi/x11: Implement FIFO mode.</li>
</ul>
<p>Jordan Justen (2):</p>
<ul>
<li>isl: Fix height calculation in isl_msaa_interleaved_scale_px_to_sa</li>
<li>i965/hsw: Set integer mode in sampling state for stencil texturing</li>
</ul>
<p>Kenneth Graunke (4):</p>
<ul>
<li>intel: Set min_ds_entries on Broxton.</li>
<li>i965: Fix compute shader crash.</li>
<li>mesa: Drop PATH_MAX usage.</li>
<li>i965: Fix GS push inputs with enhanced layouts.</li>
</ul>
<p>Kevin Strasser (1):</p>
<ul>
<li>vulkan/wsi: Add a thread-safe queue implementation</li>
</ul>
<p>Lionel Landwerlin (1):</p>
<ul>
<li>anv: fix multi level clears with VK_REMAINING_MIP_LEVELS</li>
</ul>
<p>Lucas Stach (1):</p>
<ul>
<li>gbm: request correct version of the DRI2_FENCE extension</li>
</ul>
<p>Nicolai Hähnle (2):</p>
<ul>
<li>radeonsi: store group_size_variable in struct si_compute</li>
<li>glsl/lower_output_reads: fix geometry shader output handling with conditional emit</li>
</ul>
<p>Steinar H. Gunderson (1):</p>
<ul>
<li>Fix races during _mesa_HashWalk().</li>
</ul>
<p>Tapani Pälli (1):</p>
<ul>
<li>mesa: fix empty program log length</li>
</ul>
</div>
</body>
</html>

177
docs/relnotes/13.0.3.html Normal file
View File

@@ -0,0 +1,177 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 13.0.3 Release Notes / January 5, 2017</h1>
<p>
Mesa 13.0.3 is a bug fix release which fixes bugs found since the 13.0.2 release.
</p>
<p>
Mesa 13.0.3 implements the OpenGL 4.4 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.4. OpenGL
4.4 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
55b07d056f9b855ba9d7c8b2ddc7d3b220a61c6ab1bdc73cbfc2f607721094c2 mesa-13.0.3.tar.gz
d9aa8be5c176d00d0cd503cb2f64a5a403ea471ec819c022581414860d7ba40e mesa-13.0.3.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=77662">Bug 77662</a> - Fail to render to different faces of depth-stencil cube map</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92234">Bug 92234</a> - [BDW] GPU hang in Shogun2</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98329">Bug 98329</a> - [dEQP, EGL, SKL, BDW, BSW] dEQP-EGL.functional.image.render_multiple_contexts.gles2_renderbuffer_depth16_depth_buffer</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99038">Bug 99038</a> - [dEQP, EGL, SKL, BDW, BSW] dEQP-EGL.functional.negative_api.create_pixmap_surface crashes</li>
</ul>
<h2>Changes</h2>
<p>Chad Versace (2):</p>
<ul>
<li>i965/mt: Disable aux surfaces after making miptree shareable</li>
<li>egl: Fix crashes in eglCreate*Surface()</li>
</ul>
<p>Dave Airlie (4):</p>
<ul>
<li>anv: set maxFragmentDualSrcAttachments to 1</li>
<li>radv: set maxFragmentDualSrcAttachments to 1</li>
<li>radv: fix another regression since shadow fixes.</li>
<li>radv: add missing license file to radv_meta_bufimage.</li>
</ul>
<p>Emil Velikov (5):</p>
<ul>
<li>docs: add sha256 checksums for 13.0.2</li>
<li>anv: don't double-close the same fd</li>
<li>anv: don't leak memory if anv_init_wsi() fails</li>
<li>radv: don't leak the fd if radv_physical_device_init() succeeds</li>
<li>Update version to 13.0.3</li>
</ul>
<p>Eric Anholt (1):</p>
<ul>
<li>vc4: In a loop break/continue, jump if everyone has taken the path.</li>
</ul>
<p>Gwan-gyeong Mun (3):</p>
<ul>
<li>anv: Add missing error-checking to anv_block_pool_init (v2)</li>
<li>anv: Update the teardown in reverse order of the anv_CreateDevice</li>
<li>vulkan/wsi: Fix resource leak in success path of wsi_queue_init()</li>
</ul>
<p>Haixia Shi (1):</p>
<ul>
<li>compiler/glsl: fix precision problem of tanh</li>
</ul>
<p>Ilia Mirkin (1):</p>
<ul>
<li>mesa: only verify that enabled arrays have backing buffers</li>
</ul>
<p>Jason Ekstrand (8):</p>
<ul>
<li>anv/cmd_buffer: Re-emit MEDIA_CURBE_LOAD when CS push constants are dirty</li>
<li>anv/image: Rename hiz_surface to aux_surface</li>
<li>anv/cmd_buffer: Remove the 1-D case from the HiZ QPitch calculation</li>
<li>genxml/gen9: Change the default of MI_SEMAPHORE_WAIT::RegisterPoleMode</li>
<li>anv/device: Return the right error for failed maps</li>
<li>anv/device: Implicitly unmap memory objects in FreeMemory</li>
<li>anv/descriptor_set: Write the state offset in the surface state free list.</li>
<li>spirv: Use a simpler and more correct implementaiton of tanh()</li>
</ul>
<p>Kenneth Graunke (1):</p>
<ul>
<li>i965: Allocate at least some URB space even when max_vertices = 0.</li>
</ul>
<p>Marek Olšák (17):</p>
<ul>
<li>radeonsi: always set all blend registers</li>
<li>radeonsi: set CB_BLEND1_CONTROL.ENABLE for dual source blending</li>
<li>radeonsi: disable RB+ blend optimizations for dual source blending</li>
<li>radeonsi: consolidate max-work-group-size computation</li>
<li>radeonsi: apply a multi-wave workgroup SPI bug workaround to affected CIK chips</li>
<li>radeonsi: apply a TC L1 write corruption workaround for SI</li>
<li>radeonsi: apply a tessellation bug workaround for SI</li>
<li>radeonsi: add a tess+GS hang workaround for VI dGPUs</li>
<li>radeonsi: apply the double EVENT_WRITE_EOP workaround to VI as well</li>
<li>cso: don't release sampler states that are bound</li>
<li>radeonsi: always restore sampler states when unbinding sampler views</li>
<li>radeonsi: fix incorrect FMASK checking in bind_sampler_states</li>
<li>radeonsi: allow specifying simm16 of emit_waitcnt at call sites</li>
<li>radeonsi: wait for outstanding memory instructions in TCS barriers</li>
<li>tgsi: fix the src type of TGSI_OPCODE_MEMBAR</li>
<li>radeonsi: wait for outstanding LDS instructions in memory barriers if needed</li>
<li>radeonsi: disable the constant engine (CE) on Carrizo and Stoney</li>
</ul>
<p>Matt Turner (3):</p>
<ul>
<li>i965/fs: Rename opt_copy_propagate -&gt; opt_copy_propagation.</li>
<li>i965/fs: Add unit tests for copy propagation pass.</li>
<li>i965/fs: Reject copy propagation into SEL if not min/max.</li>
</ul>
<p>Nanley Chery (1):</p>
<ul>
<li>mesa/fbobject: Update CubeMapFace when reusing textures</li>
</ul>
<p>Nicolai Hähnle (4):</p>
<ul>
<li>radeonsi: fix isolines tess factor writes to control ring</li>
<li>radeonsi: update all GSVS ring descriptors for new buffer allocations</li>
<li>radeonsi: do not kill GS with memory writes</li>
<li>radeonsi: fix an off-by-one error in the bounds check for max_vertices</li>
</ul>
<p>Rhys Kidd (1):</p>
<ul>
<li>glsl: Add pthread libs to cache_test</li>
</ul>
<p>Timothy Arceri (2):</p>
<ul>
<li>mesa: fix active subroutine uniforms properly</li>
<li>Revert "nir: Turn imov/fmov of undef into undef."</li>
</ul>
</div>
</body>
</html>

255
docs/relnotes/13.0.4.html Normal file
View File

@@ -0,0 +1,255 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 13.0.4 Release Notes / February 1, 2017</h1>
<p>
Mesa 13.0.4 is a bug fix release which fixes bugs found since the 13.0.3 release.
</p>
<p>
Mesa 13.0.4 implements the OpenGL 4.4 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.4. OpenGL
4.4 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
a78518030b0b7d77a6c426ac3ff40f4b27fb0e2cdb0dfbe685024a46cae59bad mesa-13.0.4.tar.gz
a95d7ce8f7bd5f88585e4be3144a341236d8c0fc91f6feaec59bb8ba3120e726 mesa-13.0.4.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92634">Bug 92634</a> - gallium's vl_mpeg12_decoder does not work with st/va</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94512">Bug 94512</a> - X segfaults with glx-tls enabled in a x32 environment</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94900">Bug 94900</a> - HD6950 GPU lockup loop with various steam games (octodad[always], saints row 4[always], dead island[always], grid autosport[sometimes])</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98263">Bug 98263</a> - [radv] The Talos Principle fails to launch with &quot;Fatal error: Cannot set display mode.&quot;</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98914">Bug 98914</a> - mesa-vdpau-drivers: breaks vdpau for mpeg2video</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98975">Bug 98975</a> - Wasteland 2 Directors Cut: Hangs. GPU fault</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99030">Bug 99030</a> - [HSW, regression] transform feedback fails on Linux 4.8</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99085">Bug 99085</a> - [EGL] dEQP-EGL.functional.sharing.gles2.multithread intermittent</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99097">Bug 99097</a> - [vulkancts] dEQP-VK.image.store regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99100">Bug 99100</a> - [SKL,BDW,BSW,KBL] dEQP-VK.glsl.return.return_in_dynamic_loop_dynamic_vertex regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99144">Bug 99144</a> - Incorrect rendering using glDrawArraysInstancedBaseInstance and first != 0 on Skylake</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99154">Bug 99154</a> - Link time error when using multiple builtin functions</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99158">Bug 99158</a> - vdpau segfaults and gpu locks with kodi on R9285</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99185">Bug 99185</a> - dEQP-EGL.functional.image.modify.tex_rgb5_a1_tex_subimage_rgba8</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99188">Bug 99188</a> - dEQP-EGL.functional.create_context_ext.robust_gl_30.rgb565_no_depth_no_stencil</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99210">Bug 99210</a> - ES3-CTS.functional.texture.mipmap.cube.generate.rgba5551_*</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99354">Bug 99354</a> - [G71] &quot;Assertion `bkref' failed&quot; reproducible with glmark2</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99450">Bug 99450</a> - [amdgpu] Payday 2 visual glitches on some models</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99451">Bug 99451</a> - polygon offset use after free</li>
</ul>
<h2>Changes</h2>
<p>Andres Rodriguez (2):</p>
<ul>
<li>vulkan/wsi: clarify the severity of lack of DRI3 v2</li>
<li>radv: fix include order for installed headers v2</li>
</ul>
<p>Arda Coskunses (2):</p>
<ul>
<li>vulkan/wsi/x11: don't crash on null visual</li>
<li>vulkan/wsi/x11: don't crash on null wsi x11 connection</li>
</ul>
<p>Bas Nieuwenhuizen (1):</p>
<ul>
<li>radv: Support loader interface version 3.</li>
</ul>
<p>Chad Versace (10):</p>
<ul>
<li>egl: Check config's surface types in eglCreate*Surface()</li>
<li>dri: Add __DRI_IMAGE_FORMAT_ARGB1555</li>
<li>mesa/texformat: Handle GL_RGBA + GL_UNSIGNED_SHORT_5_5_5_1</li>
<li>egl: Emit correct error when robust context creation fails</li>
<li>anv: Handle vkGetPhysicalDeviceQueueFamilyProperties with count == 0</li>
<li>mesa/shaderobj: Fix races on refcounts</li>
<li>meta: Disable dithering during glGenerateMipmap</li>
<li>vulkan: Add new cast macros for VkIcd types</li>
<li>vulkan: Update vk_icd.h to interface version 3</li>
<li>anv: Support loader interface version 3 (patch v2)</li>
</ul>
<p>Christian König (1):</p>
<ul>
<li>vl/zscan: fix "Fix trivial sign compare warnings"</li>
</ul>
<p>Chuck Atkins (1):</p>
<ul>
<li>glx: Add missing glproto dependency for gallium-xlib glx</li>
</ul>
<p>Damien Grassart (1):</p>
<ul>
<li>anv: return count of queue families written</li>
</ul>
<p>Dave Airlie (1):</p>
<ul>
<li>radv: flush smem for uniform buffer bit.</li>
</ul>
<p>Emil Velikov (10):</p>
<ul>
<li>docs: add sha256 checksums for 13.0.3</li>
<li>cherry-ignore: add couple of intel_miptree_copy related patches</li>
<li>cherry-ignore: add radv: Call nir_lower_constant_initializers."</li>
<li>get-typod-pick-list.sh: add new script</li>
<li>cherry-ignore: add "_mesa_ClampColor extension/version fix"</li>
<li>cherry-ignore: add wayland race condition fix</li>
<li>egl/wayland: use the destroy_window_callback for swrast</li>
<li>automake: use shared llvm libs for make distcheck</li>
<li>get-pick-list.sh: Require explicit "13.0" for nominating stable patches</li>
<li>Update version to 13.0.4</li>
</ul>
<p>Francisco Jerez (1):</p>
<ul>
<li>anv: Fix uniform and storage buffer offset alignment limits.</li>
</ul>
<p>Fredrik Höglund (2):</p>
<ul>
<li>radv: fix dual source blending</li>
<li>dri3: Fix MakeCurrent without a default framebuffer</li>
</ul>
<p>Grazvydas Ignotas (1):</p>
<ul>
<li>mapi: update the asm code to support x32</li>
</ul>
<p>Heiko Przybyl (1):</p>
<ul>
<li>r600/sb: Fix loop optimization related hangs on eg</li>
</ul>
<p>Ilia Mirkin (1):</p>
<ul>
<li>nouveau: take extra push space into account for pushbuf_space calls</li>
</ul>
<p>Jason Ekstrand (4):</p>
<ul>
<li>i965/generator/tex: Handle an immediate sampler with an indirect texture</li>
<li>anv/formats: Use the real format for B4G4R4A4_UNORM_PACK16 on gen8</li>
<li>nir/search: Only allow matching SSA values</li>
<li>isl: Mark A4B4G4R4_UNORM as supported on gen8</li>
</ul>
<p>Jonas Ådahl (1):</p>
<ul>
<li>egl/wayland: Cleanup private display connection when init fails</li>
</ul>
<p>Kenneth Graunke (7):</p>
<ul>
<li>i965: Don't bail on vertex element processing if we need draw params.</li>
<li>i965: Fix last slot calculations</li>
<li>i965: Fix texturing in the vec4 TCS and GS backends.</li>
<li>spirv: Move cursor before calling vtn_ssa_value() in phi 2nd pass.</li>
<li>i965: Make BLORP disable the NP Z PMA stall fix.</li>
<li>glsl: Use ir_var_temporary when generating inline functions.</li>
<li>i965: Properly flush in hsw_pause_transform_feedback().</li>
</ul>
<p>Marek Olšák (4):</p>
<ul>
<li>vdpau: call texture_get_handle while the mutex is being held</li>
<li>va: call texture_get_handle while the mutex is being held</li>
<li>radeonsi: for the tess barrier, only use emit_waitcnt on SI and LLVM 3.9+</li>
<li>radeonsi: don't forget to add HTILE to the buffer list for texturing</li>
</ul>
<p>Michel Dänzer (1):</p>
<ul>
<li>cso: Don't restore nr_samplers in cso_restore_fragment_samplers</li>
</ul>
<p>Nanley Chery (3):</p>
<ul>
<li>anv/cmd_buffer: Fix arrayed depth/stencil attachments</li>
<li>anv/cmd_buffer: Fix programmed HiZ qpitch</li>
<li>anv/image: Disable HiZ for depth buffer arrays</li>
</ul>
<p>Nayan Deshmukh (1):</p>
<ul>
<li>st/va: delay calling begin_frame until we have all parameters</li>
</ul>
<p>Rob Clark (1):</p>
<ul>
<li>freedreno: some fence cleanup</li>
</ul>
<p>Samuel Pitoiset (1):</p>
<ul>
<li>gallium/hud: add missing break in hud_cpufreq_graph_install()</li>
</ul>
<p>Timothy Arceri (3):</p>
<ul>
<li>nir: Turn imov/fmov of undef into undef</li>
<li>glsl: fix opt_minmax redundancy checks against baserange</li>
<li>util: fix list_is_singular()</li>
</ul>
<p>Zachary Michaels (1):</p>
<ul>
<li>radeonsi: Always leave poly_offset in a valid state</li>
</ul>
</div>
</body>
</html>

210
docs/relnotes/13.0.5.html Normal file
View File

@@ -0,0 +1,210 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 13.0.5 Release Notes / February 20, 2017</h1>
<p>
Mesa 13.0.5 is a bug fix release which fixes bugs found since the 13.0.4 release.
</p>
<p>
Mesa 13.0.5 implements the OpenGL 4.4 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.4. OpenGL
4.4 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
7e45e3812078726eabca6d9384364bf035a3c4279024ec9090dd1b19a8989926 mesa-13.0.5.tar.gz
bfcea7e2c801525a60895c8aff11aa68457ee9aa35d01a4638e1f310a3f5ef87 mesa-13.0.5.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98329">Bug 98329</a> - [dEQP, EGL, SKL, BDW, BSW] dEQP-EGL.functional.image.render_multiple_contexts.gles2_renderbuffer_depth16_depth_buffer</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98421">Bug 98421</a> - src/loader/loader.c:111:40: error: unknown type name drmDevicePtr</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98526">Bug 98526</a> - glsl/tests/general-ir-test regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99532">Bug 99532</a> - Compute shader doesn't give right result under some circumstances</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99631">Bug 99631</a> - segfault with OSVRTrackerView and openscenegraph git master</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99633">Bug 99633</a> - rasterizer/core/clip.h:279:49: error: const struct API_STATE has no member named linkageCount</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99692">Bug 99692</a> - [radv] Mostly broken on Hawaii PRO/CIK ASICs</li>
</ul>
<h2>Changes</h2>
<p>Bartosz Tomczyk (2):</p>
<ul>
<li>r600: Fix stack overflow</li>
<li>r600/sb: Fix memory leak</li>
</ul>
<p>Bruce Cherniak (1):</p>
<ul>
<li>swr: [rasterizer core] Remove dead code Clipper::ClipScalar()</li>
</ul>
<p>Chad Versace (1):</p>
<ul>
<li>i965/mt: Disable HiZ when sharing depth buffer externally (v2)</li>
</ul>
<p>Dave Airlie (3):</p>
<ul>
<li>radv: change base aligmment for allocated memory.</li>
<li>radv: fix cik macroModeIndex.</li>
<li>radv: adopt some init config workarounds from radeonsi.</li>
</ul>
<p>Derek Foreman (1):</p>
<ul>
<li>egl/dri2: add image_loader_extension back into loader extensions for wayland</li>
</ul>
<p>Emil Velikov (26):</p>
<ul>
<li>docs: add sha256 checksums for 13.0.4</li>
<li>configure.ac: list radeon in --with-vulkan-drivers help string</li>
<li>i965: automake: correctly set MKDIR_GEN</li>
<li>freedreno: automake: correctly set MKDIR_GEN</li>
<li>i965: automake: include builddir prior to srcdir</li>
<li>i915: automake: include builddir prior to srcdir</li>
<li>egl: automake: include builddir prior to srcdir</li>
<li>clover: automake: include builddir prior to srcdir</li>
<li>st/dri: automake: include builddir prior to srcdir</li>
<li>d3dadapter9: automake: include builddir prior to srcdir</li>
<li>glx: automake: include builddir prior to srcdir</li>
<li>glx/apple: automake: include builddir prior to srcdir</li>
<li>glx/windows: automake: include builddir prior to srcdir</li>
<li>loader: automake: include builddir prior to srcdir</li>
<li>mapi: automake: include builddir prior to srcdir</li>
<li>radeon, r200: automake: include builddir prior to srcdir</li>
<li>dri/swrast: automake: include builddir prior to srcdir</li>
<li>dri/osmesa: automake: include builddir prior to srcdir</li>
<li>mesa/tests: automake: include builddir prior to srcdir</li>
<li>bin/get-extra-pick-list: use git merge-base to get the branchpoint</li>
<li>bin/get-extra-pick-list: rework to use already_picked list</li>
<li>bin/get-typod-pick-list.sh: limit `git grep ...' to only as needed</li>
<li>bin/get-pick-list.sh: limit `git grep ...' only as needed</li>
<li>bin/get-pick-list.sh: remove ancient way of nominating patches</li>
<li>bin/get-fixes-pick-list.sh: add new script</li>
<li>Update version to 13.0.5</li>
</ul>
<p>Eric Anholt (1):</p>
<ul>
<li>vc4: Avoid emitting small immediates for UBO indirect load address guards.</li>
</ul>
<p>Hans de Goede (1):</p>
<ul>
<li>glx/glvnd: Fix GLXdispatchIndex sorting</li>
</ul>
<p>Ian Romanick (11):</p>
<ul>
<li>linker: Slight code rearrange to prevent duplication in the next commit</li>
<li>linker: Accurately track gl_uniform_block::stageref</li>
<li>glsl: Split process_block_array into two functions</li>
<li>glsl: Fix wonkey indentation left from previous commit</li>
<li>glsl: Track the linearized array index for each UBO instance array element</li>
<li>glsl: Use simpler visitor to determine which UBO and SSBO blocks are used</li>
<li>glsl: Add tracking for elements of an array-of-arrays that have been accessed</li>
<li>glsl: Add structures to track accessed elements of a single array</li>
<li>glsl: Mark a set of array elements as accessed using a list of array_deref_range</li>
<li>glsl: Walk a list of ir_dereference_array to mark array elements as accessed</li>
<li>linker: Accurately mark a uniform block instance array element as used in a stage</li>
</ul>
<p>Ilia Mirkin (3):</p>
<ul>
<li>vbo: process buffer binding state changes on draw when recording</li>
<li>st/mesa: MAX_VARYING is the max supported number of patch varyings, not min</li>
<li>nvc0: disable linked tsc mode in compute launch descriptor</li>
</ul>
<p>Jason Ekstrand (11):</p>
<ul>
<li>nir/search: Use the correct bit size for integer comparisons</li>
<li>i965/blorp: Use the correct ISL format for combined depth/stencil</li>
<li>intel/blorp: Handle clearing of A4B4G4R4 on all platforms</li>
<li>isl/formats: Only advertise sampling for A4B4G4R4 on Broadwell</li>
<li>anv: Flush render cache before STATE_BASE_ADDRESS on gen7</li>
<li>anv: Improve flushing around STATE_BASE_ADDRESS</li>
<li>vulkan/wsi/wayland: Handle VK_INCOMPLETE for GetFormats</li>
<li>vulkan/wsi/wayland: Handle VK_INCOMPLETE for GetPresentModes</li>
<li>vulkan/wsi: Lower the maximum image sizes</li>
<li>i965/sampler_state: Pass texObj into update_sampler_state</li>
<li>i965/sampler_state: Set the "Base Mip Level" field on Sandy Bridge</li>
</ul>
<p>Kenneth Graunke (1):</p>
<ul>
<li>i965: Unbind deleted shaders from brw_context, fixing malloc heisenbug.</li>
</ul>
<p>Lionel Landwerlin (5):</p>
<ul>
<li>anv: don't require render target isl bit for depth/stencil surfaces</li>
<li>anv: set command buffer to NULL when allocations fail</li>
<li>anv: fix descriptor pool internal size allocation</li>
<li>spirv: handle OpUndef as part of the variable parsing pass</li>
<li>spirv: handle undefined components for OpVectorShuffle</li>
</ul>
<p>Marc-André Lureau (1):</p>
<ul>
<li>tgsi-dump: dump label if instruction has one</li>
</ul>
<p>Marek Olšák (2):</p>
<ul>
<li>radeonsi: always set the TCL1_ACTION_ENA when invalidating L2</li>
<li>gallium/radeon: fix performance of buffer readbacks</li>
</ul>
<p>Topi Pohjolainen (2):</p>
<ul>
<li>i965: Make depth clear flushing more explicit</li>
<li>i965/gen6: Issue direct depth stall and flush after depth clear</li>
</ul>
<p>Vinson Lee (2):</p>
<ul>
<li>scons: Require libdrm &gt;= 2.4.66 for DRM.</li>
<li>util: Fix Clang trivial destructor check.</li>
</ul>
</div>
</body>
</html>

287
docs/relnotes/13.0.6.html Normal file
View File

@@ -0,0 +1,287 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 13.0.6 Release Notes / March 20, 2017</h1>
<p>
Mesa 13.0.6 is a bug fix release which fixes bugs found since the 13.0.5 release.
</p>
<p>
Mesa 13.0.6 implements the OpenGL 4.4 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.4. OpenGL
4.4 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
1076590f29103f022a2cd87e6dff6ae77072013745603d06b0410c373ab2bb1a mesa-13.0.6.tar.gz
29ef104a7fc082d352b1599bd6cb1d040be424ccd22f5e0eb7ee9b0e9acd3597 mesa-13.0.6.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=68504">Bug 68504</a> - 9.2-rc1 workaround for clover build failure on ppc/altivec: cannot convert 'bool' to '__vector(4) __bool int' in return</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97102">Bug 97102</a> - [dri][swr] stack overflow / infinite loop with GALLIUM_DRIVER=swr</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98869">Bug 98869</a> - Electronic Super Joy graphic artefacts (regression,bisected)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99401">Bug 99401</a> - [g33] regression: piglit.spec.!opengl 1_0.gl-1_0-beginend-coverage</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99456">Bug 99456</a> - Firefox crashing when opening about:support with WebGL2 enabled</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99677">Bug 99677</a> - heap-use-after-free in glsl</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99715">Bug 99715</a> - Don't print: &quot;Note: Buggy applications may crash, if they do please report to vendor&quot;</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99850">Bug 99850</a> - Tessellation bug on Carrizo</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100049">Bug 100049</a> - &quot;ralloc: Make sure ralloc() allocations match malloc()'s alignment.&quot; causes seg fault in 32bit build</li>
</ul>
<h2>Changes</h2>
<p>Alex Smith (2):</p>
<ul>
<li>radv: Emit pending flushes before executing a secondary command buffer</li>
<li>radv: Flush before copying with PKT3_WRITE_DATA in CmdUpdateBuffer</li>
</ul>
<p>Bartosz Tomczyk (1):</p>
<ul>
<li>glsl: fix heap-buffer-overflow</li>
</ul>
<p>Bas Nieuwenhuizen (8):</p>
<ul>
<li>radv: Pass CMASK alignment to application.</li>
<li>radv: Pass DCC alignment to application.</li>
<li>radv: Never try to create more than max_sets descriptor sets.</li>
<li>radv: Reset emitted compute pipeline when calling secondary cmd buffer.</li>
<li>radv: Only use PKT3_OCCLUSION_QUERY when it doesn't hang.</li>
<li>radv: Use correct size for availability flag.</li>
<li>radv: Disable HTILE for textures with multiple layers/levels.</li>
<li>radv: Emit cache flushes before CP DMA.</li>
</ul>
<p>Ben Crocker (3):</p>
<ul>
<li>gallivm: Improve debug output (V2)</li>
<li>gallivm: Override getHostCPUName() "generic" w/ "pwr8" (v4)</li>
<li>gallivm: Reenable PPC VSX (v3)</li>
</ul>
<p>Brendan King (1):</p>
<ul>
<li>egl/dri3: implement query surface hook</li>
</ul>
<p>Bruce Cherniak (1):</p>
<ul>
<li>swr: Prune empty nodes in CalculateProcessorTopology.</li>
</ul>
<p>Connor Abbott (1):</p>
<ul>
<li>anv: fix Get*MemoryRequirements for !LLC</li>
</ul>
<p>Dave Airlie (13):</p>
<ul>
<li>radv: program a default point size.</li>
<li>radv: handle transfer_write as a dst flag.</li>
<li>radv/ac: handle nir irem opcode.</li>
<li>radv/ac: implement txs for buffer textures.</li>
<li>radv/ac: correctly size shared memory usage.</li>
<li>radv/ac: avoid the fmask path when doing txs.</li>
<li>radv: pass FMASK alignment to application</li>
<li>tgsi: fix memory leak in tgsi sanity check</li>
<li>radv: fix depth format in blit2d.</li>
<li>radv: fix txs for sampler buffers</li>
<li>radv: drop Z24 support.</li>
<li>radv: disable mip point pre clamping.</li>
<li>radv: setup llvm target data layout</li>
</ul>
<p>Emil Velikov (6):</p>
<ul>
<li>docs: add sha256 checksums for 13.0.5</li>
<li>Revert "get-pick-list.sh: Require explicit "13.0" for nominating stable patches"</li>
<li>cherry-ignore: don't pick nir_op_pack_double optimisation fix</li>
<li>i965: move brw_define.h ifndef guard to the top</li>
<li>cherry-ignore: add ANV fast clears related fixes</li>
<li>Update version to 13.0.6</li>
</ul>
<p>Fredrik Höglund (2):</p>
<ul>
<li>radv: fix the dynamic buffer index in vkCmdBindDescriptorSets</li>
<li>radv/ac: fix multiple descriptor sets with dynamic buffers</li>
</ul>
<p>George Kyriazis (1):</p>
<ul>
<li>swr: Align query results allocation</li>
</ul>
<p>Grazvydas Ignotas (3):</p>
<ul>
<li>r300g: only allow byteswapped formats on big endian</li>
<li>gallium/u_queue: fix a crash with atexit handlers</li>
<li>gallium/u_queue: set num_threads correctly if not all threads start</li>
</ul>
<p>Gregory Hainaut (1):</p>
<ul>
<li>glapi: fix typo in count_scale</li>
</ul>
<p>Ian Romanick (1):</p>
<ul>
<li>mesa: Don't advertise GL_OES_read_format in core profile</li>
</ul>
<p>Ilia Mirkin (8):</p>
<ul>
<li>nvc0: increase number of ubo binding points</li>
<li>nvc0/ir: fix robustness guarantees for constbuf loads on kepler+ compute</li>
<li>nvc0/ir: fix ubo max clamp, reset file index</li>
<li>gm107/ir: fix address offset bitfield for ATOMS</li>
<li>nvc0: set the render condition in the compute object</li>
<li>st/mesa: don't pass compare mode for stencil-sampled textures</li>
<li>nvc0: take extra pushbuf space into account for pushbuf_space calls</li>
<li>nvc0: increase alignment to 256 for texture buffers on fermi</li>
</ul>
<p>Jacob Lifshay (1):</p>
<ul>
<li>vulkan/wsi: Improve the DRI3 error message</li>
</ul>
<p>Jason Ekstrand (11):</p>
<ul>
<li>i965: Use a better guardband calculation.</li>
<li>intel/blorp: Swizzle clear colors on the CPU</li>
<li>i965/fs: Remove the inline pack_double_2x32 optimization</li>
<li>anv: Add an invalidate_range helper</li>
<li>anv/query: clflush the bo map on non-LLC platforms</li>
<li>genxml: Make MI_STORE_DATA_IMM more consistent</li>
<li>anv/query: Perform CmdResetQueryPool on the GPU</li>
<li>blorp/exec: Use uint32_t for copying varying data</li>
<li>intel/blorp: Explicitly flush all allocated state</li>
<li>anv: Accurately advertise dynamic descriptor limits</li>
<li>anv: Properly handle destroying NULL devices and instances</li>
</ul>
<p>Jonas Pfeil (1):</p>
<ul>
<li>ralloc: Make sure ralloc() allocations match malloc()'s alignment.</li>
</ul>
<p>Jose Maria Casanova Crespo (1):</p>
<ul>
<li>glsl: non-last member unsized array on SSBO must fail compilation on GLSL ES 3.1</li>
</ul>
<p>Kenneth Graunke (7):</p>
<ul>
<li>i965: Fix fast depth clears for surfaces with a dimension of 16384.</li>
<li>i965: Use a UW source type for CS_OPCODE_CS_TERMINATE.</li>
<li>i965: Fix check for negative pitch in can_do_fast_copy_blit().</li>
<li>i965: Support the force_glsl_version driconf option.</li>
<li>i965: Combine the Gen6 SF and Clip viewport atoms.</li>
<li>mesa: Do (TCS &amp;&amp; !TES) draw time validation in ES as well.</li>
<li>egl: Ensure ResetNotificationStrategy matches for shared contexts.</li>
</ul>
<p>Lionel Landwerlin (3):</p>
<ul>
<li>spirv: don't assert with location decorations on non i/o variables</li>
<li>anv: wsi: report presentation error per image request</li>
<li>i965/fs: fix uninitialized memory access</li>
</ul>
<p>Marc Di Luzio (1):</p>
<ul>
<li>glsl: correct compute shader checks for memoryBarrier functions</li>
</ul>
<p>Marek Olšák (10):</p>
<ul>
<li>st/mesa: destroy pipe_context before destroying st_context (v2)</li>
<li>radeonsi: don't invoke DCC decompression in update_all_texture_descriptors</li>
<li>radeonsi: fix UNSIGNED_BYTE index buffer fallback with non-zero start (v2)</li>
<li>gallium/util: remove unused u_index_modify helpers</li>
<li>gallium/u_index_modify: don't add PIPE_TRANSFER_UNSYNCHRONIZED unconditionally</li>
<li>gallium/u_queue: fix random crashes when the app calls exit()</li>
<li>st/mesa: reset sample_mask, min_sample, and render_condition for PBO ops</li>
<li>st/mesa: set blend state for PBO readbacks</li>
<li>radeonsi: fix broken tessellation on Carrizo and Stoney</li>
<li>radeonsi: mark all bound shader buffer ranges as initialized</li>
</ul>
<p>Matt Turner (1):</p>
<ul>
<li>clover: Work around build failure with AltiVec.</li>
</ul>
<p>Nicolai Hähnle (12):</p>
<ul>
<li>mesa/main: fix meta caller of _mesa_ClampColor</li>
<li>radeonsi: fix texture gather on stencil textures</li>
<li>glsl: split DIV_TO_MUL_RCP into single- and double-precision flags</li>
<li>glx/dri3: handle NULL pointers in loader-to-DRI3 drawable conversion</li>
<li>glx/dri3: guard in_current_context against a disappeared drawable</li>
<li>glx: guard swap-interval functions against destroyed drawables</li>
<li>dri/common: clear the loaderPrivate pointer in driDestroyDrawable</li>
<li>winsys/amdgpu: reduce max_alloc_size based on GTT limits</li>
<li>radeonsi: handle MultiDrawIndirect in si_get_draw_start_count</li>
<li>radeonsi: fix UINT/SINT clamping for 10-bit formats on &lt;= CIK</li>
<li>st/glsl_to_tgsi: avoid iterating past the head of the instruction list</li>
<li>st/mesa: inform the driver of framebuffer changes before compute dispatches</li>
</ul>
<p>Samuel Iglesias Gonsálvez (6):</p>
<ul>
<li>glsl: fix heap-use-after-free in ast_declarator_list::hir()</li>
<li>i965/fs: mark last DF uniform array element as 64 bit live one</li>
<li>i965/fs: detect different bit size accesses to uniforms to push them in proper locations</li>
<li>i965/fs: fix indirect load DF uniforms on BSW/BXT</li>
<li>i965/fs: fix source type when emitting MOV_INDIRECT to read ICP handles</li>
<li>i965/fs: emit MOV_INDIRECT with the source with the right register type</li>
</ul>
<p>Samuel Pitoiset (1):</p>
<ul>
<li>winsys/amdgpu: avoid potential segfault in amdgpu_bo_map()</li>
</ul>
</div>
</body>
</html>

285
docs/relnotes/17.0.0.html Normal file
View File

@@ -0,0 +1,285 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 17.0.0 Release Notes / February 13, 2017</h1>
<p>
Mesa 17.0.0 is a new development release.
People who are concerned with stability and reliability should stick
with a previous release or wait for Mesa 17.0.1.
</p>
<p>
Mesa 17.0.0 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
696578f0b83796470511a88a95fff15a2a25fa201a9e487716f2ca20c177c3ab mesa-17.0.0.tar.gz
39db3d59700159add7f977307d12a7dfe016363e760ad82280ac4168ea668481 mesa-17.0.0.tar.xz
</pre>
<h2>New features</h2>
<p>
Note: some of the new features are only available with certain drivers.
</p>
<ul>
<li>GL_ARB_post_depth_coverage on i965/gen9+</li>
<li>GL_KHR_blend_equation_advanced on nvc0</li>
<li>GL_INTEL_conservative_rasterization on i965/gen9+</li>
<li>GL_NV_image_formats on any driver supporting GL_ARB_shader_image_load_store (i965, nvc0, radeonsi, softpipe)</li>
<li>GL_ARB_gpu_shader_fp64 in i965/haswell</li>
<li>GL_ARB_vertex_attrib_64bit in i965/haswell</li>
<li>GL_ARB_shader_precision in i965/haswell</li>
<li>Intel Haswell now supports OpenGL 4.2</li>
<li>GL_OES_geometry_shader on i965/haswell</li>
<li>GL_OES_texture_cube_map_array on i965/haswell</li>
<li>GL_OES_viewport_array on i965/haswell</li>
<li>Vulkan Float64 capability support on Intel's ANV driver</li>
</ul>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=70623">Bug 70623</a> - libglx.so: undefined symbol: _glapi_tls_Context</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=72902">Bug 72902</a> - [IVB/HSW/BDW] DOTA2 segfaults unless Mesa is configured with (non-default) --enable-glx-tls</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=73778">Bug 73778</a> - _glapi_tls_Dispatch undefined</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=77662">Bug 77662</a> - Fail to render to different faces of depth-stencil cube map</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=89043">Bug 89043</a> - undefined symbol: _glapi_tls_Dispatch</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91281">Bug 91281</a> - Tonga VCE 2160p encode fails with BO to small for addr</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92234">Bug 92234</a> - [BDW] GPU hang in Shogun2</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92634">Bug 92634</a> - gallium's vl_mpeg12_decoder does not work with st/va</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92760">Bug 92760</a> - Add FP64 support to the i965 shader backends</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92925">Bug 92925</a> - Incorrect GEN for ASTC in Surface Format Table</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93551">Bug 93551</a> - Divinity: Original Sin Enhanced Edition(Native) crash on start</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94512">Bug 94512</a> - X segfaults with glx-tls enabled in a x32 environment</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94900">Bug 94900</a> - HD6950 GPU lockup loop with various steam games (octodad[always], saints row 4[always], dead island[always], grid autosport[sometimes])</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94904">Bug 94904</a> - [vulkan, BSW] dEQP-VK.api.object_management.multithreaded_per_thread_device intermittent crash</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95460">Bug 95460</a> - Please add more drivers (freedreno, virgl) to features.txt status document</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96959">Bug 96959</a> - nop.sat generated by pow workaround?</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97102">Bug 97102</a> - [dri][swr] stack overflow / infinite loop with GALLIUM_DRIVER=swr</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97232">Bug 97232</a> - Line rendering broken in Dolphin when using gl_ClipDistance</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97287">Bug 97287</a> - GL45-CTS.vertex_attrib_binding.basic-inputL-case1 fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97321">Bug 97321</a> - Query INFO_LOG_LENGTH for empty info log should return 0</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97420">Bug 97420</a> - &quot;#version 0&quot; crashes glsl_compiler</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97422">Bug 97422</a> - trying to call a number as a function results into a crash</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97447">Bug 97447</a> - GL 3.0 compatibility context exposes GL_ARB_compute_shader</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97473">Bug 97473</a> - Memory corruption when uploading DXT5 cubemap faces</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97715">Bug 97715</a> - [ILK,G45,G965] piglit.spec.arb_separate_shader_objects.misc api error checks</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97779">Bug 97779</a> - [regression, bisected][BDW, GPU hang] stuck on render ring, always reproducible</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97804">Bug 97804</a> - Later precision statement isn't overriding earlier one</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97952">Bug 97952</a> - /usr/include/string.h:518:12: error: exception specification in declaration does not match previous declaration</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97967">Bug 97967</a> - glsl/tests/cache-test regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98005">Bug 98005</a> - VCE dual instance encoding inconsistent since st/va: enable dual instances encode by sync surface</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98012">Bug 98012</a> - [IVB] Segfault when running Dolphin twice with Vulkan</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98134">Bug 98134</a> - dEQP-GLES31.functional.debug.negative_coverage.get_error.buffer.draw_buffers wants a different GL error code</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98172">Bug 98172</a> - Concurrent call to glClientWaitSync results in segfault in one of the waiters.</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98238">Bug 98238</a> - witcher 2: objects are black when changing lod</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98243">Bug 98243</a> - dEQP mismatched UBO precision qualifiers</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98245">Bug 98245</a> - GLES3.1 link negative dEQP &quot;expected linking to fail, but passed.&quot;</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98250">Bug 98250</a> - dEQP-GLES31.functional.debug.negative_coverage.get_error.texture.texparameterIiv/texparameterIuiv failure</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98263">Bug 98263</a> - [radv] The Talos Principle fails to launch with &quot;Fatal error: Cannot set display mode.&quot;</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98297">Bug 98297</a> - Can't configure a desktop with 3x4k monitors in one row</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98299">Bug 98299</a> - Compute shaders generate stupid divides</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98307">Bug 98307</a> - &quot;st/glsl_to_tgsi: explicitly track all input and output declaration&quot; broke flightgear colors on rs780</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98326">Bug 98326</a> - [dEQP, EGL] pbuffer depth/stencil tests fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98327">Bug 98327</a> - [dEQP, EGL] dEQP-EGL.functional.resize not supported</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98328">Bug 98328</a> - [dEQP, EGL] luminance tests fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98329">Bug 98329</a> - [dEQP, EGL, SKL, BDW, BSW] dEQP-EGL.functional.image.render_multiple_contexts.gles2_renderbuffer_depth16_depth_buffer</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98330">Bug 98330</a> - [dEQP, EGL] dEQP-EGL.functional.buffer_age.no_preserve fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98339">Bug 98339</a> - dEQP-EGL: Got EGL_BAD_MATCH: eglCreateSyncKHR()</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98343">Bug 98343</a> - dEQP-EGL: GL_INVALID_ENUM at teglCreateContextExtTests</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98415">Bug 98415</a> - Vulkan Driver JSON file contains incorrect field</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98421">Bug 98421</a> - src/loader/loader.c:111:40: error: unknown type name drmDevicePtr</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98431">Bug 98431</a> - UnrealEngine v4 demos startup fails to blorp blit assert</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98480">Bug 98480</a> - Support R8 image texture in ES 3.1</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98512">Bug 98512</a> - radeon r600 vdpau: Invalid command stream: texture bo too small</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98518">Bug 98518</a> - [r600g, bisected] regression: NI/Turks MSAA texture corruption with FreeCAD and Wine games</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98526">Bug 98526</a> - glsl/tests/general-ir-test regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98595">Bug 98595</a> - glsl: ralloc assertion &quot;info-&gt;canary == CANARY&quot; failed</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98599">Bug 98599</a> - xterm menus corrupt since tgsi/scan: handle indirect image indexing correctly</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98632">Bug 98632</a> - Fix build on Hurd without PATH_MAX</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98681">Bug 98681</a> - ir_builder_print_visitor.cpp:401:67: error: expected ')' before 'PRIx64'</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98694">Bug 98694</a> - &quot;(5=2)?1:1&quot; as array size decleration crashes glsl_compiler</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98740">Bug 98740</a> - bitcode.cpp:102:8: error: Error is not a member of llvm</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98767">Bug 98767</a> - [swrast] ralloc.c:84: get_header: Assertion `info-&gt;canary == CANARY' failed.</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98774">Bug 98774</a> - glsl/tests/warnings-test regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98815">Bug 98815</a> - [SKL/BDW GT2] large perf regression in TessMark</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98840">Bug 98840</a> - nir clone test fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98893">Bug 98893</a> - [SKL] piglit.spec.arb_shader_image_load_store.semantics intermittent</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98914">Bug 98914</a> - mesa-vdpau-drivers: breaks vdpau for mpeg2video</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98917">Bug 98917</a> - [BDW SKL BSW KBL] Tessellation CTS tests regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98975">Bug 98975</a> - Wasteland 2 Directors Cut: Hangs. GPU fault</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99010">Bug 99010</a> - --disable-gallium-llvm no longer recognized</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99013">Bug 99013</a> - [regression, bisected] radeonsi: commit 4c8c13b3 &quot;Use amdgcn intrinsics for fs interpolation&quot; makes system unusable</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99030">Bug 99030</a> - [HSW, regression] transform feedback fails on Linux 4.8</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99038">Bug 99038</a> - [dEQP, EGL, SKL, BDW, BSW] dEQP-EGL.functional.negative_api.create_pixmap_surface crashes</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99072">Bug 99072</a> - [byt,ivb,snb] ES3-CTS.gtf.GL3Tests.shadow regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99085">Bug 99085</a> - [EGL] dEQP-EGL.functional.sharing.gles2.multithread intermittent</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99097">Bug 99097</a> - [vulkancts] dEQP-VK.image.store regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99100">Bug 99100</a> - [SKL,BDW,BSW,KBL] dEQP-VK.glsl.return.return_in_dynamic_loop_dynamic_vertex regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99119">Bug 99119</a> - swr_fence_work.cpp(42): error: argument of type &quot;std::nullptr_t&quot; is incompatible with parameter of type &quot;unsigned long&quot;</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99144">Bug 99144</a> - Incorrect rendering using glDrawArraysInstancedBaseInstance and first != 0 on Skylake</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99154">Bug 99154</a> - Link time error when using multiple builtin functions</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99158">Bug 99158</a> - vdpau segfaults and gpu locks with kodi on R9285</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99185">Bug 99185</a> - dEQP-EGL.functional.image.modify.tex_rgb5_a1_tex_subimage_rgba8</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99188">Bug 99188</a> - dEQP-EGL.functional.create_context_ext.robust_gl_30.rgb565_no_depth_no_stencil</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99210">Bug 99210</a> - ES3-CTS.functional.texture.mipmap.cube.generate.rgba5551_*</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99214">Bug 99214</a> - Crash in library libswrAVX.so when assigning vertex buffer object pointers with elements of type GL_DOUBLE</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99219">Bug 99219</a> - The Stanley Parable GPU hang when starting a new game</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99229">Bug 99229</a> - [G33] thousands of tests crash</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99231">Bug 99231</a> - [HSW][i965] Crash in upload_3dstate_streamout()</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99287">Bug 99287</a> - piglit.spec.glsl-1_10.execution.vs-nested-return-sibling-loop regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99303">Bug 99303</a> - [REGRESSION][BISECTED] DMs are crashing on start with &quot;radeon&quot;</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99314">Bug 99314</a> - [g33] glsl regressions</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99339">Bug 99339</a> - Blender line rendering broken after removing XY clipping of lines</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99354">Bug 99354</a> - [G71] &quot;Assertion `bkref' failed&quot; reproducible with glmark2</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99389">Bug 99389</a> - Mesa build broken: sid_tables.h</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99391">Bug 99391</a> - [ILK,G45,G965] piglit regressions</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99401">Bug 99401</a> - [g33] regression: piglit.spec.!opengl 1_0.gl-1_0-beginend-coverage</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99419">Bug 99419</a> - Crash(Segmentation fault) si_shader_select in Master Of Orion</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99450">Bug 99450</a> - [amdgpu] Payday 2 visual glitches on some models</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99451">Bug 99451</a> - polygon offset use after free</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99456">Bug 99456</a> - Firefox crashing when opening about:support with WebGL2 enabled</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99631">Bug 99631</a> - segfault with OSVRTrackerView and openscenegraph git master</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99633">Bug 99633</a> - rasterizer/core/clip.h:279:49: error: const struct API_STATE has no member named linkageCount</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99637">Bug 99637</a> - VLC video has corrupted colors when using VDPAU output on Radeon SI</li>
</ul>
<h2>Changes</h2>
<ul>
<li>Building RADV requires --enable-gallium-llvm</li>
<li>The vulkan headers vk_platform.h and vulkan.h are no longer installed</li>
<li>The configure options --with-sha1 and --disable-shader-cache are
removed alongside their respective library requirements</li>
</ul>
</div>
</body>
</html>

221
docs/relnotes/17.0.1.html Normal file
View File

@@ -0,0 +1,221 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 17.0.1 Release Notes / March 4, 2017</h1>
<p>
Mesa 17.0.1 is a bug fix release which fixes bugs found since the 17.0.0 release.
</p>
<p>
Mesa 17.0.1 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
e819bd3e515dac26faf9836d8f27a4ddf05323b9b23afb6c06536d4ac82e2743 mesa-17.0.1.tar.gz
96fd70ef5f31d276a17e424e7e1bb79447ccbbe822b56844213ef932e7ad1b0c mesa-17.0.1.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98869">Bug 98869</a> - Electronic Super Joy graphic artefacts (regression,bisected)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99532">Bug 99532</a> - Compute shader doesn't give right result under some circumstances</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99677">Bug 99677</a> - heap-use-after-free in glsl</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99692">Bug 99692</a> - [radv] Mostly broken on Hawaii PRO/CIK ASICs</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99850">Bug 99850</a> - Tessellation bug on Carrizo</li>
</ul>
<h2>Changes</h2>
<p>Bas Nieuwenhuizen (4):</p>
<ul>
<li>radv: Never try to create more than max_sets descriptor sets.</li>
<li>radv: Reset emitted compute pipeline when calling secondary cmd buffer.</li>
<li>radv: Only use PKT3_OCCLUSION_QUERY when it doesn't hang.</li>
<li>radv: Use correct size for availability flag.</li>
</ul>
<p>Ben Crocker (3):</p>
<ul>
<li>gallivm: Reenable PPC VSX (v3)</li>
<li>gallivm: Improve debug output (V2)</li>
<li>gallivm: Override getHostCPUName() "generic" w/ "pwr8" (v4)</li>
</ul>
<p>Brendan King (1):</p>
<ul>
<li>egl/dri3: implement query surface hook</li>
</ul>
<p>Christian Gmeiner (2):</p>
<ul>
<li>etnaviv: move pctx initialisation to avoid a null dereference</li>
<li>etnaviv: remove number of pixel pipes validation</li>
</ul>
<p>Connor Abbott (1):</p>
<ul>
<li>anv: fix Get*MemoryRequirements for !LLC</li>
</ul>
<p>Daniel Stone (1):</p>
<ul>
<li>egl/wayland: Don't use DRM format codes for SHM</li>
</ul>
<p>Dave Airlie (6):</p>
<ul>
<li>tgsi: fix memory leak in tgsi sanity check</li>
<li>radv: change base aligmment for allocated memory.</li>
<li>radv: fix cik macroModeIndex.</li>
<li>radv: adopt some init config workarounds from radeonsi.</li>
<li>radv: fix depth format in blit2d.</li>
<li>radv: fix txs for sampler buffers</li>
</ul>
<p>Emil Velikov (8):</p>
<ul>
<li>docs: add sha256 checksums for 17.0.0</li>
<li>bin/get-extra-pick-list: use git merge-base to get the branchpoint</li>
<li>bin/get-extra-pick-list: rework to use already_picked list</li>
<li>bin/get-typod-pick-list.sh: limit `git grep ...' to only as needed</li>
<li>bin/get-pick-list.sh: limit `git grep ...' only as needed</li>
<li>bin/get-pick-list.sh: remove ancient way of nominating patches</li>
<li>bin/get-fixes-pick-list.sh: add new script</li>
<li>Update version to 17.0.1</li>
</ul>
<p>Eric Anholt (1):</p>
<ul>
<li>vc4: Avoid emitting small immediates for UBO indirect load address guards.</li>
</ul>
<p>Grazvydas Ignotas (3):</p>
<ul>
<li>r300g: only allow byteswapped formats on big endian</li>
<li>gallium/u_queue: fix a crash with atexit handlers</li>
<li>gallium/u_queue: set num_threads correctly if not all threads start</li>
</ul>
<p>Hans de Goede (1):</p>
<ul>
<li>glx/glvnd: Fix GLXdispatchIndex sorting</li>
</ul>
<p>Ilia Mirkin (4):</p>
<ul>
<li>gm107/ir: fix address offset bitfield for ATOMS</li>
<li>nvc0: set the render condition in the compute object</li>
<li>st/mesa: don't pass compare mode for stencil-sampled textures</li>
<li>nvc0: disable linked tsc mode in compute launch descriptor</li>
</ul>
<p>Jason Ekstrand (10):</p>
<ul>
<li>i965/sampler_state: Clamp min/max LOD to 14 on gen7+</li>
<li>i965/sampler_state: Pass texObj into update_sampler_state</li>
<li>i965/sampler_state: Set the "Base Mip Level" field on Sandy Bridge</li>
<li>intel/blorp: Swizzle clear colors on the CPU</li>
<li>i965/fs: Fix the inline nir_op_pack_double optimization</li>
<li>anv: Add an invalidate_range helper</li>
<li>anv/query: clflush the bo map on non-LLC platforms</li>
<li>genxml: Make MI_STORE_DATA_IMM more consistent</li>
<li>anv/query: Perform CmdResetQueryPool on the GPU</li>
<li>intel/blorp: Explicitly flush all allocated state</li>
</ul>
<p>Jose Maria Casanova Crespo (1):</p>
<ul>
<li>glsl: non-last member unsized array on SSBO must fail compilation on GLSL ES 3.1</li>
</ul>
<p>Kenneth Graunke (1):</p>
<ul>
<li>mesa: Do (TCS &amp;&amp; !TES) draw time validation in ES as well.</li>
</ul>
<p>Leo Liu (1):</p>
<ul>
<li>configure.ac: check require_basic_egl only if egl enabled</li>
</ul>
<p>Lionel Landwerlin (2):</p>
<ul>
<li>anv: wsi: report presentation error per image request</li>
<li>i965/fs: fix uninitialized memory access</li>
</ul>
<p>Marek Olšák (6):</p>
<ul>
<li>radeonsi: fix UNSIGNED_BYTE index buffer fallback with non-zero start (v2)</li>
<li>gallium/util: remove unused u_index_modify helpers</li>
<li>gallium/u_index_modify: don't add PIPE_TRANSFER_UNSYNCHRONIZED unconditionally</li>
<li>gallium/u_queue: fix random crashes when the app calls exit()</li>
<li>radeonsi: fix broken tessellation on Carrizo and Stoney</li>
<li>amd/common: fix ASICREV_IS_POLARIS11_M for Polaris12</li>
</ul>
<p>Mauro Rossi (2):</p>
<ul>
<li>android: radeonsi: fix sid_table.h generated header include path</li>
<li>android: glsl: build shader cache sources</li>
</ul>
<p>Michel Dänzer (1):</p>
<ul>
<li>configure.ac: Drop LLVM compiler flags more radically</li>
</ul>
<p>Nicolai Hähnle (3):</p>
<ul>
<li>winsys/amdgpu: reduce max_alloc_size based on GTT limits</li>
<li>radeonsi: handle MultiDrawIndirect in si_get_draw_start_count</li>
<li>radeonsi: fix UINT/SINT clamping for 10-bit formats on &lt;= CIK</li>
</ul>
<p>Samuel Iglesias Gonsálvez (1):</p>
<ul>
<li>glsl: fix heap-use-after-free in ast_declarator_list::hir()</li>
</ul>
<p>Tapani Pälli (1):</p>
<ul>
<li>android: fix droid_create_image_from_prime_fd_yuv for YV12</li>
</ul>
</div>
</body>
</html>

185
docs/relnotes/17.0.2.html Normal file
View File

@@ -0,0 +1,185 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 17.0.2 Release Notes / March 20, 2017</h1>
<p>
Mesa 17.0.2 is a bug fix release which fixes bugs found since the 17.0.1 release.
</p>
<p>
Mesa 17.0.2 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
2e0f41e7974ba7a36ca32bbeaf8ebcd65c8fd4d2dc9872f04d4becbd5e7a8cb5 mesa-17.0.2.tar.gz
f8f191f909e01e65de38d5bdea5fb057f21649a3aed20948be02348e77a689d4 mesa-17.0.2.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=68504">Bug 68504</a> - 9.2-rc1 workaround for clover build failure on ppc/altivec: cannot convert 'bool' to '__vector(4) __bool int' in return</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97988">Bug 97988</a> - [radeonsi] playing back videos with VDPAU exhibits deinterlacing/anti-aliasing issues not visible with VA-API</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99484">Bug 99484</a> - Crusader Kings 2 - Loading bars, siege bars, morale bars, etc. do not render correctly</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99715">Bug 99715</a> - Don't print: &quot;Note: Buggy applications may crash, if they do please report to vendor&quot;</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100049">Bug 100049</a> - &quot;ralloc: Make sure ralloc() allocations match malloc()'s alignment.&quot; causes seg fault in 32bit build</li>
</ul>
<h2>Changes</h2>
<p>Alex Smith (3):</p>
<ul>
<li>radv: Emit pending flushes before executing a secondary command buffer</li>
<li>radv: Flush before copying with PKT3_WRITE_DATA in CmdUpdateBuffer</li>
<li>radv/ac: Fix shared memory offset calculation</li>
</ul>
<p>Bas Nieuwenhuizen (3):</p>
<ul>
<li>radv: Disable HTILE for textures with multiple layers/levels.</li>
<li>radv: Emit cache flushes before CP DMA.</li>
<li>Revert "radv: Emit cache flushes before CP DMA."</li>
</ul>
<p>Dave Airlie (3):</p>
<ul>
<li>radv: drop Z24 support.</li>
<li>radv: disable mip point pre clamping.</li>
<li>radv: setup llvm target data layout</li>
</ul>
<p>Emil Velikov (4):</p>
<ul>
<li>docs: add sha256 checksums for 17.0.1</li>
<li>cherry-ignore: add the swizzle blorp_clear fix</li>
<li>i965: move brw_define.h ifndef guard to the top</li>
<li>Update version to 17.0.2</li>
</ul>
<p>Fredrik Höglund (2):</p>
<ul>
<li>radv: fix the dynamic buffer index in vkCmdBindDescriptorSets</li>
<li>radv/ac: fix multiple descriptor sets with dynamic buffers</li>
</ul>
<p>Gregory Hainaut (1):</p>
<ul>
<li>glapi: fix typo in count_scale</li>
</ul>
<p>Ilia Mirkin (2):</p>
<ul>
<li>nvc0: take extra pushbuf space into account for pushbuf_space calls</li>
<li>nvc0: increase alignment to 256 for texture buffers on fermi</li>
</ul>
<p>Jacob Lifshay (1):</p>
<ul>
<li>vulkan/wsi: Improve the DRI3 error message</li>
</ul>
<p>James Legg (1):</p>
<ul>
<li>radv: Fix using more than 4 bound descriptor sets</li>
</ul>
<p>Jason Ekstrand (7):</p>
<ul>
<li>anv/blorp/clear_subpass: Only set surface clear color for fast clears</li>
<li>anv: Accurately advertise dynamic descriptor limits</li>
<li>anv: Stall before fast-clear operations</li>
<li>anv: Properly handle destroying NULL devices and instances</li>
<li>anv/blorp: Turn off AUX after doing a CCS_D resolve</li>
<li>anv/blorp: Only set a clear color for resolves if fast-cleared</li>
<li>nir/intrinsics: Make load_barycentric_input take a 2-component coor</li>
</ul>
<p>Jonas Pfeil (1):</p>
<ul>
<li>ralloc: Make sure ralloc() allocations match malloc()'s alignment.</li>
</ul>
<p>Kenneth Graunke (1):</p>
<ul>
<li>egl: Ensure ResetNotificationStrategy matches for shared contexts.</li>
</ul>
<p>Marek Olšák (3):</p>
<ul>
<li>st/mesa: reset sample_mask, min_sample, and render_condition for PBO ops</li>
<li>st/mesa: set blend state for PBO readbacks</li>
<li>radeonsi: mark all bound shader buffer ranges as initialized</li>
</ul>
<p>Matt Turner (1):</p>
<ul>
<li>clover: Work around build failure with AltiVec.</li>
</ul>
<p>Nanley Chery (2):</p>
<ul>
<li>anv/pass: Avoid accessing attachment array out of bounds</li>
<li>anv/image: Remove extra dependency on HiZ-specific variable</li>
</ul>
<p>Nicolai Hähnle (2):</p>
<ul>
<li>st/glsl_to_tgsi: avoid iterating past the head of the instruction list</li>
<li>st/mesa: inform the driver of framebuffer changes before compute dispatches</li>
</ul>
<p>Robert Foss (1):</p>
<ul>
<li>mesa: Avoid read of uninitialized variable</li>
</ul>
<p>Samuel Iglesias Gonsálvez (5):</p>
<ul>
<li>i965/fs: mark last DF uniform array element as 64 bit live one</li>
<li>i965/fs: detect different bit size accesses to uniforms to push them in proper locations</li>
<li>i965/fs: fix indirect load DF uniforms on BSW/BXT</li>
<li>i965/fs: fix source type when emitting MOV_INDIRECT to read ICP handles</li>
<li>i965/fs: emit MOV_INDIRECT with the source with the right register type</li>
</ul>
<p>Samuel Pitoiset (1):</p>
<ul>
<li>radeonsi: disable sinking common instructions down to the end block</li>
</ul>
</div>
</body>
</html>

189
docs/relnotes/17.0.3.html Normal file
View File

@@ -0,0 +1,189 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 17.0.3 Release Notes / April 1, 2017</h1>
<p>
Mesa 17.0.3 is a bug fix release which fixes bugs found since the 17.0.2 release.
</p>
<p>
Mesa 17.0.3 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
8253edf1bdd7b14ab63d5982349143a5c9ac3767f39a63257cc9d7e7d92f60f1 mesa-17.0.3.tar.gz
ca646f5075a002d60ef9123c8a4331cede155c01712ef945a65c59a5e69fe7ed mesa-17.0.3.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96743">Bug 96743</a> - [BYT, HSW, SKL, BXT, KBL] GPU hangs with GfxBench 4.0 CarChase</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99246">Bug 99246</a> - [d3dadapter+radeonsi &amp; bisect] EVE-Online : hang on wormhole sight</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100061">Bug 100061</a> - LODQ instruction generated with invalid dst mask</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100182">Bug 100182</a> - Flickering in The Talos Principle on Sky Lake GT4.</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100201">Bug 100201</a> - Windows scons build with MSVC toolchain and LLVM 4.0 fails</li>
</ul>
<h2>Changes</h2>
<p>Alex Deucher (1):</p>
<ul>
<li>radeonsi: add new polaris12 pci id</li>
</ul>
<p>Andres Gomez (5):</p>
<ul>
<li>glsl: on UBO/SSBOs link error reset the number of active blocks to 0</li>
<li>cherry-ignore: add the Invalidate L2 for TRANSFER_WRITE barriers fix</li>
<li>cherry-ignore: add the Flush after unmap in gbm/dri fix</li>
<li>cherry-ignore: corrected typo in the Flush after unmap in gbm/dri fix</li>
<li>Update version to 17.0.3</li>
</ul>
<p>Axel Davy (2):</p>
<ul>
<li>st/nine: Resolve deadlock in surface/volume dtors when using csmt</li>
<li>st/nine: Use atomics for available_texture_mem</li>
</ul>
<p>Bas Nieuwenhuizen (1):</p>
<ul>
<li>radv: flush DB cache before and after HTILE decompress.</li>
</ul>
<p>Dave Airlie (1):</p>
<ul>
<li>radv: fix primitive reset index emission</li>
</ul>
<p>Emil Velikov (1):</p>
<ul>
<li>docs: add sha256 checksums for 17.0.2</li>
</ul>
<p>Ilia Mirkin (1):</p>
<ul>
<li>st/mesa: set result writemask based on ir type</li>
</ul>
<p>Jan Vesely (1):</p>
<ul>
<li>clover: use pipe_resource references</li>
</ul>
<p>Jason Ekstrand (9):</p>
<ul>
<li>anv/query: Invalidate the correct range</li>
<li>anv/GetQueryPoolResults: Actually implement the spec</li>
<li>anv/image: Return early when unbinding an image</li>
<li>anv/query: Fix the location of timestamp availability</li>
<li>anv: Make anv_get_layerCount a macro</li>
<li>anv/blorp: Use anv_get_layerCount everywhere</li>
<li>anv/cmd_buffer: Apply flush operations prior to executing secondaries</li>
<li>anv/cmd_buffer: Fix bad indentation</li>
<li>anv: Flush caches prior to PIPELINE_SELECT on all gens</li>
</ul>
<p>José Fonseca (1):</p>
<ul>
<li>c11/threads: Include thr/xtimec.h for xtime definition when building with MSVC.</li>
</ul>
<p>Juan A. Suarez Romero (1):</p>
<ul>
<li>tests/cache_test: allow crossing mount points</li>
</ul>
<p>Karol Herbst (1):</p>
<ul>
<li>nvc0/ir: treat FMA like MAD for operand propagation</li>
</ul>
<p>Kenneth Graunke (1):</p>
<ul>
<li>i965: Fall back to GL 4.2/4.3 on Haswell if the kernel isn't new enough.</li>
</ul>
<p>Marek Olšák (1):</p>
<ul>
<li>radeonsi: don't hang on shader compile failure</li>
</ul>
<p>Matt Turner (1):</p>
<ul>
<li>i965/fs: Don't emit SEL instructions for type-converting MOVs.</li>
</ul>
<p>Nanley Chery (1):</p>
<ul>
<li>intel: Correct the BDW surface state size</li>
</ul>
<p>Nicolai Hähnle (1):</p>
<ul>
<li>mesa/main: fix MultiDrawElements[BaseVertex] validation of primcount</li>
</ul>
<p>Rob Clark (1):</p>
<ul>
<li>freedreno: fix memory leak</li>
</ul>
<p>Tim Rowley (1):</p>
<ul>
<li>swr: [rasterizer jitter] fix llvm &gt;= 5.0 build break</li>
</ul>
<p>Timothy Arceri (2):</p>
<ul>
<li>glsl: fix lower jumps for returns when loop is inside an if</li>
<li>mesa: update lower_jumps tests after bug fix</li>
</ul>
<p>Topi Pohjolainen (1):</p>
<ul>
<li>i965/gen8+: Do full stall when switching pipeline</li>
</ul>
<p>Xu Randy (2):</p>
<ul>
<li>anv/blorp: Fix a crash in CmdClearColorImage</li>
<li>anv/genX: Solve the vkCreateGraphicsPipelines crash</li>
</ul>
</div>
</body>
</html>

81
docs/relnotes/17.1.0.html Normal file
View File

@@ -0,0 +1,81 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 17.1.0 Release Notes / TBD</h1>
<p>
Mesa 17.1.0 is a new development release.
People who are concerned with stability and reliability should stick
with a previous release or wait for Mesa 17.1.1.
</p>
<p>
Mesa 17.1.0 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
TBD.
</pre>
<h2>New features</h2>
<p>
Note: some of the new features are only available with certain drivers.
</p>
<ul>
<li>OpenGL 4.2 on i965/ivb</li>
<li>GL_ARB_gpu_shader_fp64 on i965/ivybridge</li>
<li>GL_ARB_gpu_shader_int64 on i965/gen8+, nvc0, radeonsi, softpipe, llvmpipe</li>
<li>GL_ARB_shader_ballot on nvc0, radeonsi</li>
<li>GL_ARB_shader_clock on nv50, nvc0, radeonsi</li>
<li>GL_ARB_shader_group_vote on radeonsi</li>
<li>GL_ARB_shader_precision on i965/ivb</li>
<li>GL_ARB_shader_viewport_layer_array on radeonsi</li>
<li>GL_ARB_sparse_buffer on radeonsi/CIK+</li>
<li>GL_ARB_transform_feedback2 on i965/gen6</li>
<li>GL_ARB_transform_feedback_overflow_query on i965/gen6+</li>
<li>GL_ARB_vertex_attrib_64bit on i965/ivb</li>
<li>GL_NV_fill_rectangle on nvc0</li>
<li>Geometry shaders enabled on swr</li>
</ul>
<h2>Bug fixes</h2>
<ul>
</ul>
<h2>Changes</h2>
<ul>
<li>Removed the ilo gallium driver.</li>
<li>The configure option --enable-gallium-llvm is superseded by --enable-llvm.</li>
<li>The swr driver now requires LLVM &gt;= 3.9.0 and a C++14 capable compiler.</li>
<li>The radeonsi driver now requires LLVM 3.8.0.</li>
<li>The MESA_GLSL=opt and MESA_GLSL=no_opt environment vars have been removed.</li>
<li>The --with-egl-platforms configure option is deprecated. Use --with-platforms instead.</li>
</ul>
</div>
</body>
</html>

View File

@@ -57,7 +57,7 @@ copy texturing).
<li>New Intel i965 DRI driver
<li>New <code>minstall</code> script to replace normal install program
<li>Faster fragment program execution in software
<li>Added (or fixed) support for <a href="http://www.opengl.org/registry/specs/SGI/make_current_read.txt">
<li>Added (or fixed) support for <a href="https://www.khronos.org/registry/OpenGL/extensions/SGI/GLX_SGI_make_current_read.txt">
GLX_SGI_make_current_read</a> to the following drivers:
<ul>
<li>radeon</li>

View File

@@ -226,7 +226,7 @@ did not exist in the 7.10 release series at all.</p>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=36086">Bug 36086</a> - [wine] Segfault r300_resource_copy_region with some wine apps and RADEON_HYPERZ</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=36182">Bug 36182</a> - Game Trine from http://www.humblebundle.com/ needs ATI_draw_buffers</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=36182">Bug 36182</a> - Game Trine from https://www.humblebundle.com/ needs ATI_draw_buffers</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=36268">Bug 36268</a> - [r300g, bisected] minor flickering in Unigine Sanctuary</li>

View File

@@ -21,7 +21,7 @@ Mesa 7.5.1 is a bug-fix release fixing issues found since the 7.5 release.
</p>
<p>
The main new feature of Mesa 7.5.x is the
<a href="http://wiki.freedesktop.org/wiki/Software/gallium">Gallium3D</a> infrastructure.
<a href="https://www.freedesktop.org/wiki/Software/gallium">Gallium3D</a> infrastructure.
</p>
<p>
Mesa 7.5.1 implements the OpenGL 2.1 API, but the version reported by

View File

@@ -21,7 +21,7 @@ Mesa 7.5.2 is a bug-fix release fixing issues found since the 7.5.1 release.
</p>
<p>
The main new feature of Mesa 7.5.x is the
<a href="http://wiki.freedesktop.org/wiki/Software/gallium">Gallium3D</a> infrastructure.
<a href="https://www.freedesktop.org/wiki/Software/gallium">Gallium3D</a> infrastructure.
</p>
<p>
Mesa 7.5.2 implements the OpenGL 2.1 API, but the version reported by

View File

@@ -23,7 +23,7 @@ with the 7.4.x branch or wait for Mesa 7.5.1.
</p>
<p>
The main new feature of Mesa 7.5 is the
<a href="http://wiki.freedesktop.org/wiki/Software/gallium">Gallium3D</a> infrastructure.
<a href="https://www.freedesktop.org/wiki/Software/gallium">Gallium3D</a> infrastructure.
</p>
<p>
Mesa 7.5 implements the OpenGL 2.1 API, but the version reported by

View File

@@ -90,7 +90,7 @@ The two supported build methods are now autoconf/automake and SCons.
<li>Removed support for GL_ARB_shadow_ambient extension</li>
<li>Removed Gallium3D - nvfx driver (use nv30 instead)</li>
<li>
libGLU has been moved into its own repository, found at <a href="http://cgit.freedesktop.org/mesa/glu/">http://cgit.freedesktop.org/mesa/glu/</a>
libGLU has been moved into its own repository, found at <a href="https://cgit.freedesktop.org/mesa/glu/">https://cgit.freedesktop.org/mesa/glu/</a>
</li>
</ul>

View File

@@ -68,9 +68,9 @@ b1ae5a4d9255953980bc9254f5323420 MesaLib-9.1.2.zip
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=62434">Bug 62434</a> - [bisected] 3284.073] (EE) AIGLX error: dlopen of /usr/lib/xorg/modules/dri/r600_dri.so failed (/usr/lib/libllvmradeon9.2.0.so: undefined symbol: lp_build_tgsi_intrinsic)</li>
<li><a href="http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=349437">Debian bug #349437</a> - mesa - FTBFS: error: 'IEEE_ONE' undeclared</li>
<li><a href="https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=349437">Debian bug #349437</a> - mesa - FTBFS: error: 'IEEE_ONE' undeclared</li>
<li><a href="http://bugzilla.redhat.com/show_bug.cgi?id=918661">Redhat bug #918661</a> - crash in routine Avogadro UI manipulation</li>
<li><a href="https://bugzilla.redhat.com/show_bug.cgi?id=918661">Redhat bug #918661</a> - crash in routine Avogadro UI manipulation</li>
</ul>

View File

@@ -17,13 +17,13 @@
<h1>Code Repository</h1>
<p>
Mesa uses <a href="http://git-scm.com">git</a>
Mesa uses <a href="https://git-scm.com">git</a>
as its source code management system.
</p>
<p>
The master git repository is hosted on
<a href="http://www.freedesktop.org">freedesktop.org</a>.
<a href="https://www.freedesktop.org">freedesktop.org</a>.
</p>
<p>
@@ -35,9 +35,9 @@ You may access the repository either as an
<p>
You may also
<a href="http://cgit.freedesktop.org/mesa/mesa/"
<a href="https://cgit.freedesktop.org/mesa/mesa/"
>browse the main Mesa git repository</a> and the
<a href="http://cgit.freedesktop.org/mesa/demos"
<a href="https://cgit.freedesktop.org/mesa/demos"
>Mesa demos and tests git repository</a>.
</p>
@@ -73,9 +73,10 @@ follow this procedure:
</p>
<ol>
<li>Subscribe to the
<a href="http://lists.freedesktop.org/mailman/listinfo/mesa-dev">mesa-dev</a>
<a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev">mesa-dev</a>
mailing list.
<li>Start contributing to the project by posting patches / review requests to
<li>Start contributing to the project by
<a href="submittingpatches.html" target="_parent">submitting patches</a> to
the mesa-dev list. Specifically,
<ul>
<li>Use <code>git send-mail</code> to post your patches to mesa-dev.
@@ -91,7 +92,7 @@ only if they're being supervised by another Mesa developer at the same
organization and planning to work in a limited area of the code or on a
separate branch.
<li>To apply for an account, follow
<a href="http://www.freedesktop.org/wiki/AccountRequests">these directions</a>.
<a href="https://www.freedesktop.org/wiki/AccountRequests">these directions</a>.
It's also appreciated if you briefly describe what you intend to do (work
on a particular driver, add a new extension, etc.) in the bugzilla record.
</ol>
@@ -120,7 +121,7 @@ Once your account is established:
<h2>Windows Users</h2>
<p>
If you're <a href="http://git.wiki.kernel.org/index.php/WindowsInstall">
If you're <a href="https://git.wiki.kernel.org/index.php/WindowsInstall">
using git on Windows</a> you'll want to enable automatic CR/LF conversion in
your local copy of the repository:
</p>
@@ -143,7 +144,7 @@ Unix users don't need to set this option.
<p>
At any given time, there may be several active branches in Mesa's
repository.
Generally, the trunk contains the latest development (unstable)
Generally, <tt>master</tt> contains the latest development (unstable)
code while a branch has the latest stable code.
</p>
@@ -234,7 +235,7 @@ If you want the rebase action to be the default action, then
git config --global branch.autosetuprebase=always
</pre>
<p>
See <a href="http://www.eecs.harvard.edu/~cduan/technical/git/">Understanding Git Conceptually</a> for a fairly clear explanation about all of this.
See <a href="https://www.eecs.harvard.edu/~cduan/technical/git/">Understanding Git Conceptually</a> for a fairly clear explanation about all of this.
</p>
</ol>

View File

@@ -18,7 +18,7 @@
<p>
This page describes the features and status of Mesa's support for the
<a href="http://opengl.org/documentation/glsl/">
<a href="https://opengl.org/documentation/glsl/">
OpenGL Shading Language</a>.
</p>
@@ -49,8 +49,7 @@ execution. These are generally used for debugging.
<li><b>log</b> - log all GLSL shaders to files.
The filenames will be "shader_X.vert" or "shader_X.frag" where X
the shader ID.
<li><b>nopt</b> - disable compiler optimizations
<li><b>opt</b> - force compiler optimizations
<li><b>cache_info</b> - print debug information about shader cache
<li><b>uniform</b> - print message to stdout when glUniform is called
<li><b>nopvert</b> - force vertex shaders to be a simple shader that just transforms
the vertex position with ftransform() and passes through the color and
@@ -172,7 +171,7 @@ This tool is useful for:
</ul>
<p>
After building Mesa, the compiler can be found at src/glsl/glsl_compiler
After building Mesa, the compiler can be found at src/compiler/glsl/glsl_compiler
</p>
<p>
@@ -180,7 +179,7 @@ Here's an example of using the compiler to compile a vertex shader and
emit GL_ARB_vertex_program-style instructions:
</p>
<pre>
src/glsl/glsl_compiler --dump-ast myshader.vert
src/compiler/glsl/glsl_compiler --version XXX --dump-ast myshader.vert
</pre>
Options include
@@ -188,7 +187,11 @@ Options include
<li><b>--dump-ast</b> - dump GPU code
<li><b>--dump-hir</b> - dump high-level IR code
<li><b>--dump-lir</b> - dump low-level IR code
<li><b>--link</b> - ???
<li><b>--dump-builder</b> - dump GLSL IR code
<li><b>--link</b> - link shaders
<li><b>--just-log</b> - display only shader / linker info if exist,
without any header or separator
<li><b>--version</b> - [Mandatory] define the GLSL version to use
</ul>
@@ -196,7 +199,7 @@ Options include
<p>
The source code for Mesa's shading language compiler is in the
<code>src/glsl/</code> directory.
<code>src/compiler/glsl/</code> directory.
</p>
<p>
@@ -217,7 +220,7 @@ regressions.
</p>
<p>
The <a href="http://piglit.freedesktop.org/">Piglit</a> project
The <a href="https://piglit.freedesktop.org/">Piglit</a> project
has many GLSL tests.
</p>

View File

@@ -31,7 +31,7 @@ the <code>doxygen</code> directory and run <code>make</code>.
<p>
For an example of Doxygen usage in Mesa, see a recent source file
such as <a href="http://cgit.freedesktop.org/mesa/mesa/tree/src/mesa/main/bufferobj.c">bufferobj.c</a>.
such as <a href="https://cgit.freedesktop.org/mesa/mesa/tree/src/mesa/main/bufferobj.c">bufferobj.c</a>.
</p>
@@ -41,6 +41,11 @@ run the doxygen scripts, you can read the documentation
<a href="../doxygen/main/index.html">here</a>
</p>
<p>
Gallium is also documented using Sphinx. The generated output can be found
<a href="https://gallium.readthedocs.io">on Gallium.ReadTheDocs.io</a>.
</p>
</div>
</body>
</html>

View File

@@ -27,14 +27,18 @@ each directory.
<li><b>include</b> - Public OpenGL header files
<li><b>src</b>
<ul>
<li><b>compiler</b> - Common utility sources for different compilers.
<ul>
<li><b>glsl</b> - the GLSL IR and compiler
<li><b>nir</b> - the NIR IR and compiler
<li><b>spirv</b> - the SPIR-V compiler
</ul>
<li><b>egl</b> - EGL library sources
<ul>
<li><b>docs</b> - EGL documentation
<li><b>drivers</b> - EGL drivers
<li><b>main</b> - main EGL library implementation. This is where all
the EGL API functions are implemented, like eglCreateContext().
</ul>
<li><b>glsl</b> - the GLSL compiler
<li><b>mapi</b> - Mesa APIs
<li><b>glapi</b> - OpenGL API dispatch layer. This is where all the
GL entrypoints like glClear, glBegin, etc. are generated, as well as
@@ -94,7 +98,8 @@ each directory.
<ul>
<li><b>i915</b> - Driver for Intel i915/i945.
<li><b>llvmpipe</b> - Software driver using LLVM for runtime code generation.
<li><b>nv*</b> - Drivers for NVIDIA GPUs.
<li><b>nouveau</b> - Driver for NVIDIA GPUs.
<li><b>radeon</b> - Shared module for the r600 and radeonsi drivers.
<li><b>radeonsi</b> - Driver for AMD Southern Island.
<li><b>r300</b> - Driver for ATI R300 - R500.
<li><b>r600</b> - Driver for ATI/AMD R600 - Northern Island.
@@ -128,16 +133,19 @@ each directory.
to another.
<li><b>util</b> - assorted utilities for arithmetic, hashing, surface
creation, memory management, 2D blitting, simple rendering, etc.
<li>XXX more
</ul>
<li><b>state_trackers</b> -
<ul>
<li><b>clover</b> - OpenCL state tracker
<li><b>dri</b> - Meta state tracker for DRI drivers
<li><b>glx</b> - Meta state tracker for GLX
<li><b>vdpau</b> - VDPAU state tracker
<li><b>wgl</b> -
<li><b>xorg</b> - Meta state tracker for Xorg video drivers
<li><b>wgl</b> - Windows WGL state tracker
<li><b>xa</b> - XA state tracker
<li><b>xvmc</b> - XvMC state tracker
<li><b>vdpau</b> - VDPAU state tracker
<li><b>va</b> - VA-API state tracker
<li><b>omx</b> - OpenMAX state tracker
</ul>
<li><b>winsys</b> -
<ul>
@@ -148,11 +156,11 @@ each directory.
</ul>
</ul>
<ul>
<li><b>glx</b> - The GLX library code for building libGL. This is used for
direct rendering drivers. It will dynamically load one of the
xxx_dri.so drivers.
<li><b>glx</b> - The GLX library code for building libGL using DRI drivers.
</ul>
<li><b>lib</b> - where the GL libraries are placed
<li><b>lib</b> - hardlinks to most binaries as produced by <strong>make</strong>.
These (shortcuts) are used for development purposes in conjunction with
LD_LIBRARY_PATH and/or LIBGL_DRIVERS_PATH.
</ul>
</div>

View File

@@ -0,0 +1,98 @@
Name
MESA_drm_image_formats
Name Strings
EGL_MESA_drm_image_formats
Contributors
Nicolai Hähnle <Nicolai.Haehnle@amd.com>
Qiang Yu <Qiang.Yu@amd.com>
Contact
Nicolai Hähnle <Nicolai.Haehnle@amd.com>
Status
Proposal
Version
Version 1, January 26, 2017
Number
EGL Extension #??
Dependencies
This extension requires the EGL_MESA_drm_image extension.
This extension is written against the wording of EGL_MESA_drm_image
specification.
Overview
This extension extends the functionality of EGL_MESA_drm_image by adding
additional formats required by Glamor for use with DRM buffers.
IP Status
Open-source; freely implementable.
New Procedures and Functions
None
New Tokens
Accepted as values for the EGL_IMAGE_FORMAT_MESA attribute:
EGL_DRM_BUFFER_FORMAT_ARGB2101010_MESA 0x3290
EGL_DRM_BUFFER_FORMAT_ARGB1555_MESA 0x3291
EGL_DRM_BUFFER_FORMAT_RGB565_MESA 0x3292
Additions to the EGL_MESA_drm_image Specification:
Remove the sentence "The only format specified ..." from the paragraph
describing eglCreateDRMImageMESA and add the following paragraph:
The formats specified for use with EGL_DRM_BUFFER_FORMAT_MESA are:
* EGL_DRM_BUFFER_FORMAT_ARGB32_MESA, where each pixel is a CPU-endian
32-bit quantity, with alpha in the upper 8 bits, then red, then green,
then blue,
* EGL_DRM_BUFFER_FORMAT_ARGB2101010_MESA, where each pixel is a CPU-
endian, 32-bit quantity, with alpha in the most significant 2 bits,
followed by 10 bits each for red, green, and blue,
* EGL_DRM_BUFFER_FORMAT_ARGB1555_MESA, where each pixel is a CPU-endian
16-bit quantity, with alpha in the most significant bit, followed by
5 bits each for red, green, and blue, and
* EGL_DRM_BUFFER_FORMAT_RGB565_MESA, where each pixel is a CPU-endian
16-bit quantity, with red in the 5 most significant bits, followed by
6 bits of green and 5 bits of blue.
Issues
1. Should we expose the full set of channel permutations for the formats,
e.g. ABGR2101010, RGBA1010102, and BGRA1010102 in addition to
ARGB2101010?
RESOLVED: No.
DISCUSSION: The original extension sets a precedent of only exposing one
of the possible permutations of 8-bit channel formats. It is also not
clear where the additional permutations would be used. For example,
Glamor has a fixed mapping from pixmap/screen depth to format that
doesn't allow for the other permutations.
Revision History
Version 1, January, 2017
Initial draft (Nicolai Hähnle)

View File

@@ -20,11 +20,11 @@ Status
Version
Version 2, July 7, 2016
Version 3, March 31, 2017
Number
TBD
OpenGL Extension #495
Dependencies
@@ -34,7 +34,7 @@ Dependencies
This extension is written against Version 1.50 (Revision 09) of the OpenGL
Shading Language Specification.
GLSL 1.30 is required.
GLSL 1.30 (OpenGL) or GLSL ES 3.00 (OpenGL ES) is required.
This extension interacts with ARB_gpu_shader5.
@@ -51,9 +51,10 @@ Overview
calculations).
This extension provides a set of new features to the OpenGL Shading
Language to support capabilities of these GPUs, extending the capabilities
of version 1.30 of the OpenGL Shading Language. Shaders
using the new functionality provided by this extension should enable this
Language to support capabilities of these GPUs, extending the
capabilities of version 1.30 of the OpenGL Shading Language and version
3.00 of the OpenGL ES Shading Language. Shaders using the new
functionality provided by this extension should enable this
functionality via the construct
#extension GL_MESA_shader_integer_functions : require (or enable)
@@ -516,5 +517,6 @@ Revision History
Rev. Date Author Changes
---- ----------- -------- -----------------------------------------
3 31-Mar-2017 Jon Leech Add ES support (OpenGL-Registry/issues/3)
2 7-Jul-2016 idr Fix typo in #extension line
1 20-Jun-2016 idr Initial version based on GL_ARB_gpu_shader5.

View File

@@ -76,9 +76,9 @@ Overview
References:
http://www.opengl.org/discussion_boards/cgi_directory/ultimatebb.cgi?ubb=get_topic;f=3;t=011557
http://www.opengl.org/discussion_boards/cgi_directory/ultimatebb.cgi?ubb=get_topic;f=3;t=000516
http://www.opengl.org/discussion_boards/cgi_directory/ultimatebb.cgi?ubb=get_topic;f=3;t=011903
https://www.opengl.org/discussion_boards/cgi_directory/ultimatebb.cgi?ubb=get_topic;f=3;t=011557
https://www.opengl.org/discussion_boards/cgi_directory/ultimatebb.cgi?ubb=get_topic;f=3;t=000516
https://www.opengl.org/discussion_boards/cgi_directory/ultimatebb.cgi?ubb=get_topic;f=3;t=011903
http://www.delphi3d.net/articles/viewarticle.php?article=terraintex.htm
New Procedures and Functions

View File

@@ -75,6 +75,7 @@ New Tokens
EGL_TEXTURE_Y_U_V_WL 0x31D7
EGL_TEXTURE_Y_UV_WL 0x31D8
EGL_TEXTURE_Y_XUXV_WL 0x31D9
EGL_TEXTURE_EXTERNAL_WL 0x31DA
Accepted in the <attribute> parameter of eglQueryWaylandBufferWL:
@@ -148,6 +149,10 @@ Additions to the EGL 1.4 Specification:
Two planes, samples Y from the first plane to r in
the shader, U and V from the second plane to g and a.
EGL_TEXTURE_EXTERNAL_WL
Treated as a single plane texture, but sampled with
samplerExternalOES according to OES_EGL_image_external
After querying the wl_buffer layout, create EGLImages for the
planes by calling eglCreateImageKHR with wl_buffer as
EGLClientBuffer, EGL_WAYLAND_BUFFER_WL as the target, NULL

View File

@@ -1,10 +1,10 @@
The definitive source for enum values and reserved ranges are the XML files in
the Khronos registry:
https://cvs.khronos.org/svn/repos/ogl/trunk/doc/registry/public/api/egl.xml
https://cvs.khronos.org/svn/repos/ogl/trunk/doc/registry/public/api/gl.xml
https://cvs.khronos.org/svn/repos/ogl/trunk/doc/registry/public/api/glx.xml
https://cvs.khronos.org/svn/repos/ogl/trunk/doc/registry/public/api/wgl.xml
https://github.com/KhronosGroup/EGL-Registry/blob/master/api/egl.xml
https://github.com/KhronosGroup/OpenGL-Registry/blob/master/xml/gl.xml
https://github.com/KhronosGroup/OpenGL-Registry/blob/master/xml/glx.xml
https://github.com/KhronosGroup/OpenGL-Registry/blob/master/xml/wgl.xml
GL blocks allocated to Mesa:
0x8750-0x875F
@@ -76,6 +76,11 @@ EGL_MESA_platform_gbm
EGL_MESA_platform_surfaceless
EGL_PLATFORM_SURFACELESS_MESA 0x31DD
EGL_MESA_drm_image
EGL_DRM_BUFFER_FORMAT_ARGB2101010_MESA 0x3290
EGL_DRM_BUFFER_FORMAT_ARGB1555_MESA 0x3291
EGL_DRM_BUFFER_FORMAT_RGB565_MESA 0x3292
EGL_WL_bind_wayland_display
EGL_TEXTURE_FORMAT 0x3080
EGL_WAYLAND_BUFFER_WL 0x31D5

374
docs/submittingpatches.html Normal file
View File

@@ -0,0 +1,374 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Submitting patches</title>
<link rel="stylesheet" type="text/css" href="mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="contents.html"></iframe>
<div class="content">
<h1>Submitting patches</h1>
<ul>
<li><a href="#guidelines">Basic guidelines</a>
<li><a href="#formatting">Patch formatting</a>
<li><a href="#testing">Testing Patches</a>
<li><a href="#mailing">Mailing Patches</a>
<li><a href="#reviewing">Reviewing Patches</a>
<li><a href="#nominations">Nominating a commit for a stable branch</a>
<li><a href="#criteria">Criteria for accepting patches to the stable branch</a>
<li><a href="#backports">Sending backports for the stable branch</a>
<li><a href="#gittips">Git tips</a>
</ul>
<h2 id="guidelines">Basic guidelines</h2>
<ul>
<li>Patches should not mix code changes with code formatting changes (except,
perhaps, in very trivial cases.)
<li>Code patches should follow Mesa
<a href="codingstyle.html" target="_parent">coding conventions</a>.
<li>Whenever possible, patches should only effect individual Mesa/Gallium
components.
<li>Patches should never introduce build breaks and should be bisectable (see
<code>git bisect</code>.)
<li>Patches should be properly <a href="#formatting">formatted</a>.
<li>Patches should be sufficiently <a href="#testing">tested</a> before submitting.
<li>Patches should be submitted to <a href="#mailing">mesa-dev</a>
for <a href="#reviewing">review</a> using <code>git send-email</code>.
</ul>
<h2 id="formatting">Patch formatting</h2>
<ul>
<li>Lines should be limited to 75 characters or less so that git logs
displayed in 80-column terminals avoid line wrapping. Note that git
log uses 4 spaces of indentation (4 + 75 &lt; 80).
<li>The first line should be a short, concise summary of the change prefixed
with a module name. Examples:
<pre>
mesa: Add support for querying GL_VERTEX_ATTRIB_ARRAY_LONG
gallium: add PIPE_CAP_DEVICE_RESET_STATUS_QUERY
i965: Fix missing type in local variable declaration.
</pre>
<li>Subsequent patch comments should describe the change in more detail,
if needed. For example:
<pre>
i965: Remove end-of-thread SEND alignment code.
This was present in Eric's initial implementation of the compaction code
for Sandybridge (commit 077d01b6). There is no documentation saying this
is necessary, and removing it causes no regressions in piglit on any
platform.
</pre>
<li>A "Signed-off-by:" line is not required, but not discouraged either.
<li>If a patch addresses a bugzilla issue, that should be noted in the
patch comment. For example:
<pre>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89689
</pre>
<li>If a patch addresses a issue introduced with earlier commit, that should be
noted in the patch comment. For example:
<pre>
Fixes: d7b3707c612 "util/disk_cache: use stat() to check if entry is a directory"
</pre>
<li>If there have been several revisions to a patch during the review
process, they should be noted such as in this example:
<pre>
st/mesa: add ARB_texture_stencil8 support (v4)
if we support stencil texturing, enable texture_stencil8
there is no requirement to support native S8 for this,
the texture can be converted to x24s8 fine.
v2: fold fixes from Marek in:
a) put S8 last in the list
b) fix renderable to always test for d/s renderable
fixup the texture case to use a stencil only format
for picking the format for the texture view.
v3: hit fallback for getteximage
v4: put s8 back in front, it shouldn't get picked now (Ilia)
</pre>
<li>If someone tested your patch, document it with a line like this:
<pre>
Tested-by: Joe Hacker &lt;jhacker@foo.com&gt;
</pre>
<li>If the patch was reviewed (usually the case) or acked by someone,
that should be documented with:
<pre>
Reviewed-by: Joe Hacker &lt;jhacker@foo.com&gt;
Acked-by: Joe Hacker &lt;jhacker@foo.com&gt;
</pre>
<li>If sending later revision of a patch, add all the tags - ack, r-b,
Cc: mesa-stable and/or other. This provides reviewers with quick feedback if the
patch has already been reviewed.
<li>In order for your patch to reach the prospective reviewer easier/faster,
use the script scripts/get_reviewer.pl to get a list of individuals and include
them in the CC list.
<br>
Please use common sense and do <strong>not</strong> blindly add everyone.
<br>
<pre>
$ scripts/get_reviewer.pl --help # to get the help screen
$ scripts/get_reviewer.pl -f src/egl/drivers/dri2/platform_android.c
Rob Herring <robh@kernel.org> (reviewer:ANDROID EGL SUPPORT,added_lines:188/700=27%,removed_lines:58/283=20%)
Tomasz Figa <tfiga@chromium.org> (reviewer:ANDROID EGL SUPPORT,authored:12/41=29%,added_lines:308/700=44%,removed_lines:115/283=41%)
Emil Velikov <emil.l.velikov@gmail.com> (authored:13/41=32%,removed_lines:76/283=27%)
</pre>
</ul>
<h2 id="testing">Testing Patches</h2>
<p>
It should go without saying that patches must be tested. In general,
do whatever testing is prudent.
</p>
<p>
You should always run the Mesa test suite before submitting patches.
The test suite can be run using the 'make check' command. All tests
must pass before patches will be accepted, this may mean you have
to update the tests themselves.
</p>
<p>
Whenever possible and applicable, test the patch with
<a href="https://piglit.freedesktop.org">Piglit</a> and/or
<a href="https://android.googlesource.com/platform/external/deqp/">dEQP</a>
to check for regressions.
</p>
<h2 id="mailing">Mailing Patches</h2>
<p>
Patches should be sent to the mesa-dev mailing list for review:
<a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev">
mesa-dev@lists.freedesktop.org</a>.
When submitting a patch make sure to use
<a href="https://git-scm.com/docs/git-send-email">git send-email</a>
rather than attaching patches to emails. Sending patches as
attachments prevents people from being able to provide in-line review
comments.
</p>
<p>
When submitting follow-up patches you can use --in-reply-to to make v2, v3,
etc patches show up as replies to the originals. This usually works well
when you're sending out updates to individual patches (as opposed to
re-sending the whole series). Using --in-reply-to makes
it harder for reviewers to accidentally review old patches.
</p>
<p>
When submitting follow-up patches you should also login to
<a href="https://patchwork.freedesktop.org">patchwork</a> and change the
state of your old patches to Superseded.
</p>
<p>
Some companies' mail server automatically append a legal disclaimer,
usually containing something along the lines of "The information in this
email is confidential" and "distribution is strictly prohibited".<br/>
These legal notices prevent us from being able to accept your patch,
rendering the whole process pointless. Please make sure these are
disabled before sending your patches. (Note that you may need to contact
your email administrator for this.)
</p>
<h2 id="reviewing">Reviewing Patches</h2>
<p>
When you've reviewed a patch on the mailing list, please be unambiguous
about your review. That is, state either
</p>
<pre>
Reviewed-by: Joe Hacker &lt;jhacker@foo.com&gt;
</pre>
or
<pre>
Acked-by: Joe Hacker &lt;jhacker@foo.com&gt;
</pre>
<p>
Rather than saying just "LGTM" or "Seems OK".
</p>
<p>
If small changes are suggested, it's OK to say something like:
</p>
<pre>
With the above fixes, Reviewed-by: Joe Hacker &lt;jhacker@foo.com&gt;
</pre>
<p>
which tells the patch author that the patch can be committed, as long
as the issues are resolved first.
</p>
<h2 id="nominations">Nominating a commit for a stable branch</h2>
<p>
There are three ways to nominate a patch for inclusion in the stable branch and
release.
</p>
<ul>
<li> By adding the Cc: mesa-stable@ tag as described below.
<li> Sending the commit ID (as seen in master branch) to the mesa-stable@ mailing list.
<li> Forwarding the patch from the mesa-dev@ mailing list.
</li>
</ul>
<p>
Note: resending patch identical to one on mesa-dev@ or one that differs only
by the extra mesa-stable@ tag is <strong>not</strong> recommended.
</p>
<h3 id="thetag">The stable tag</h3>
<p>
If you want a commit to be applied to a stable branch,
you should add an appropriate note to the commit message.
</p>
<p>
Here are some examples of such a note:
</p>
<ul>
<li>CC: &lt;mesa-stable@lists.freedesktop.org&gt;</li>
</ul>
Simply adding the CC to the mesa-stable list address is adequate to nominate
the commit for all the active stable branches. If the commit is not applicable
for said branch the stable-release manager will reply stating so.
This "CC" syntax for patch nomination will cause patches to automatically be
copied to the mesa-stable@ mailing list when you use "git send-email" to send
patches to the mesa-dev@ mailing list. If you prefer using --suppress-cc that
won't have any negative effect on the patch nomination.
<p>
Note: by removing the tag [as the commit is pushed] the patch is
<strong>explicitly</strong> rejected from inclusion in the stable branch(es).
<br>
Thus, drop the line <strong>only</strong> if you want to cancel the nomination.
</p>
Alternatively, if one uses the "Fixes" tag as described in the "Patch formatting"
section, it nominates a commit for all active stable branches that include the
commit that is referred to.
<h2 id="criteria">Criteria for accepting patches to the stable branch</h2>
Mesa has a designated release manager for each stable branch, and the release
manager is the only developer that should be pushing changes to these branches.
Everyone else should nominate patches using the mechanism described above.
The following rules define which patches are accepted and which are not. The
stable-release manager is also given broad discretion in rejecting patches
that have been nominated.
<ul>
<li>Patch must conform with the <a href="#guidelines">Basic guidelines</a></li>
<li>Patch must have landed in master first. In case where the original
patch is too large and/or otherwise contradicts with the rules set within, a
backport is appropriate.</li>
<li>It must not introduce a regression - be that build or runtime wise.
Note: If the regression is due to faulty piglit/dEQP/CTS/other test the
latter must be fixed first. A reference to the offending test(s) and
respective fix(es) should be provided in the nominated patch.</li>
<li>Patch cannot be larger than 100 lines.</li>
<li>Patches that move code around with no functional change should be
rejected.</li>
<li>Patch must be a bug fix and not a new feature.
Note: An exception to this rule, are hardware-enabling "features". For
example, <a href="#backports">backports</a> of new code to support a
newly-developed hardware product can be accepted if they can be reasonably
determined not to have effects on other hardware.</li>
<li>Patch must be reviewed, For example, the commit message has Reviewed-by,
Signed-off-by, or Tested-by tags from someone but the author.</li>
<li>Performance patches are considered only if they provide information
about the hardware, program in question and observed improvement. Use numbers
to represent your measurements.</li>
</ul>
If the patch complies with the rules it will be
<a href="releasing.html#pickntest">cherry-picked</a>. Alternatively the release
manager will reply to the patch in question stating why the patch has been
rejected or would request a backport.
A summary of all the picked/rejected patches will be presented in the
<a href="releasing.html#prerelease">pre-release</a> announcement.
The stable-release manager may at times need to force-push changes to the
stable branches, for example, to drop a previously-picked patch that was later
identified as causing a regression). These force-pushes may cause changes to
be lost from the stable branch if developers push things directly. Consider
yourself warned.
<h2 id="backports">Sending backports for the stable branch</h2>
By default merge conflicts are resolved by the stable-release manager. In which
case he/she should provide a comment about the changes required, alongside the
<code>Conflicts</code> section. Summary of which will be provided in the
<a href="releasing.html#prerelease">pre-release</a> announcement.
<br>
Developers are interested in sending backports are recommended to use either a
<code>[BACKPORT #branch]</code> subject prefix or provides similar information
within the commit summary.
<h2 id="gittips">Git tips</h2>
<ul>
<li><code>git rebase -i ...</code> is your friend. Don't be afraid to use it.
<li>Apply a fixup to commit FOO.
<pre>
git add ...
git commit --fixup=FOO
git rebase -i --autosquash ...
</pre>
<li>Test for build breakage between patches e.g last 8 commits.
<pre>
git rebase -i --exec="make -j4" HEAD~8
</pre>
<li>Sets the default mailing address for your repo.
<pre>
git config --local sendemail.to mesa-dev@lists.freedesktop.org
</pre>
<li> Add version to subject line of patch series in this case for the last 8
commits before sending.
<pre>
git send-email --subject-prefix="PATCH v4" HEAD~8
git send-email -v4 @~8 # shorter version, inherited from git format-patch
</pre>
<li> Configure git to use the get_reviewer.pl script interactively. Thus you
can avoid adding the world to the CC list.
<pre>
git config sendemail.cccmd "./scripts/get_reviewer.pl -i"
</pre>
</ul>
</div>
</body>
</html>

View File

@@ -36,10 +36,10 @@ Hardware drivers include:
<li>Intel i965, i945, i915.
See <a href="https://01.org/linuxgraphics">Intel's website</a></li>
<li>AMD Radeon series.
See <a href="http://www.x.org/wiki/RadeonFeature">RadeonFeature</a></li>
See <a href="https://www.x.org/wiki/RadeonFeature">RadeonFeature</a></li>
<li>NVIDIA GPUs.
See <a href="http://nouveau.freedesktop.org">Nouveau Wiki</a></li>
<li><a href="http://www.x.org/wiki/vmware">VMware virtual GPU</a></li>
See <a href="https://nouveau.freedesktop.org">Nouveau Wiki</a></li>
<li><a href="https://www.x.org/wiki/vmware">VMware virtual GPU</a></li>
</ul>
<p>
@@ -57,7 +57,7 @@ Additional driver information:
</p>
<ul>
<li><a href="http://dri.freedesktop.org/"> DRI hardware
<li><a href="https://dri.freedesktop.org/"> DRI hardware
drivers</a> for the X Window System
<li><a href="xlibdriver.html">Xlib / swrast driver</a> for the X Window System
and Unix-like operating systems

View File

@@ -24,7 +24,7 @@ This list is far from complete and somewhat dated, unfortunately.
<ul>
<li>Early Mesa development was done while Brian was part of the
<a href="http://www.ssec.wisc.edu/~billh/vis.html">
<a href="https://www.ssec.wisc.edu/~billh/vis.html">
SSEC Visualization Project</a> at the University of
Wisconsin. He'd like to thank Bill Hibbard for letting him work on
Mesa as part of that project.
@@ -40,14 +40,9 @@ Tungsten Graphics, Inc. have supported the ongoing development of Mesa.
<br>
<br>
<li>The
<a href="http://www.mesa3d.org">Mesa</a>
website is hosted by
<a href="http://sourceforge.net">sourceforge.net</a>.
<br>
<br>
<li>The Mesa git repository is hosted by
<a href="http://freedesktop.org/">freedesktop.org</a>.
<a href="https://www.mesa3d.org">Mesa</a>
website and git repository are hosted by
<a href="https://freedesktop.org/">freedesktop.org</a>.
<br>
<br>

View File

@@ -17,11 +17,11 @@
<h1>Development Utilities</h1>
<dl>
<dt><a href="http://cgit.freedesktop.org/mesa/demos">Mesa demos collection</a></dt>
<dt><a href="https://cgit.freedesktop.org/mesa/demos">Mesa demos collection</a></dt>
<dd>includes several utility routines in the <code>src/util/</code>
directory.</dd>
<dt><a href="http://piglit.freedesktop.org">Piglit</a></dt>
<dt><a href="https://piglit.freedesktop.org">Piglit</a></dt>
<dd>is an open-source test suite for OpenGL implementations.</dd>
<dt><a href="https://github.com/apitrace/apitrace">ApiTrace</a></dt>
@@ -31,7 +31,7 @@
<dd>is a very useful tool for tracking down
memory-related problems in your code.</dd>
<dt><a href="http://scan.coverity.com/projects/mesa">Coverity</a><dt>
<dt><a href="https://scan.coverity.com/projects/mesa">Coverity</a><dt>
<dd>provides static code analysis of Mesa. If you create an account
you can see the results and try to fix outstanding issues.</dd>
</dl>

View File

@@ -18,7 +18,7 @@
<p>
This page lists known issues with
<a href="http://www.spec.org/gwpg/gpc.static/vp11info.html" target="_main">SPEC Viewperf 11</a>
<a href="https://www.spec.org/gwpg/gpc.static/vp11info.html" target="_main">SPEC Viewperf 11</a>
and <a href="https://www.spec.org/gwpg/gpc.static/vp12info.html" target="_main">SPEC Viewperf 12</a>
when running on Mesa-based drivers.
</p>
@@ -66,10 +66,10 @@ either in Viewperf or the Mesa driver.
<p>
These tests use features of the
<a href="http://www.opengl.org/registry/specs/NV/fragment_program2.txt"
<a href="https://www.opengl.org/registry/specs/NV/fragment_program2.txt"
target="_main">
GL_NV_fragment_program2</a> and
<a href="http://www.opengl.org/registry/specs/NV/vertex_program3.txt"
<a href="https://www.opengl.org/registry/specs/NV/vertex_program3.txt"
target="_main">
GL_NV_vertex_program3</a> extensions without checking if the driver supports
them.
@@ -86,7 +86,7 @@ Subsequent drawing calls become no-ops and the rendering is incorrect.
<p>
These tests depend on the
<a href="http://www.opengl.org/registry/specs/NV/primitive_restart.txt"
<a href="https://www.opengl.org/registry/specs/NV/primitive_restart.txt"
target="_main">GL_NV_primitive_restart</a> extension.
</p>

View File

@@ -18,7 +18,7 @@
<p>
This page describes how to build, install and use the
<a href="http://www.vmware.com/">VMware</a> guest GL driver
<a href="https://www.vmware.com/">VMware</a> guest GL driver
(aka the SVGA or SVGA3D driver) for Linux using the latest source code.
This driver gives a Linux virtual machine access to the host's GPU for
hardware-accelerated 3D.
@@ -62,9 +62,9 @@ these instructions explain what to do.
For more information about the X components see these wiki pages at x.org:
</p>
<ul>
<li><a href="http://wiki.x.org/wiki/vmware">
<li><a href="https://wiki.x.org/wiki/vmware">
Driver Overview</a>
<li><a href="http://wiki.x.org/wiki/vmware/vmware3D">
<li><a href="https://wiki.x.org/wiki/vmware/vmware3D">
xf86-video-vmware Details</a>
</ul>
@@ -82,8 +82,8 @@ The components involved in this include:
<p>
All of these components reside in the guest Linux virtual machine.
On the host, all you're doing is running VMware
<a href="http://www.vmware.com/products/workstation/">Workstation</a> or
<a href="http://www.vmware.com/products/fusion/">Fusion</a>.
<a href="https://www.vmware.com/products/workstation/">Workstation</a> or
<a href="https://www.vmware.com/products/fusion/">Fusion</a>.
</p>

View File

@@ -171,9 +171,8 @@ drawn with glDrawPixels.
</p>
<p>
For more information about gamma correction see:
<a href="http://www.inforamp.net/~poynton/notes/colour_and_gamma/GammaFAQ.html">
the Gamma FAQ</a>
For more information about gamma correction, see the
<a href="https://en.wikipedia.org/wiki/Gamma_correction">Wikipedia article</a>
</p>

View File

@@ -6,7 +6,7 @@ extern "C" {
#endif
/*
** Copyright (c) 2013-2014 The Khronos Group Inc.
** Copyright (c) 2013-2017 The Khronos Group Inc.
**
** Permission is hereby granted, free of charge, to any person obtaining a
** copy of this software and/or associated documentation files (the
@@ -31,14 +31,14 @@ extern "C" {
** This header is generated from the Khronos OpenGL / OpenGL ES XML
** API Registry. The current version of the Registry, generator scripts
** used to make the header, and the header can be found at
** http://www.opengl.org/registry/
** http://www.opengl.org/registry/egl
**
** Khronos $Revision: 31039 $ on $Date: 2015-05-04 17:01:57 -0700 (Mon, 04 May 2015) $
** Khronos $Revision$ on $Date$
*/
#include <EGL/eglplatform.h>
/* Generated on date 20150504 */
/* Generated on date 20161230 */
/* Generated C header for:
* API: egl
@@ -78,7 +78,7 @@ typedef void (*__eglMustCastToProperFunctionPointerType)(void);
#define EGL_CONFIG_ID 0x3028
#define EGL_CORE_NATIVE_ENGINE 0x305B
#define EGL_DEPTH_SIZE 0x3025
#define EGL_DONT_CARE ((EGLint)-1)
#define EGL_DONT_CARE EGL_CAST(EGLint,-1)
#define EGL_DRAW 0x3059
#define EGL_EXTENSIONS 0x3055
#define EGL_FALSE 0
@@ -95,9 +95,9 @@ typedef void (*__eglMustCastToProperFunctionPointerType)(void);
#define EGL_NONE 0x3038
#define EGL_NON_CONFORMANT_CONFIG 0x3051
#define EGL_NOT_INITIALIZED 0x3001
#define EGL_NO_CONTEXT ((EGLContext)0)
#define EGL_NO_DISPLAY ((EGLDisplay)0)
#define EGL_NO_SURFACE ((EGLSurface)0)
#define EGL_NO_CONTEXT EGL_CAST(EGLContext,0)
#define EGL_NO_DISPLAY EGL_CAST(EGLDisplay,0)
#define EGL_NO_SURFACE EGL_CAST(EGLSurface,0)
#define EGL_PBUFFER_BIT 0x0001
#define EGL_PIXMAP_BIT 0x0002
#define EGL_READ 0x305A
@@ -197,7 +197,7 @@ typedef void *EGLClientBuffer;
#define EGL_RGB_BUFFER 0x308E
#define EGL_SINGLE_BUFFER 0x3085
#define EGL_SWAP_BEHAVIOR 0x3093
#define EGL_UNKNOWN ((EGLint)-1)
#define EGL_UNKNOWN EGL_CAST(EGLint,-1)
#define EGL_VERTICAL_RESOLUTION 0x3091
EGLAPI EGLBoolean EGLAPIENTRY eglBindAPI (EGLenum api);
EGLAPI EGLenum EGLAPIENTRY eglQueryAPI (void);
@@ -224,7 +224,7 @@ EGLAPI EGLBoolean EGLAPIENTRY eglWaitClient (void);
#ifndef EGL_VERSION_1_4
#define EGL_VERSION_1_4 1
#define EGL_DEFAULT_DISPLAY ((EGLNativeDisplayType)0)
#define EGL_DEFAULT_DISPLAY EGL_CAST(EGLNativeDisplayType,0)
#define EGL_MULTISAMPLE_RESOLVE_BOX_BIT 0x0200
#define EGL_MULTISAMPLE_RESOLVE 0x3099
#define EGL_MULTISAMPLE_RESOLVE_DEFAULT 0x309A
@@ -266,7 +266,7 @@ typedef void *EGLImage;
#define EGL_FOREVER 0xFFFFFFFFFFFFFFFFull
#define EGL_TIMEOUT_EXPIRED 0x30F5
#define EGL_CONDITION_SATISFIED 0x30F6
#define EGL_NO_SYNC ((EGLSync)0)
#define EGL_NO_SYNC EGL_CAST(EGLSync,0)
#define EGL_SYNC_FENCE 0x30F9
#define EGL_GL_COLORSPACE 0x309D
#define EGL_GL_COLORSPACE_SRGB 0x3089
@@ -283,7 +283,7 @@ typedef void *EGLImage;
#define EGL_GL_TEXTURE_CUBE_MAP_POSITIVE_Z 0x30B7
#define EGL_GL_TEXTURE_CUBE_MAP_NEGATIVE_Z 0x30B8
#define EGL_IMAGE_PRESERVED 0x30D2
#define EGL_NO_IMAGE ((EGLImage)0)
#define EGL_NO_IMAGE EGL_CAST(EGLImage,0)
EGLAPI EGLSync EGLAPIENTRY eglCreateSync (EGLDisplay dpy, EGLenum type, const EGLAttrib *attrib_list);
EGLAPI EGLBoolean EGLAPIENTRY eglDestroySync (EGLDisplay dpy, EGLSync sync);
EGLAPI EGLint EGLAPIENTRY eglClientWaitSync (EGLDisplay dpy, EGLSync sync, EGLint flags, EGLTime timeout);

View File

@@ -6,7 +6,7 @@ extern "C" {
#endif
/*
** Copyright (c) 2013-2016 The Khronos Group Inc.
** Copyright (c) 2013-2017 The Khronos Group Inc.
**
** Permission is hereby granted, free of charge, to any person obtaining a
** copy of this software and/or associated documentation files (the
@@ -31,14 +31,14 @@ extern "C" {
** This header is generated from the Khronos OpenGL / OpenGL ES XML
** API Registry. The current version of the Registry, generator scripts
** used to make the header, and the header can be found at
** http://www.opengl.org/registry/
** http://www.opengl.org/registry/egl
**
** Khronos $Revision$ on $Date$
*/
#include <EGL/eglplatform.h>
#define EGL_EGLEXT_VERSION 20160809
#define EGL_EGLEXT_VERSION 20161230
/* Generated C header for:
* API: egl
@@ -77,6 +77,13 @@ EGLAPI EGLSyncKHR EGLAPIENTRY eglCreateSync64KHR (EGLDisplay dpy, EGLenum type,
#define EGL_VG_ALPHA_FORMAT_PRE_BIT_KHR 0x0040
#endif /* EGL_KHR_config_attribs */
#ifndef EGL_KHR_context_flush_control
#define EGL_KHR_context_flush_control 1
#define EGL_CONTEXT_RELEASE_BEHAVIOR_NONE_KHR 0
#define EGL_CONTEXT_RELEASE_BEHAVIOR_KHR 0x2097
#define EGL_CONTEXT_RELEASE_BEHAVIOR_FLUSH_KHR 0x2098
#endif /* EGL_KHR_context_flush_control */
#ifndef EGL_KHR_create_context
#define EGL_KHR_create_context 1
#define EGL_CONTEXT_MAJOR_VERSION_KHR 0x3098
@@ -188,7 +195,7 @@ EGLAPI EGLBoolean EGLAPIENTRY eglGetSyncAttribKHR (EGLDisplay dpy, EGLSyncKHR sy
#define EGL_KHR_image 1
typedef void *EGLImageKHR;
#define EGL_NATIVE_PIXMAP_KHR 0x30B0
#define EGL_NO_IMAGE_KHR ((EGLImageKHR)0)
#define EGL_NO_IMAGE_KHR EGL_CAST(EGLImageKHR,0)
typedef EGLImageKHR (EGLAPIENTRYP PFNEGLCREATEIMAGEKHRPROC) (EGLDisplay dpy, EGLContext ctx, EGLenum target, EGLClientBuffer buffer, const EGLint *attrib_list);
typedef EGLBoolean (EGLAPIENTRYP PFNEGLDESTROYIMAGEKHRPROC) (EGLDisplay dpy, EGLImageKHR image);
#ifdef EGL_EGLEXT_PROTOTYPES
@@ -257,7 +264,7 @@ EGLAPI EGLBoolean EGLAPIENTRY eglQuerySurface64KHR (EGLDisplay dpy, EGLSurface s
#ifndef EGL_KHR_no_config_context
#define EGL_KHR_no_config_context 1
#define EGL_NO_CONFIG_KHR ((EGLConfig)0)
#define EGL_NO_CONFIG_KHR EGL_CAST(EGLConfig,0)
#endif /* EGL_KHR_no_config_context */
#ifndef EGL_KHR_partial_update
@@ -302,7 +309,7 @@ EGLAPI EGLBoolean EGLAPIENTRY eglSetDamageRegionKHR (EGLDisplay dpy, EGLSurface
#define EGL_SYNC_REUSABLE_KHR 0x30FA
#define EGL_SYNC_FLUSH_COMMANDS_BIT_KHR 0x0001
#define EGL_FOREVER_KHR 0xFFFFFFFFFFFFFFFFull
#define EGL_NO_SYNC_KHR ((EGLSyncKHR)0)
#define EGL_NO_SYNC_KHR EGL_CAST(EGLSyncKHR,0)
typedef EGLBoolean (EGLAPIENTRYP PFNEGLSIGNALSYNCKHRPROC) (EGLDisplay dpy, EGLSyncKHR sync, EGLenum mode);
#ifdef EGL_EGLEXT_PROTOTYPES
EGLAPI EGLBoolean EGLAPIENTRY eglSignalSyncKHR (EGLDisplay dpy, EGLSyncKHR sync, EGLenum mode);
@@ -315,7 +322,7 @@ EGLAPI EGLBoolean EGLAPIENTRY eglSignalSyncKHR (EGLDisplay dpy, EGLSyncKHR sync,
typedef void *EGLStreamKHR;
typedef khronos_uint64_t EGLuint64KHR;
#ifdef KHRONOS_SUPPORT_INT64
#define EGL_NO_STREAM_KHR ((EGLStreamKHR)0)
#define EGL_NO_STREAM_KHR EGL_CAST(EGLStreamKHR,0)
#define EGL_CONSUMER_LATENCY_USEC_KHR 0x3210
#define EGL_PRODUCER_FRAME_KHR 0x3212
#define EGL_CONSUMER_FRAME_KHR 0x3213
@@ -343,6 +350,24 @@ EGLAPI EGLBoolean EGLAPIENTRY eglQueryStreamu64KHR (EGLDisplay dpy, EGLStreamKHR
#endif /* KHRONOS_SUPPORT_INT64 */
#endif /* EGL_KHR_stream */
#ifndef EGL_KHR_stream_attrib
#define EGL_KHR_stream_attrib 1
#ifdef KHRONOS_SUPPORT_INT64
typedef EGLStreamKHR (EGLAPIENTRYP PFNEGLCREATESTREAMATTRIBKHRPROC) (EGLDisplay dpy, const EGLAttrib *attrib_list);
typedef EGLBoolean (EGLAPIENTRYP PFNEGLSETSTREAMATTRIBKHRPROC) (EGLDisplay dpy, EGLStreamKHR stream, EGLenum attribute, EGLAttrib value);
typedef EGLBoolean (EGLAPIENTRYP PFNEGLQUERYSTREAMATTRIBKHRPROC) (EGLDisplay dpy, EGLStreamKHR stream, EGLenum attribute, EGLAttrib *value);
typedef EGLBoolean (EGLAPIENTRYP PFNEGLSTREAMCONSUMERACQUIREATTRIBKHRPROC) (EGLDisplay dpy, EGLStreamKHR stream, const EGLAttrib *attrib_list);
typedef EGLBoolean (EGLAPIENTRYP PFNEGLSTREAMCONSUMERRELEASEATTRIBKHRPROC) (EGLDisplay dpy, EGLStreamKHR stream, const EGLAttrib *attrib_list);
#ifdef EGL_EGLEXT_PROTOTYPES
EGLAPI EGLStreamKHR EGLAPIENTRY eglCreateStreamAttribKHR (EGLDisplay dpy, const EGLAttrib *attrib_list);
EGLAPI EGLBoolean EGLAPIENTRY eglSetStreamAttribKHR (EGLDisplay dpy, EGLStreamKHR stream, EGLenum attribute, EGLAttrib value);
EGLAPI EGLBoolean EGLAPIENTRY eglQueryStreamAttribKHR (EGLDisplay dpy, EGLStreamKHR stream, EGLenum attribute, EGLAttrib *value);
EGLAPI EGLBoolean EGLAPIENTRY eglStreamConsumerAcquireAttribKHR (EGLDisplay dpy, EGLStreamKHR stream, const EGLAttrib *attrib_list);
EGLAPI EGLBoolean EGLAPIENTRY eglStreamConsumerReleaseAttribKHR (EGLDisplay dpy, EGLStreamKHR stream, const EGLAttrib *attrib_list);
#endif
#endif /* KHRONOS_SUPPORT_INT64 */
#endif /* EGL_KHR_stream_attrib */
#ifndef EGL_KHR_stream_consumer_gltexture
#define EGL_KHR_stream_consumer_gltexture 1
#ifdef EGL_KHR_stream
@@ -362,7 +387,7 @@ EGLAPI EGLBoolean EGLAPIENTRY eglStreamConsumerReleaseKHR (EGLDisplay dpy, EGLSt
#define EGL_KHR_stream_cross_process_fd 1
typedef int EGLNativeFileDescriptorKHR;
#ifdef EGL_KHR_stream
#define EGL_NO_FILE_DESCRIPTOR_KHR ((EGLNativeFileDescriptorKHR)(-1))
#define EGL_NO_FILE_DESCRIPTOR_KHR EGL_CAST(EGLNativeFileDescriptorKHR,-1)
typedef EGLNativeFileDescriptorKHR (EGLAPIENTRYP PFNEGLGETSTREAMFILEDESCRIPTORKHRPROC) (EGLDisplay dpy, EGLStreamKHR stream);
typedef EGLStreamKHR (EGLAPIENTRYP PFNEGLCREATESTREAMFROMFILEDESCRIPTORKHRPROC) (EGLDisplay dpy, EGLNativeFileDescriptorKHR file_descriptor);
#ifdef EGL_EGLEXT_PROTOTYPES
@@ -520,6 +545,11 @@ EGLAPI EGLBoolean EGLAPIENTRY eglQuerySurfacePointerANGLE (EGLDisplay dpy, EGLSu
#define EGL_FIXED_SIZE_ANGLE 0x3201
#endif /* EGL_ANGLE_window_fixed_size */
#ifndef EGL_ARM_implicit_external_sync
#define EGL_ARM_implicit_external_sync 1
#define EGL_SYNC_PRIOR_COMMANDS_IMPLICIT_EXTERNAL_ARM 0x328A
#endif /* EGL_ARM_implicit_external_sync */
#ifndef EGL_ARM_pixmap_multisample_discard
#define EGL_ARM_pixmap_multisample_discard 1
#define EGL_DISCARD_SAMPLES_ARM 0x3286
@@ -545,7 +575,7 @@ EGLAPI EGLBoolean EGLAPIENTRY eglQuerySurfacePointerANGLE (EGLDisplay dpy, EGLSu
#ifndef EGL_EXT_device_base
#define EGL_EXT_device_base 1
typedef void *EGLDeviceEXT;
#define EGL_NO_DEVICE_EXT ((EGLDeviceEXT)(0))
#define EGL_NO_DEVICE_EXT EGL_CAST(EGLDeviceEXT,0)
#define EGL_BAD_DEVICE_EXT 0x322B
#define EGL_DEVICE_EXT 0x322C
typedef EGLBoolean (EGLAPIENTRYP PFNEGLQUERYDEVICEATTRIBEXTPROC) (EGLDeviceEXT device, EGLint attribute, EGLAttrib *value);
@@ -578,6 +608,21 @@ EGLAPI EGLBoolean EGLAPIENTRY eglQueryDisplayAttribEXT (EGLDisplay dpy, EGLint a
#define EGL_EXT_device_query 1
#endif /* EGL_EXT_device_query */
#ifndef EGL_EXT_gl_colorspace_bt2020_linear
#define EGL_EXT_gl_colorspace_bt2020_linear 1
#define EGL_GL_COLORSPACE_BT2020_LINEAR_EXT 0x333F
#endif /* EGL_EXT_gl_colorspace_bt2020_linear */
#ifndef EGL_EXT_gl_colorspace_bt2020_pq
#define EGL_EXT_gl_colorspace_bt2020_pq 1
#define EGL_GL_COLORSPACE_BT2020_PQ_EXT 0x3340
#endif /* EGL_EXT_gl_colorspace_bt2020_pq */
#ifndef EGL_EXT_gl_colorspace_scrgb_linear
#define EGL_EXT_gl_colorspace_scrgb_linear 1
#define EGL_GL_COLORSPACE_SCRGB_LINEAR_EXT 0x3350
#endif /* EGL_EXT_gl_colorspace_scrgb_linear */
#ifndef EGL_EXT_image_dma_buf_import
#define EGL_EXT_image_dma_buf_import 1
#define EGL_LINUX_DMA_BUF_EXT 0x3270
@@ -604,6 +649,27 @@ EGLAPI EGLBoolean EGLAPIENTRY eglQueryDisplayAttribEXT (EGLDisplay dpy, EGLint a
#define EGL_YUV_CHROMA_SITING_0_5_EXT 0x3285
#endif /* EGL_EXT_image_dma_buf_import */
#ifndef EGL_EXT_image_dma_buf_import_modifiers
#define EGL_EXT_image_dma_buf_import_modifiers 1
#define EGL_DMA_BUF_PLANE3_FD_EXT 0x3440
#define EGL_DMA_BUF_PLANE3_OFFSET_EXT 0x3441
#define EGL_DMA_BUF_PLANE3_PITCH_EXT 0x3442
#define EGL_DMA_BUF_PLANE0_MODIFIER_LO_EXT 0x3443
#define EGL_DMA_BUF_PLANE0_MODIFIER_HI_EXT 0x3444
#define EGL_DMA_BUF_PLANE1_MODIFIER_LO_EXT 0x3445
#define EGL_DMA_BUF_PLANE1_MODIFIER_HI_EXT 0x3446
#define EGL_DMA_BUF_PLANE2_MODIFIER_LO_EXT 0x3447
#define EGL_DMA_BUF_PLANE2_MODIFIER_HI_EXT 0x3448
#define EGL_DMA_BUF_PLANE3_MODIFIER_LO_EXT 0x3449
#define EGL_DMA_BUF_PLANE3_MODIFIER_HI_EXT 0x344A
typedef EGLBoolean (EGLAPIENTRYP PFNEGLQUERYDMABUFFORMATSEXTPROC) (EGLDisplay dpy, EGLint max_formats, EGLint *formats, EGLint *num_formats);
typedef EGLBoolean (EGLAPIENTRYP PFNEGLQUERYDMABUFMODIFIERSEXTPROC) (EGLDisplay dpy, EGLint format, EGLint max_modifiers, EGLuint64KHR *modifiers, EGLBoolean *external_only, EGLint *num_modifiers);
#ifdef EGL_EGLEXT_PROTOTYPES
EGLAPI EGLBoolean EGLAPIENTRY eglQueryDmaBufFormatsEXT (EGLDisplay dpy, EGLint max_formats, EGLint *formats, EGLint *num_formats);
EGLAPI EGLBoolean EGLAPIENTRY eglQueryDmaBufModifiersEXT (EGLDisplay dpy, EGLint format, EGLint max_modifiers, EGLuint64KHR *modifiers, EGLBoolean *external_only, EGLint *num_modifiers);
#endif
#endif /* EGL_EXT_image_dma_buf_import_modifiers */
#ifndef EGL_EXT_multiview_window
#define EGL_EXT_multiview_window 1
#define EGL_MULTIVIEW_VIEW_COUNT_EXT 0x3134
@@ -613,8 +679,8 @@ EGLAPI EGLBoolean EGLAPIENTRY eglQueryDisplayAttribEXT (EGLDisplay dpy, EGLint a
#define EGL_EXT_output_base 1
typedef void *EGLOutputLayerEXT;
typedef void *EGLOutputPortEXT;
#define EGL_NO_OUTPUT_LAYER_EXT ((EGLOutputLayerEXT)0)
#define EGL_NO_OUTPUT_PORT_EXT ((EGLOutputPortEXT)0)
#define EGL_NO_OUTPUT_LAYER_EXT EGL_CAST(EGLOutputLayerEXT,0)
#define EGL_NO_OUTPUT_PORT_EXT EGL_CAST(EGLOutputPortEXT,0)
#define EGL_BAD_OUTPUT_LAYER_EXT 0x322D
#define EGL_BAD_OUTPUT_PORT_EXT 0x322E
#define EGL_SWAP_INTERVAL_EXT 0x322F
@@ -651,6 +717,13 @@ EGLAPI const char *EGLAPIENTRY eglQueryOutputPortStringEXT (EGLDisplay dpy, EGLO
#define EGL_OPENWF_PORT_ID_EXT 0x3239
#endif /* EGL_EXT_output_openwf */
#ifndef EGL_EXT_pixel_format_float
#define EGL_EXT_pixel_format_float 1
#define EGL_COLOR_COMPONENT_TYPE_EXT 0x3339
#define EGL_COLOR_COMPONENT_TYPE_FIXED_EXT 0x333A
#define EGL_COLOR_COMPONENT_TYPE_FLOAT_EXT 0x333B
#endif /* EGL_EXT_pixel_format_float */
#ifndef EGL_EXT_platform_base
#define EGL_EXT_platform_base 1
typedef EGLDisplay (EGLAPIENTRYP PFNEGLGETPLATFORMDISPLAYEXTPROC) (EGLenum platform, void *native_display, const EGLint *attrib_list);
@@ -696,6 +769,20 @@ EGLAPI EGLBoolean EGLAPIENTRY eglStreamConsumerOutputEXT (EGLDisplay dpy, EGLStr
#endif
#endif /* EGL_EXT_stream_consumer_egloutput */
#ifndef EGL_EXT_surface_SMPTE2086_metadata
#define EGL_EXT_surface_SMPTE2086_metadata 1
#define EGL_SMPTE2086_DISPLAY_PRIMARY_RX_EXT 0x3341
#define EGL_SMPTE2086_DISPLAY_PRIMARY_RY_EXT 0x3342
#define EGL_SMPTE2086_DISPLAY_PRIMARY_GX_EXT 0x3343
#define EGL_SMPTE2086_DISPLAY_PRIMARY_GY_EXT 0x3344
#define EGL_SMPTE2086_DISPLAY_PRIMARY_BX_EXT 0x3345
#define EGL_SMPTE2086_DISPLAY_PRIMARY_BY_EXT 0x3346
#define EGL_SMPTE2086_WHITE_POINT_X_EXT 0x3347
#define EGL_SMPTE2086_WHITE_POINT_Y_EXT 0x3348
#define EGL_SMPTE2086_MAX_LUMINANCE_EXT 0x3349
#define EGL_SMPTE2086_MIN_LUMINANCE_EXT 0x334A
#endif /* EGL_EXT_surface_SMPTE2086_metadata */
#ifndef EGL_EXT_swap_buffers_with_damage
#define EGL_EXT_swap_buffers_with_damage 1
typedef EGLBoolean (EGLAPIENTRYP PFNEGLSWAPBUFFERSWITHDAMAGEEXTPROC) (EGLDisplay dpy, EGLSurface surface, EGLint *rects, EGLint n_rects);
@@ -802,6 +889,11 @@ EGLAPI EGLBoolean EGLAPIENTRY eglExportDMABUFImageMESA (EGLDisplay dpy, EGLImage
#define EGL_PLATFORM_GBM_MESA 0x31D7
#endif /* EGL_MESA_platform_gbm */
#ifndef EGL_MESA_platform_surfaceless
#define EGL_MESA_platform_surfaceless 1
#define EGL_PLATFORM_SURFACELESS_MESA 0x31DD
#endif /* EGL_MESA_platform_surfaceless */
#ifndef EGL_NOK_swap_region
#define EGL_NOK_swap_region 1
typedef EGLBoolean (EGLAPIENTRYP PFNEGLSWAPBUFFERSREGIONNOKPROC) (EGLDisplay dpy, EGLSurface surface, EGLint numRects, const EGLint *rects);
@@ -901,6 +993,48 @@ EGLAPI EGLBoolean EGLAPIENTRY eglStreamConsumerGLTextureExternalAttribsNV (EGLDi
#endif
#endif /* EGL_NV_stream_consumer_gltexture_yuv */
#ifndef EGL_NV_stream_cross_display
#define EGL_NV_stream_cross_display 1
#define EGL_STREAM_CROSS_DISPLAY_NV 0x334E
#endif /* EGL_NV_stream_cross_display */
#ifndef EGL_NV_stream_cross_object
#define EGL_NV_stream_cross_object 1
#define EGL_STREAM_CROSS_OBJECT_NV 0x334D
#endif /* EGL_NV_stream_cross_object */
#ifndef EGL_NV_stream_cross_partition
#define EGL_NV_stream_cross_partition 1
#define EGL_STREAM_CROSS_PARTITION_NV 0x323F
#endif /* EGL_NV_stream_cross_partition */
#ifndef EGL_NV_stream_cross_process
#define EGL_NV_stream_cross_process 1
#define EGL_STREAM_CROSS_PROCESS_NV 0x3245
#endif /* EGL_NV_stream_cross_process */
#ifndef EGL_NV_stream_cross_system
#define EGL_NV_stream_cross_system 1
#define EGL_STREAM_CROSS_SYSTEM_NV 0x334F
#endif /* EGL_NV_stream_cross_system */
#ifndef EGL_NV_stream_fifo_next
#define EGL_NV_stream_fifo_next 1
#define EGL_PENDING_FRAME_NV 0x3329
#define EGL_STREAM_TIME_PENDING_NV 0x332A
#endif /* EGL_NV_stream_fifo_next */
#ifndef EGL_NV_stream_fifo_synchronous
#define EGL_NV_stream_fifo_synchronous 1
#define EGL_STREAM_FIFO_SYNCHRONOUS_NV 0x3336
#endif /* EGL_NV_stream_fifo_synchronous */
#ifndef EGL_NV_stream_frame_limits
#define EGL_NV_stream_frame_limits 1
#define EGL_PRODUCER_MAX_FRAME_HINT_NV 0x3337
#define EGL_CONSUMER_MAX_FRAME_HINT_NV 0x3338
#endif /* EGL_NV_stream_frame_limits */
#ifndef EGL_NV_stream_metadata
#define EGL_NV_stream_metadata 1
#define EGL_MAX_STREAM_METADATA_BLOCKS_NV 0x3250
@@ -927,6 +1061,45 @@ EGLAPI EGLBoolean EGLAPIENTRY eglQueryStreamMetadataNV (EGLDisplay dpy, EGLStrea
#endif
#endif /* EGL_NV_stream_metadata */
#ifndef EGL_NV_stream_remote
#define EGL_NV_stream_remote 1
#define EGL_STREAM_STATE_INITIALIZING_NV 0x3240
#define EGL_STREAM_TYPE_NV 0x3241
#define EGL_STREAM_PROTOCOL_NV 0x3242
#define EGL_STREAM_ENDPOINT_NV 0x3243
#define EGL_STREAM_LOCAL_NV 0x3244
#define EGL_STREAM_PRODUCER_NV 0x3247
#define EGL_STREAM_CONSUMER_NV 0x3248
#define EGL_STREAM_PROTOCOL_FD_NV 0x3246
#endif /* EGL_NV_stream_remote */
#ifndef EGL_NV_stream_reset
#define EGL_NV_stream_reset 1
#define EGL_SUPPORT_RESET_NV 0x3334
#define EGL_SUPPORT_REUSE_NV 0x3335
typedef EGLBoolean (EGLAPIENTRYP PFNEGLRESETSTREAMNVPROC) (EGLDisplay dpy, EGLStreamKHR stream);
#ifdef EGL_EGLEXT_PROTOTYPES
EGLAPI EGLBoolean EGLAPIENTRY eglResetStreamNV (EGLDisplay dpy, EGLStreamKHR stream);
#endif
#endif /* EGL_NV_stream_reset */
#ifndef EGL_NV_stream_socket
#define EGL_NV_stream_socket 1
#define EGL_STREAM_PROTOCOL_SOCKET_NV 0x324B
#define EGL_SOCKET_HANDLE_NV 0x324C
#define EGL_SOCKET_TYPE_NV 0x324D
#endif /* EGL_NV_stream_socket */
#ifndef EGL_NV_stream_socket_inet
#define EGL_NV_stream_socket_inet 1
#define EGL_SOCKET_TYPE_INET_NV 0x324F
#endif /* EGL_NV_stream_socket_inet */
#ifndef EGL_NV_stream_socket_unix
#define EGL_NV_stream_socket_unix 1
#define EGL_SOCKET_TYPE_UNIX_NV 0x324E
#endif /* EGL_NV_stream_socket_unix */
#ifndef EGL_NV_stream_sync
#define EGL_NV_stream_sync 1
#define EGL_SYNC_NEW_FRAME_NV 0x321F
@@ -953,7 +1126,7 @@ typedef khronos_utime_nanoseconds_t EGLTimeNV;
#define EGL_SYNC_TYPE_NV 0x30ED
#define EGL_SYNC_CONDITION_NV 0x30EE
#define EGL_SYNC_FENCE_NV 0x30EF
#define EGL_NO_SYNC_NV ((EGLSyncNV)0)
#define EGL_NO_SYNC_NV EGL_CAST(EGLSyncNV,0)
typedef EGLSyncNV (EGLAPIENTRYP PFNEGLCREATEFENCESYNCNVPROC) (EGLDisplay dpy, EGLenum condition, const EGLint *attrib_list);
typedef EGLBoolean (EGLAPIENTRYP PFNEGLDESTROYSYNCNVPROC) (EGLSyncNV sync);
typedef EGLBoolean (EGLAPIENTRYP PFNEGLFENCENVPROC) (EGLSyncNV sync);

View File

@@ -52,6 +52,7 @@ extern "C" {
#define EGL_TEXTURE_Y_U_V_WL 0x31D7
#define EGL_TEXTURE_Y_UV_WL 0x31D8
#define EGL_TEXTURE_Y_XUXV_WL 0x31D9
#define EGL_TEXTURE_EXTERNAL_WL 0x31DA
struct wl_display;
struct wl_resource;
@@ -84,10 +85,12 @@ typedef EGLBoolean (EGLAPIENTRYP PFNEGLSWAPBUFFERSREGIONNOK) (EGLDisplay dpy, EG
#define EGL_NO_CONFIG_MESA ((EGLConfig)0)
#endif
#ifndef EGL_MESA_platform_surfaceless
#define EGL_MESA_platform_surfaceless 1
#define EGL_PLATFORM_SURFACELESS_MESA 0x31DD
#endif /* EGL_MESA_platform_surfaceless */
#ifndef EGL_MESA_drm_image_formats
#define EGL_MESA_drm_image_formats 1
#define EGL_DRM_BUFFER_FORMAT_ARGB2101010_MESA 0x3290
#define EGL_DRM_BUFFER_FORMAT_ARGB1555_MESA 0x3291
#define EGL_DRM_BUFFER_FORMAT_RGB565_MESA 0x3292
#endif /* EGL_MESA_drm_image_formats */
#ifdef __cplusplus
}

View File

@@ -2,7 +2,7 @@
#define __eglplatform_h_
/*
** Copyright (c) 2007-2013 The Khronos Group Inc.
** Copyright (c) 2007-2016 The Khronos Group Inc.
**
** Permission is hereby granted, free of charge, to any person obtaining a
** copy of this software and/or associated documentation files (the
@@ -150,4 +150,12 @@ typedef EGLNativeWindowType NativeWindowType;
*/
typedef khronos_int32_t EGLint;
/* C++ / C typecast macros for special EGL handle values */
#if defined(__cplusplus)
#define EGL_CAST(type, value) (static_cast<type>(value))
#else
#define EGL_CAST(type, value) ((type) (value))
#endif
#endif /* __eglplatform_h */

View File

@@ -340,12 +340,19 @@ struct __DRI2throttleExtensionRec {
*/
#define __DRI2_FENCE "DRI2_Fence"
#define __DRI2_FENCE_VERSION 1
#define __DRI2_FENCE_VERSION 2
#define __DRI2_FENCE_TIMEOUT_INFINITE 0xffffffffffffffffllu
#define __DRI2_FENCE_FLAG_FLUSH_COMMANDS (1 << 0)
/**
* \name Capabilities that might be returned by __DRI2fenceExtensionRec::get_capabilities
*/
/*@{*/
#define __DRI_FENCE_CAP_NATIVE_FD 1
/*@}*/
struct __DRI2fenceExtensionRec {
__DRIextension base;
@@ -390,6 +397,41 @@ struct __DRI2fenceExtensionRec {
* sense with this function (right now there are none)
*/
void (*server_wait_sync)(__DRIcontext *ctx, void *fence, unsigned flags);
/**
* Query for general capabilities of the driver that concern fences.
* Returns a bitmask of __DRI_FENCE_CAP_x
*
* \since 2
*/
unsigned (*get_capabilities)(__DRIscreen *screen);
/**
* Create an fd (file descriptor) associated fence. If the fence fd
* is -1, this behaves similarly to create_fence() except that when
* rendering is flushed the driver creates a fence fd. Otherwise,
* the driver wraps an existing fence fd.
*
* This is used to implement the EGL_ANDROID_native_fence_sync extension.
*
* \since 2
*
* \param ctx the context associated with the fence
* \param fd the fence fd or -1
*/
void *(*create_fence_fd)(__DRIcontext *ctx, int fd);
/**
* For fences created with create_fence_fd(), after rendering is flushed,
* this retrieves the native fence fd. Caller takes ownership of the
* fd and will close() it when it is no longer needed.
*
* \since 2
*
* \param screen the screen associated with the fence
* \param fence the fence
*/
int (*get_fence_fd)(__DRIscreen *screen, void *fence);
};
@@ -1094,7 +1136,7 @@ struct __DRIdri2ExtensionRec {
* extensions.
*/
#define __DRI_IMAGE "DRI_IMAGE"
#define __DRI_IMAGE_VERSION 13
#define __DRI_IMAGE_VERSION 14
/**
* These formats correspond to the similarly named MESA_FORMAT_*
@@ -1121,6 +1163,9 @@ struct __DRIdri2ExtensionRec {
#define __DRI_IMAGE_FORMAT_XRGB2101010 0x1009
#define __DRI_IMAGE_FORMAT_ARGB2101010 0x100a
#define __DRI_IMAGE_FORMAT_SARGB8 0x100b
#define __DRI_IMAGE_FORMAT_ARGB1555 0x100c
#define __DRI_IMAGE_FORMAT_R16 0x100d
#define __DRI_IMAGE_FORMAT_GR1616 0x100e
#define __DRI_IMAGE_USE_SHARE 0x0001
#define __DRI_IMAGE_USE_SCANOUT 0x0002
@@ -1148,6 +1193,9 @@ struct __DRIdri2ExtensionRec {
#define __DRI_IMAGE_FOURCC_R8 0x20203852
#define __DRI_IMAGE_FOURCC_GR88 0x38385247
#define __DRI_IMAGE_FOURCC_ARGB1555 0x35315241
#define __DRI_IMAGE_FOURCC_R16 0x20363152
#define __DRI_IMAGE_FOURCC_GR1616 0x32335247
#define __DRI_IMAGE_FOURCC_RGB565 0x36314752
#define __DRI_IMAGE_FOURCC_ARGB8888 0x34325241
#define __DRI_IMAGE_FOURCC_XRGB8888 0x34325258
@@ -1209,6 +1257,8 @@ struct __DRIdri2ExtensionRec {
#define __DRI_IMAGE_ATTRIB_NUM_PLANES 0x2009 /* available in versions 11 */
#define __DRI_IMAGE_ATTRIB_OFFSET 0x200A /* available in versions 13 */
#define __DRI_IMAGE_ATTRIB_MODIFIER_LOWER 0x200B /* available in versions 14 */
#define __DRI_IMAGE_ATTRIB_MODIFIER_UPPER 0x200C /* available in versions 14 */
enum __DRIYUVColorSpace {
__DRI_YUV_COLOR_SPACE_UNDEFINED = 0,
@@ -1420,6 +1470,29 @@ struct __DRIimageExtensionRec {
*/
void (*unmapImage)(__DRIcontext *context, __DRIimage *image, void *data);
/**
* Creates an image with implementation's favorite modifiers.
*
* This acts like createImage except there is a list of modifiers passed in
* which the implementation may selectively use to create the DRIimage. The
* result should be the implementation selects one modifier (perhaps it would
* hold on to a few and later pick).
*
* The created image should be destroyed with destroyImage().
*
* Returns the new DRIimage. The chosen modifier can be obtained later on
* and passed back to things like the kernel's AddFB2 interface.
*
* \sa __DRIimageRec::createImage
*
* \since 14
*/
__DRIimage *(*createImageWithModifiers)(__DRIscreen *screen,
int width, int height, int format,
const uint64_t *modifiers,
const unsigned int modifier_count,
void *loaderPrivate);
};
@@ -1610,4 +1683,43 @@ struct __DRIimageDriverExtensionRec {
__DRIgetAPIMaskFunc getAPIMask;
};
/**
* Background callable loader extension.
*
* Loaders expose this extension to indicate to drivers that they are capable
* of handling callbacks from the driver's background drawing threads.
*/
#define __DRI_BACKGROUND_CALLABLE "DRI_BackgroundCallable"
#define __DRI_BACKGROUND_CALLABLE_VERSION 1
typedef struct __DRIbackgroundCallableExtensionRec __DRIbackgroundCallableExtension;
struct __DRIbackgroundCallableExtensionRec {
__DRIextension base;
/**
* Indicate that this thread is being used by the driver as a background
* drawing thread which may make callbacks to the loader.
*
* \param loaderPrivate is the value that was passed to to the driver when
* the context was created. This can be used by the loader to identify
* which context any callbacks are associated with.
*
* If this function is called more than once from any given thread, each
* subsequent call overrides the loaderPrivate data that was passed in the
* previous call. The driver can take advantage of this to re-use a
* background thread to perform drawing on behalf of multiple contexts.
*
* It is permissible for the driver to call this function from a
* non-background thread (i.e. a thread that has already been bound to a
* context using __DRIcoreExtensionRec::bindContext()); when this happens,
* the \c loaderPrivate pointer must be equal to the pointer that was
* passed to the driver when the currently bound context was created.
*
* This call should execute quickly enough that the driver can call it with
* impunity whenever a background thread starts performing drawing
* operations (e.g. it should just set a thread-local variable).
*/
void (*setBackgroundContext)(void *loaderPrivate);
};
#endif

View File

@@ -30,6 +30,9 @@
#define EMULATED_THREADS_H_INCLUDED_
#include <time.h>
#ifdef _MSC_VER
#include <thr/xtimec.h> // for xtime
#endif
#ifndef TIME_UTC
#define TIME_UTC 1
@@ -41,11 +44,13 @@
typedef void (*tss_dtor_t)(void*);
typedef int (*thrd_start_t)(void*);
#ifndef _MSC_VER
struct xtime {
time_t sec;
long nsec;
};
typedef struct xtime xtime;
#endif
/*-------------------- enumeration constants --------------------*/

View File

@@ -163,6 +163,7 @@ test_c99_compat_h(const void * restrict a,
# define HAVE_FUNC_ATTRIBUTE_UNUSED 1
# define HAVE_FUNC_ATTRIBUTE_FORMAT 1
# define HAVE_FUNC_ATTRIBUTE_PACKED 1
# define HAVE_FUNC_ATTRIBUTE_ALIAS 1
# if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3)
/* https://gcc.gnu.org/onlinedocs/gcc-4.3.6/gcc/Other-Builtins.html */

View File

@@ -35,6 +35,22 @@ typedef struct ID3DPresentGroup ID3DPresentGroup;
typedef struct ID3DAdapter9 ID3DAdapter9;
typedef struct D3DWindowBuffer D3DWindowBuffer;
/* Available since version 1.3 */
typedef struct _D3DPRESENT_PARAMETERS2_ {
/* Whether D3DSWAPEFFECT_DISCARD is allowed to release the
* D3DWindowBuffers in any order, and eventually with a delay.
* FALSE (Default): buffers should be released as soon as possible.
* TRUE: it is allowed to release some buffers with a delay, and in
* a random order. */
BOOL AllowDISCARDDelayedRelease;
/* User preference for D3DSWAPEFFECT_DISCARD with D3DPRESENT_INTERVAL_IMMEDIATE.
* FALSE (Default): User prefers presentation to occur as soon as possible,
* with potential tearings.
* TRUE: User prefers presentation to be tear free. Requires
* AllowDISCARDDelayedRelease to have any effect. */
BOOL TearFreeDISCARD;
} D3DPRESENT_PARAMETERS2, *PD3DPRESENT_PARAMETERS2, *LPD3DPRESENT_PARAMETERS2;
/* Presentation backend for drivers to display their brilliant work */
typedef struct ID3DPresentVtbl
{
@@ -54,7 +70,10 @@ typedef struct ID3DPresentVtbl
HRESULT (WINAPI *DestroyD3DWindowBuffer)(ID3DPresent *This, D3DWindowBuffer *buffer);
/* After presenting a buffer to the window system, the buffer
* may be used as is (no copy of the content) by the window system.
* You must not use a non-released buffer, else the user may see undefined content. */
* You must not use a non-released buffer, else the user may see undefined content.
* Note: This function waits as well that the buffer content was displayed (this
* can be after the release of the buffer if the window system decided to make
* an internal copy and release early. */
HRESULT (WINAPI *WaitBufferReleased)(ID3DPresent *This, D3DWindowBuffer *buffer);
HRESULT (WINAPI *FrontBufferCopy)(ID3DPresent *This, D3DWindowBuffer *buffer);
/* It is possible to do partial copy, but impossible to do resizing, which must
@@ -75,6 +94,11 @@ typedef struct ID3DPresentVtbl
BOOL (WINAPI *ResolutionMismatch)(ID3DPresent *This);
HANDLE (WINAPI *CreateThread)(ID3DPresent *This, void *pThreadfunc, void *pParam);
BOOL (WINAPI *WaitForThread)(ID3DPresent *This, HANDLE thread);
/* Available since version 1.3 */
HRESULT (WINAPI *SetPresentParameters2)(ID3DPresent *This, D3DPRESENT_PARAMETERS2 *pParameters);
BOOL (WINAPI *IsBufferReleased)(ID3DPresent *This, D3DWindowBuffer *buffer);
/* Wait a buffer gets released. */
HRESULT (WINAPI *WaitBufferReleaseEvent)(ID3DPresent *This);
} ID3DPresentVtbl;
struct ID3DPresent
@@ -106,6 +130,9 @@ struct ID3DPresent
#define ID3DPresent_ResolutionMismatch(p) (p)->lpVtbl->ResolutionMismatch(p)
#define ID3DPresent_CreateThread(p,a,b) (p)->lpVtbl->CreateThread(p,a,b)
#define ID3DPresent_WaitForThread(p,a) (p)->lpVtbl->WaitForThread(p,a)
#define ID3DPresent_SetPresentParameters2(p,a) (p)->lpVtbl->SetPresentParameters2(p,a)
#define ID3DPresent_IsBufferReleased(p,a) (p)->lpVtbl->IsBufferReleased(p,a)
#define ID3DPresent_WaitBufferReleaseEvent(p) (p)->lpVtbl->WaitBufferReleaseEvent(p)
typedef struct ID3DPresentGroupVtbl
{

View File

@@ -109,6 +109,10 @@ CHIPSET(0x162A, bdw_gt3, "Intel(R) Iris Pro P6300 (Broadwell GT3e)")
CHIPSET(0x162B, bdw_gt3, "Intel(R) Iris 6100 (Broadwell GT3)")
CHIPSET(0x162D, bdw_gt3, "Intel(R) Broadwell GT3")
CHIPSET(0x162E, bdw_gt3, "Intel(R) Broadwell GT3")
CHIPSET(0x22B0, chv, "Intel(R) HD Graphics (Cherrytrail)")
CHIPSET(0x22B1, chv, "Intel(R) HD Graphics XXX (Braswell)") /* Overridden in brw_get_renderer_string */
CHIPSET(0x22B2, chv, "Intel(R) HD Graphics (Cherryview)")
CHIPSET(0x22B3, chv, "Intel(R) HD Graphics (Cherryview)")
CHIPSET(0x1902, skl_gt1, "Intel(R) HD Graphics 510 (Skylake GT1)")
CHIPSET(0x1906, skl_gt1, "Intel(R) HD Graphics 510 (Skylake GT1)")
CHIPSET(0x190A, skl_gt1, "Intel(R) Skylake GT1")
@@ -134,8 +138,13 @@ CHIPSET(0x1932, skl_gt4, "Intel(R) Iris Pro Graphics 580 (Skylake GT4e)")
CHIPSET(0x193A, skl_gt4, "Intel(R) Iris Pro Graphics P580 (Skylake GT4e)")
CHIPSET(0x193B, skl_gt4, "Intel(R) Iris Pro Graphics 580 (Skylake GT4e)")
CHIPSET(0x193D, skl_gt4, "Intel(R) Iris Pro Graphics P580 (Skylake GT4e)")
CHIPSET(0x5902, kbl_gt1, "Intel(R) Kabylake GT1")
CHIPSET(0x5906, kbl_gt1, "Intel(R) Kabylake GT1")
CHIPSET(0x0A84, bxt, "Intel(R) HD Graphics (Broxton)")
CHIPSET(0x1A84, bxt, "Intel(R) HD Graphics (Broxton)")
CHIPSET(0x1A85, bxt_2x6, "Intel(R) HD Graphics (Broxton 2x6)")
CHIPSET(0x5A84, bxt, "Intel(R) HD Graphics 505 (Broxton)")
CHIPSET(0x5A85, bxt_2x6, "Intel(R) HD Graphics 500 (Broxton 2x6)")
CHIPSET(0x5902, kbl_gt1, "Intel(R) HD Graphics 610 (Kaby Lake GT1)")
CHIPSET(0x5906, kbl_gt1, "Intel(R) HD Graphics 610 (Kaby Lake GT1)")
CHIPSET(0x590A, kbl_gt1, "Intel(R) Kabylake GT1")
CHIPSET(0x5908, kbl_gt1, "Intel(R) Kabylake GT1")
CHIPSET(0x590B, kbl_gt1, "Intel(R) Kabylake GT1")
@@ -143,23 +152,16 @@ CHIPSET(0x590E, kbl_gt1, "Intel(R) Kabylake GT1")
CHIPSET(0x5913, kbl_gt1_5, "Intel(R) Kabylake GT1.5")
CHIPSET(0x5915, kbl_gt1_5, "Intel(R) Kabylake GT1.5")
CHIPSET(0x5917, kbl_gt1_5, "Intel(R) Kabylake GT1.5")
CHIPSET(0x5912, kbl_gt2, "Intel(R) Kabylake GT2")
CHIPSET(0x5916, kbl_gt2, "Intel(R) Kabylake GT2")
CHIPSET(0x591A, kbl_gt2, "Intel(R) Kabylake GT2")
CHIPSET(0x591B, kbl_gt2, "Intel(R) Kabylake GT2")
CHIPSET(0x591D, kbl_gt2, "Intel(R) Kabylake GT2")
CHIPSET(0x591E, kbl_gt2, "Intel(R) Kabylake GT2")
CHIPSET(0x5912, kbl_gt2, "Intel(R) HD Graphics 630 (Kaby Lake GT2)")
CHIPSET(0x5916, kbl_gt2, "Intel(R) HD Graphics 620 (Kaby Lake GT2)")
CHIPSET(0x591A, kbl_gt2, "Intel(R) HD Graphics P630 (Kaby Lake GT2)")
CHIPSET(0x591B, kbl_gt2, "Intel(R) HD Graphics 630 (Kaby Lake GT2)")
CHIPSET(0x591D, kbl_gt2, "Intel(R) HD Graphics P630 (Kaby Lake GT2)")
CHIPSET(0x591E, kbl_gt2, "Intel(R) HD Graphics 615 (Kaby Lake GT2)")
CHIPSET(0x5921, kbl_gt2, "Intel(R) Kabylake GT2F")
CHIPSET(0x5923, kbl_gt3, "Intel(R) Kabylake GT3")
CHIPSET(0x5926, kbl_gt3, "Intel(R) Kabylake GT3")
CHIPSET(0x5927, kbl_gt3, "Intel(R) Kabylake GT3")
CHIPSET(0x5926, kbl_gt3, "Intel(R) Iris Plus Graphics 640 (Kaby Lake GT3)")
CHIPSET(0x5927, kbl_gt3, "Intel(R) Iris Plus Graphics 650 (Kaby Lake GT3)")
CHIPSET(0x593B, kbl_gt4, "Intel(R) Kabylake GT4")
CHIPSET(0x22B0, chv, "Intel(R) HD Graphics (Cherrytrail)")
CHIPSET(0x22B1, chv, "Intel(R) HD Graphics XXX (Braswell)") /* Overridden in brw_get_renderer_string */
CHIPSET(0x22B2, chv, "Intel(R) HD Graphics (Cherryview)")
CHIPSET(0x22B3, chv, "Intel(R) HD Graphics (Cherryview)")
CHIPSET(0x0A84, bxt, "Intel(R) HD Graphics (Broxton)")
CHIPSET(0x1A84, bxt, "Intel(R) HD Graphics (Broxton)")
CHIPSET(0x1A85, bxt_2x6, "Intel(R) HD Graphics (Broxton 2x6)")
CHIPSET(0x5A84, bxt, "Intel(R) HD Graphics (Broxton)")
CHIPSET(0x5A85, bxt_2x6, "Intel(R) HD Graphics (Broxton 2x6)")
CHIPSET(0x3184, glk, "Intel(R) HD Graphics (Geminilake)")
CHIPSET(0x3185, glk_2x6, "Intel(R) HD Graphics (Geminilake 2x6)")

View File

@@ -202,6 +202,23 @@ CHIPSET(0x67C9, POLARIS10_, POLARIS10)
CHIPSET(0x67CA, POLARIS10_, POLARIS10)
CHIPSET(0x67CC, POLARIS10_, POLARIS10)
CHIPSET(0x67CF, POLARIS10_, POLARIS10)
CHIPSET(0x67D0, POLARIS10_, POLARIS10)
CHIPSET(0x67DF, POLARIS10_, POLARIS10)
CHIPSET(0x98E4, STONEY_, STONEY)
CHIPSET(0x6980, POLARIS12_, POLARIS12)
CHIPSET(0x6981, POLARIS12_, POLARIS12)
CHIPSET(0x6985, POLARIS12_, POLARIS12)
CHIPSET(0x6986, POLARIS12_, POLARIS12)
CHIPSET(0x6987, POLARIS12_, POLARIS12)
CHIPSET(0x6995, POLARIS12_, POLARIS12)
CHIPSET(0x699F, POLARIS12_, POLARIS12)
CHIPSET(0x6860, VEGA10_, VEGA10)
CHIPSET(0x6861, VEGA10_, VEGA10)
CHIPSET(0x6862, VEGA10_, VEGA10)
CHIPSET(0x6863, VEGA10_, VEGA10)
CHIPSET(0x6867, VEGA10_, VEGA10)
CHIPSET(0x687F, VEGA10_, VEGA10)
CHIPSET(0x686C, VEGA10_, VEGA10)

View File

@@ -1,28 +1,56 @@
//
// File: vk_icd.h
//
/*
* Copyright (c) 2015-2016 The Khronos Group Inc.
* Copyright (c) 2015-2016 Valve Corporation
* Copyright (c) 2015-2016 LunarG, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
*/
#ifndef VKICD_H
#define VKICD_H
#include "vk_platform.h"
#include "vulkan.h"
/*
* Loader-ICD version negotiation API
*/
#define CURRENT_LOADER_ICD_INTERFACE_VERSION 3
#define MIN_SUPPORTED_LOADER_ICD_INTERFACE_VERSION 0
typedef VkResult (VKAPI_PTR *PFN_vkNegotiateLoaderICDInterfaceVersion)(uint32_t *pVersion);
/*
* The ICD must reserve space for a pointer for the loader's dispatch
* table, at the start of <each object>.
* The ICD must initialize this variable using the SET_LOADER_MAGIC_VALUE macro.
*/
#define ICD_LOADER_MAGIC 0x01CDC0DE
#define ICD_LOADER_MAGIC 0x01CDC0DE
typedef union _VK_LOADER_DATA {
uintptr_t loaderMagic;
void *loaderData;
typedef union {
uintptr_t loaderMagic;
void *loaderData;
} VK_LOADER_DATA;
static inline void set_loader_magic_value(void* pNewObject) {
VK_LOADER_DATA *loader_info = (VK_LOADER_DATA *) pNewObject;
static inline void set_loader_magic_value(void *pNewObject) {
VK_LOADER_DATA *loader_info = (VK_LOADER_DATA *)pNewObject;
loader_info->loaderMagic = ICD_LOADER_MAGIC;
}
static inline bool valid_loader_magic_value(void* pNewObject) {
const VK_LOADER_DATA *loader_info = (VK_LOADER_DATA *) pNewObject;
static inline bool valid_loader_magic_value(void *pNewObject) {
const VK_LOADER_DATA *loader_info = (VK_LOADER_DATA *)pNewObject;
return (loader_info->loaderMagic & 0xffffffff) == ICD_LOADER_MAGIC;
}
@@ -30,56 +58,74 @@ static inline bool valid_loader_magic_value(void* pNewObject) {
* Windows and Linux ICDs will treat VkSurfaceKHR as a pointer to a struct that
* contains the platform-specific connection and surface information.
*/
typedef enum _VkIcdWsiPlatform {
typedef enum {
VK_ICD_WSI_PLATFORM_MIR,
VK_ICD_WSI_PLATFORM_WAYLAND,
VK_ICD_WSI_PLATFORM_WIN32,
VK_ICD_WSI_PLATFORM_XCB,
VK_ICD_WSI_PLATFORM_XLIB,
VK_ICD_WSI_PLATFORM_DISPLAY
} VkIcdWsiPlatform;
typedef struct _VkIcdSurfaceBase {
VkIcdWsiPlatform platform;
typedef struct {
VkIcdWsiPlatform platform;
} VkIcdSurfaceBase;
#ifdef VK_USE_PLATFORM_MIR_KHR
typedef struct _VkIcdSurfaceMir {
VkIcdSurfaceBase base;
MirConnection* connection;
MirSurface* mirSurface;
typedef struct {
VkIcdSurfaceBase base;
MirConnection *connection;
MirSurface *mirSurface;
} VkIcdSurfaceMir;
#endif // VK_USE_PLATFORM_MIR_KHR
#ifdef VK_USE_PLATFORM_WAYLAND_KHR
typedef struct _VkIcdSurfaceWayland {
VkIcdSurfaceBase base;
struct wl_display* display;
struct wl_surface* surface;
typedef struct {
VkIcdSurfaceBase base;
struct wl_display *display;
struct wl_surface *surface;
} VkIcdSurfaceWayland;
#endif // VK_USE_PLATFORM_WAYLAND_KHR
#ifdef VK_USE_PLATFORM_WIN32_KHR
typedef struct _VkIcdSurfaceWin32 {
VkIcdSurfaceBase base;
HINSTANCE hinstance;
HWND hwnd;
typedef struct {
VkIcdSurfaceBase base;
HINSTANCE hinstance;
HWND hwnd;
} VkIcdSurfaceWin32;
#endif // VK_USE_PLATFORM_WIN32_KHR
#ifdef VK_USE_PLATFORM_XCB_KHR
typedef struct _VkIcdSurfaceXcb {
VkIcdSurfaceBase base;
xcb_connection_t* connection;
xcb_window_t window;
typedef struct {
VkIcdSurfaceBase base;
xcb_connection_t *connection;
xcb_window_t window;
} VkIcdSurfaceXcb;
#endif // VK_USE_PLATFORM_XCB_KHR
#ifdef VK_USE_PLATFORM_XLIB_KHR
typedef struct _VkIcdSurfaceXlib {
VkIcdSurfaceBase base;
Display* dpy;
Window window;
typedef struct {
VkIcdSurfaceBase base;
Display *dpy;
Window window;
} VkIcdSurfaceXlib;
#endif // VK_USE_PLATFORM_XLIB_KHR
#ifdef VK_USE_PLATFORM_ANDROID_KHR
typedef struct {
ANativeWindow* window;
} VkIcdSurfaceAndroid;
#endif //VK_USE_PLATFORM_ANDROID_KHR
typedef struct {
VkIcdSurfaceBase base;
VkDisplayModeKHR displayMode;
uint32_t planeIndex;
uint32_t planeStackIndex;
VkSurfaceTransformFlagBitsKHR transform;
float globalAlpha;
VkDisplayPlaneAlphaFlagBitsKHR alphaMode;
VkExtent2D imageExtent;
} VkIcdSurfaceDisplay;
#endif // VKICD_H

View File

@@ -2,26 +2,19 @@
// File: vk_platform.h
//
/*
** Copyright (c) 2014-2015 The Khronos Group Inc.
** Copyright (c) 2014-2017 The Khronos Group Inc.
**
** Permission is hereby granted, free of charge, to any person obtaining a
** copy of this software and/or associated documentation files (the
** "Materials"), to deal in the Materials without restriction, including
** without limitation the rights to use, copy, modify, merge, publish,
** distribute, sublicense, and/or sell copies of the Materials, and to
** permit persons to whom the Materials are furnished to do so, subject to
** the following conditions:
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
**
** The above copyright notice and this permission notice shall be included
** in all copies or substantial portions of the Materials.
** http://www.apache.org/licenses/LICENSE-2.0
**
** THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
** EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
** MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
** IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
** CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
** TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
** MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS.
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*/
@@ -58,13 +51,13 @@ extern "C"
#define VKAPI_ATTR
#define VKAPI_CALL __stdcall
#define VKAPI_PTR VKAPI_CALL
#elif defined(__ANDROID__) && defined(__ARM_EABI__) && !defined(__ARM_ARCH_7A__)
// Android does not support Vulkan in native code using the "armeabi" ABI.
#error "Vulkan requires the 'armeabi-v7a' or 'armeabi-v7a-hard' ABI on 32-bit ARM CPUs"
#elif defined(__ANDROID__) && defined(__ARM_ARCH_7A__)
// On Android/ARMv7a, Vulkan functions use the armeabi-v7a-hard calling
// convention, even if the application's native code is compiled with the
// armeabi-v7a calling convention.
#elif defined(__ANDROID__) && defined(__ARM_ARCH) && __ARM_ARCH < 7
#error "Vulkan isn't supported for the 'armeabi' NDK ABI"
#elif defined(__ANDROID__) && defined(__ARM_ARCH) && __ARM_ARCH >= 7 && defined(__ARM_32BIT_STATE)
// On Android 32-bit ARM targets, Vulkan functions use the "hardfloat"
// calling convention, i.e. float parameters are passed in registers. This
// is true even if the rest of the application passes floats on the stack,
// as it does by default when compiling for the armeabi-v7a NDK ABI.
#define VKAPI_ATTR __attribute__((pcs("aapcs-vfp")))
#define VKAPI_CALL
#define VKAPI_PTR VKAPI_ATTR

File diff suppressed because it is too large Load Diff

View File

@@ -281,7 +281,7 @@ def parse_source_list(env, filename, names=None):
# cause duplicate actions.
f = f[len(cur_srcdir + '/'):]
# do not include any headers
if f.endswith('.h'):
if f.endswith(tuple(['.h','.hpp'])):
continue
srcs.append(f)

View File

@@ -291,8 +291,9 @@ def generate(env):
# C preprocessor options
cppdefines = []
cppdefines += [
'__STDC_LIMIT_MACROS',
'__STDC_CONSTANT_MACROS',
'__STDC_FORMAT_MACROS',
'__STDC_LIMIT_MACROS',
'HAVE_NO_AUTOCONF',
]
if env['build'] in ('debug', 'checked'):
@@ -323,10 +324,6 @@ def generate(env):
'GLX_DIRECT_RENDERING',
'GLX_INDIRECT_RENDERING',
]
if env['platform'] in ('linux', 'freebsd'):
cppdefines += ['HAVE_ALIAS']
else:
cppdefines += ['GLX_ALIAS_UNSUPPORTED']
if env['platform'] in ('linux', 'darwin'):
cppdefines += ['HAVE_XLOCALE_H']
@@ -648,10 +645,10 @@ def generate(env):
env.AddMethod(msvc2013_compat, 'MSVC2013Compat')
env.AddMethod(unit_test, 'UnitTest')
env.PkgCheckModules('X11', ['x11', 'xext', 'xdamage', 'xfixes', 'glproto >= 1.4.13'])
env.PkgCheckModules('X11', ['x11', 'xext', 'xdamage >= 1.1', 'xfixes', 'glproto >= 1.4.13', 'dri2proto >= 2.8'])
env.PkgCheckModules('XCB', ['x11-xcb', 'xcb-glx >= 1.8.1', 'xcb-dri2 >= 1.8'])
env.PkgCheckModules('XF86VIDMODE', ['xxf86vm'])
env.PkgCheckModules('DRM', ['libdrm >= 2.4.38'])
env.PkgCheckModules('DRM', ['libdrm >= 2.4.75'])
if env['x11']:
env.Append(CPPPATH = env['X11_CPPPATH'])

View File

@@ -100,13 +100,28 @@ def generate(env):
env.Prepend(CPPPATH = [os.path.join(llvm_dir, 'include')])
env.AppendUnique(CPPDEFINES = [
'__STDC_LIMIT_MACROS',
'__STDC_CONSTANT_MACROS',
'HAVE_STDINT_H',
])
env.Prepend(LIBPATH = [os.path.join(llvm_dir, 'lib')])
# LIBS should match the output of `llvm-config --libs engine mcjit bitwriter x86asmprinter`
if llvm_version >= distutils.version.LooseVersion('3.7'):
if llvm_version >= distutils.version.LooseVersion('3.9'):
env.Prepend(LIBS = [
'LLVMX86Disassembler', 'LLVMX86AsmParser',
'LLVMX86CodeGen', 'LLVMSelectionDAG', 'LLVMAsmPrinter',
'LLVMDebugInfoCodeView', 'LLVMCodeGen',
'LLVMScalarOpts', 'LLVMInstCombine',
'LLVMInstrumentation', 'LLVMTransformUtils',
'LLVMBitWriter', 'LLVMX86Desc',
'LLVMMCDisassembler', 'LLVMX86Info',
'LLVMX86AsmPrinter', 'LLVMX86Utils',
'LLVMMCJIT', 'LLVMExecutionEngine', 'LLVMTarget',
'LLVMAnalysis', 'LLVMProfileData',
'LLVMRuntimeDyld', 'LLVMObject', 'LLVMMCParser',
'LLVMBitReader', 'LLVMMC', 'LLVMCore',
'LLVMSupport',
'LLVMIRReader', 'LLVMASMParser'
])
elif llvm_version >= distutils.version.LooseVersion('3.7'):
env.Prepend(LIBS = [
'LLVMBitWriter', 'LLVMX86Disassembler', 'LLVMX86AsmParser',
'LLVMX86CodeGen', 'LLVMSelectionDAG', 'LLVMAsmPrinter',
@@ -177,11 +192,12 @@ def generate(env):
# that.
env.Append(LINKFLAGS = ['/nodefaultlib:LIBCMT'])
else:
if not env.Detect('llvm-config'):
print 'scons: llvm-config script not found'
llvm_config = os.environ.get('LLVM_CONFIG', 'llvm-config')
if not env.Detect(llvm_config):
print 'scons: %s script not found' % llvm_config
return
llvm_version = env.backtick('llvm-config --version').rstrip()
llvm_version = env.backtick('%s --version' % llvm_config).rstrip()
llvm_version = distutils.version.LooseVersion(llvm_version)
if llvm_version < distutils.version.LooseVersion(required_llvm_version):
@@ -191,7 +207,7 @@ def generate(env):
try:
# Treat --cppflags specially to prevent NDEBUG from disabling
# assertion failures in debug builds.
cppflags = env.ParseFlags('!llvm-config --cppflags')
cppflags = env.ParseFlags('!%s --cppflags' % llvm_config)
try:
cppflags['CPPDEFINES'].remove('NDEBUG')
except ValueError:
@@ -199,16 +215,16 @@ def generate(env):
env.MergeFlags(cppflags)
# Match llvm --fno-rtti flag
cxxflags = env.backtick('llvm-config --cxxflags').split()
cxxflags = env.backtick('%s --cxxflags' % llvm_config).split()
if '-fno-rtti' in cxxflags:
env.Append(CXXFLAGS = ['-fno-rtti'])
components = ['engine', 'mcjit', 'bitwriter', 'x86asmprinter', 'mcdisassembler']
components = ['engine', 'mcjit', 'bitwriter', 'x86asmprinter', 'mcdisassembler', 'irreader']
env.ParseConfig('llvm-config --libs ' + ' '.join(components))
env.ParseConfig('llvm-config --ldflags')
env.ParseConfig('%s --libs ' % llvm_config + ' '.join(components))
env.ParseConfig('%s --ldflags' % llvm_config)
if llvm_version >= distutils.version.LooseVersion('3.5'):
env.ParseConfig('llvm-config --system-libs')
env.ParseConfig('%s --system-libs' % llvm_config)
env.Append(CXXFLAGS = ['-std=c++11'])
except OSError:
print 'scons: llvm-config version %s failed' % llvm_version

View File

@@ -44,6 +44,7 @@ git_sha1.h: git_sha1.h.tmp
BUILT_SOURCES = git_sha1.h
CLEANFILES = $(BUILT_SOURCES)
EXTRA_DIST =
SUBDIRS = . gtest util mapi/glapi/gen mapi
@@ -74,6 +75,16 @@ endif
# include only conditionally ?
SUBDIRS += compiler
## Optionally required by GBM, EGL and Vulkan
if HAVE_PLATFORM_WAYLAND
SUBDIRS += egl/wayland/wayland-drm
endif
if HAVE_VULKAN_COMMON
SUBDIRS += vulkan
endif
EXTRA_DIST += vulkan/registry/vk.xml
if HAVE_AMD_DRIVERS
SUBDIRS += amd
endif
@@ -92,11 +103,6 @@ if HAVE_DRI_GLX
SUBDIRS += glx
endif
## Optionally required by GBM and EGL
if HAVE_PLATFORM_WAYLAND
SUBDIRS += egl/wayland/wayland-drm
endif
## Optionally required by EGL (aka PLATFORM_GBM)
if HAVE_GBM
SUBDIRS += gbm
@@ -111,22 +117,8 @@ if HAVE_EGL
SUBDIRS += egl
endif
if HAVE_INTEL_DRIVERS
SUBDIRS += intel/tools
endif
if HAVE_VULKAN_COMMON
SUBDIRS += vulkan/wsi
endif
## Requires the i965 compiler (part of mesa) and wayland-drm
if HAVE_INTEL_VULKAN
SUBDIRS += intel/vulkan
endif
# Requires wayland-drm
if HAVE_RADEON_VULKAN
SUBDIRS += amd/common
SUBDIRS += amd/vulkan
endif
@@ -134,7 +126,7 @@ if HAVE_GALLIUM
SUBDIRS += gallium
endif
EXTRA_DIST = \
EXTRA_DIST += \
getopt hgl SConscript \
$(top_srcdir)/include/GL/mesa_glinterop.h

Some files were not shown because too many files have changed in this diff Show More