* shader_recompiler: Tessellation WIP
* fix compiler errors after merge
DONT MERGE set log file to /dev/null
DONT MERGE linux pthread bb fix
save work
DONT MERGE dump ir
save more work
fix mistake with ES shader
skip list
add input patch control points dynamic state
random stuff
* WIP Tessellation partial implementation. Squash commits
* test: make local/tcs use attr arrays
* attr arrays in TCS/TES
* don't define empty attr arrays
* switch to special opcodes for tess tcs/tes reads and tcs writes
* impl tcs/tes read attr insts
* rebase fix
* save some work
* save work probably broken and slow
* put Vertex LogicalStage after TCS and TES to fix bindings
* more refactors
* refactor pattern matching and optimize modulos (disabled)
* enable modulo opt
* copyright
* rebase fixes
* remove some prints
* remove some stuff
* Add TCS/TES support for shader patching and use LogicalStage
* refactor and handle wider DS instructions
* get rid of GetAttributes for special tess constants reads. Immediately replace some upon seeing readconstbuffer. Gets rid of some extra passes over IR
* stop relying on GNMX HsConstants struct. Change runtime_info.hs_info and some regs
* delete some more stuff
* update comments for current implementation
* some cleanup
* uint error
* more cleanup
* remove patch control points dynamic state (because runtime_info already depends on it)
* fix potential problem with determining passthrough
---------
Co-authored-by: IndecisiveTurtle <47210458+raphaelthegreat@users.noreply.github.com>
* shader_recompiler: Read image format info directly from sharps instead of storing in shader info.
* renderer_vulkan: Parse fetch shader per-pipeline
* Few minor fixes.
* shader_recompiler: Specialize on vertex attribute number types.
* shader_recompiler: Move GetDrawOffsets to fetch shader
* libkernel: Cleanup some function places
* kernel: Refactor thread functions
* kernel: It builds
* kernel: Fix a bunch of bugs, kernel thread heap
* kernel: File cleanup pt1
* File cleanup pt2
* File cleanup pt3
* File cleanup pt4
* kernel: Add missing funcs
* kernel: Add basic exceptions for linux
* gnmdriver: Add workload functions
* kernel: Fix new pthreads code on macOS. (#1441)
* kernel: Downgrade edeadlk to log
* gnmdriver: Add sceGnmSubmitCommandBuffersForWorkload
* exception: Add context register population for macOS. (#1444)
* kernel: Pthread rewrite touchups for Windows
* kernel: Multiplatform thread implementation
* mutex: Remove spamming log
* pthread_spec: Make assert into a log
* pthread_spec: Zero initialize array
* Attempt to fix non-Windows builds
* hotfix: change incorrect NID for scePthreadAttrSetaffinity
* scePthreadAttrSetaffinity implementation
* Attempt to fix Linux
* windows: Address a bunch of address space problems
* address_space: Fix unmap of region surrounded by placeholders
* libs: Reduce logging
* pthread: Implement condvar with waitable atomics and sleepqueue
* sleepq: Separate and make faster
* time: Remove delay execution
* Causes high CPU usage in Touhou Luna Nights
* kernel: Cleanup files again
* pthread: Add missing include
* semaphore: Use binary_semaphore instead of condvar
* Seems more reliable
* libraries/sysmodule: log module on `sceSysmoduleIsLoaded`
* libraries/kernel: implement `scePthreadSetPrio`
---------
Co-authored-by: squidbus <175574877+squidbus@users.noreply.github.com>
Co-authored-by: Daniel R. <47796739+polybiusproxy@users.noreply.github.com>
* Implement shader resource tables
* fix after rebase + squash
* address some review comments
* fix pipeline_common
* cleanup debug stuff
* switch to using single codegenerator
* shader_recompiler: Move sampling parameter resolution to tracking pass and support more derivative types.
* shader_recompiler: Only track sampler sharp on sample instructions.
* shader_recompiler: Fix Inst args size.
* shader_recompiler: Define fragment output type based on number format.
* shader_recompiler: Fix GetAttribute SPIR-V output type.
* shader_recompiler: Don't bitcast on SetAttribute unless integer target.
* shader_recompiler: Use push constants for user data regs
* shader: Add some GR2 instructions
* shader: Add some instructions
* shader: Add instructions for knack
* touchups
* spirv: Better names
* buffer_cache: Ignore non gpu modified images
* clang format
* Add log
* more fixes
* video_core: texture: image subresources state tracking
* shader_recompiler: use one binding if the same image is read and written
* video_core: added rebinding of changed textures after overlap resolve
* don't use pointers; slight `FindTexture` refactoring
* video_core: buffer_cache: don't copy over the image size
* redundant barriers removed; fixes
* regression fixes
* texture_cache: 3d texture layers count fixup
* shader_recompiler: support for partially bound cubemaps
* added support for cubemap arrays
* don't bind unused color buffers
* fixed depth promotion to not use stencil
* doors
* bonfire lit
* cubemap array index calculation
* final touches
* shader_recompiler: Add more format swap modes
* texture_cache: Handle stencil texture reads
* emulator: Support loading font library
* readme: Add thanks section
* shader_recompiler: Constant buffers as integers
* shader_recompiler: Typed buffers as integers
* shader_recompiler: Separate thread bit scalars
* We can assume the guest shader never mixes them with normal SGPRs. This helps avoid errors where SSA could view an SGPR write as dominating a thread bit read, due to how control flow is structurized, even though that is not possible in the actual control flow
* shader_recompiler: Implement data append/consume operations
* clang format
* buffer_cache: Simplify invalidation scheme
* video_core: Remove some invalidation remnants
* adjust
* Implement some missing shader opcodes
Implements TBUFFER_STORE_FORMAT_XYZW, IMAGE_SAMPLE_CD, and IMAGE_GATHER4_C_LZ.
These are seen in https://github.com/shadps4-emu/shadPS4/issues/496.
* Implement IMAGE_STORE_MIP
Not sure if this is the right way to do this, let me know if this needs changing.
* Revert "Implement IMAGE_STORE_MIP"
This reverts commit cff78b5924.
* shader_recompiler: Implement V_MOVRELS_B32, V_MOVRELD_B32, V_MOVRELSD_B32
Generates a ton of OpSelects to hardcode reading or writing from each possible vgpr depending on the value of m0.
Future work is to do range analysis to put an upper bound on m0 and check fewer registers.
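The select-chain idea can be sketched on the CPU side as below. This is only an illustration, not the recompiler's code: `kNumVgprs`, `Select`, `MovRelSource` and `MovRelDest` are hypothetical names, the register count of 256 is an assumption, and falling back to the base register when m0 is out of range is a choice made here for simplicity.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed register file size for this sketch.
constexpr std::size_t kNumVgprs = 256;

// Stand-in for a SPIR-V OpSelect: picks a when cond is true, else b.
constexpr std::uint32_t Select(bool cond, std::uint32_t a, std::uint32_t b) {
    return cond ? a : b;
}

// Relative read (V_MOVRELS-like): returns vgprs[src + m0] without dynamic
// indexing, by testing m0 against every possible offset.
std::uint32_t MovRelSource(const std::array<std::uint32_t, kNumVgprs>& vgprs,
                           std::size_t src, std::uint32_t m0) {
    std::uint32_t result = vgprs[src]; // m0 == 0, and the out-of-range fallback
    for (std::size_t i = 1; src + i < kNumVgprs; ++i) {
        result = Select(m0 == i, vgprs[src + i], result);
    }
    return result;
}

// Relative write (V_MOVRELD-like): every candidate register is rewritten,
// but only the one at dst + m0 receives the new value.
void MovRelDest(std::array<std::uint32_t, kNumVgprs>& vgprs, std::size_t dst,
                std::uint32_t m0, std::uint32_t value) {
    for (std::size_t i = 0; dst + i < kNumVgprs; ++i) {
        vgprs[dst + i] = Select(m0 == i, value, vgprs[dst + i]);
    }
}
```

A known upper bound on m0 would let both loops stop early, which is the saving the range-analysis note refers to.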
* fix runtime info after rebase
* shader_recompiler: Use null image when shader is compiled with unbound sharp
* video_core: Refactor and render target swizzles
* liverpool_to_vk: Add missing swap format from RDR
* video_core: Refactor shader recompiler interface
* Makes it much easier to pass runtime information to the recompiler and have it treated as part of the shader key. Also pulls out most runtime state from Info struct
* shader_recompiler: Avoid some asserts
* video_core: Compile shader permutations
* spirv: Only specify storage image format for atomics
* ir: Avoid cube coord patching for storage image
* spirv: Fix default attributes
* data_share: Add more instructions
* video_core: Query storage flag with runtime state
* kernel: Use std::list for semaphore
* video_core: Use texture buffers for untyped format load/store
* buffer_cache: Limit view usage
* vk_pipeline_cache: Fix invalid iterator
* image_view: Reduce log spam when alpha=1 in storage swizzle
* video_core: More features and proper spirv feature detection
* video_core: Attempt no2 for specialization
* spirv: Remove conflict
* vk_shader_cache: Small cleanup
* shader_recompiler: handle fetch shader address offsets
parse index & offset sgpr from fetch shader and propagate them to vkBindVertexBuffers
* shader_recompiler: fix fetch_shader when offset is not present
* video_core: propagate index/offset SGPRs to vkCmdDraw instead of offsetting the buffer address
* video_core: add vertex_offset to non-indexed draw calls
renamed fetch offset fields
* cfg: Add one more divergence case
* Seen in RDR shaders
* renderer_vulkan: Reduce number of compiled shaders
* vk_pipeline_cache: Remove some unnecessary checks
This used to cause a fatal crash that would prevent Amplitude [CUSA02480] from booting beyond initialization.
A conditional true label would get an address starting with 0xffff...., which wasn't realistic with the given shader.
The multiplication by 4 causes the value to have its MSB set due to the smaller type.
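A minimal sketch of how this kind of failure can arise, assuming the value is a signed 16-bit branch immediate that is scaled by 4 before being added to the program counter. The commit text does not show the actual code, so `BranchTargetNarrow` and `BranchTargetWide` are hypothetical helpers and the `pc + 4` base is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Bug pattern: the scaled offset is squeezed back into 16 bits. For an
// immediate of 0x2000 the product 0x8000 wraps to -32768, which then
// sign-extends and yields an address of the form 0xffff....
std::uint64_t BranchTargetNarrow(std::uint64_t pc, std::int16_t simm16) {
    const auto scaled = static_cast<std::int16_t>(simm16 * 4); // wraps to 16 bits
    return pc + 4 + static_cast<std::int64_t>(scaled);
}

// Fix pattern: widen first, then multiply, so the product keeps its sign.
std::uint64_t BranchTargetWide(std::uint64_t pc, std::int16_t simm16) {
    const std::int64_t scaled = static_cast<std::int64_t>(simm16) * 4;
    return pc + 4 + scaled;
}
```

With `pc = 0x1000` and an immediate of `0x2000`, the wide version gives `0x9004`, while the narrow version gives `0xFFFFFFFFFFFF9004`.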