shadPS4

mirror of https://github.com/shadps4-emu/shadPS4.git synced 2025-07-23 18:45:36 +00:00

Author	SHA1	Message	Date
baggins183	3019bfb978	Implement MUBUF instructions for shorts/bytes (#2856 ) * implement loads/store instructions for types smaller than dwords * initialize s16/s8 types * set profile for int8/16/64 * also need to zero extend u8/u16 to u32 result * document unrelated bugs with atomic fmin/max * remove profile checks and simple emit for added opcodes --------- Co-authored-by: georgemoralis <giorgosmrls@gmail.com>	2025-07-18 12:04:50 +03:00
TheTurtle	6e350a5085	Avoid clearing HTILE when shader contains address calculation (#3252 ) * resource_tracking: Mark image as written when its used with atomics * texture_cache: Remove meta registered flag Mostly useless and it is possible for images to switch metas * vk_rasterizer: Use xor as heuristic for HTILE clear	2025-07-16 01:28:03 +03:00
TheTurtle	83475ac828	attribute: Correct bary coord function (#3253 )	2025-07-15 22:55:57 +03:00
TheTurtle	4407ebdd9b	shader_recompiler: Implement guest barycentrics (#3245 ) * shader_recompiler: Implement guest barycentrics * Review comments and some cleanup	2025-07-15 18:49:12 +03:00
TheTurtle	399a725343	shader_recompiler: Replace buffer pulling with attribute divisor for instance step rates (#3238 ) * shader_recompiler: Replace buffer pulling with attribute divisor for instance step rates * flatten_extended_userdata: Remove special step rate buffer handling * Review comments * spirv_emit_context: Name all instance rate attribs properly * spirv: Merge ReadConstBuffer again template function only has 1 user now * attribute: Add missing attributes * translate: Reimplement step rate instance id * Resolve validation warnings * shader_recompiler: Separate vertex inputs from LS stage, cleanup tess	2025-07-14 00:32:02 +03:00
TheTurtle	8bc30270c8	shader_recompiler: Implement ff1 with subgroup ops (#3225 )	2025-07-10 21:52:56 +03:00
TheTurtle	88abb93669	ir_passes: Fold readlane with ff1 pattern (#3224 )	2025-07-10 14:19:44 +03:00
TheTurtle	27cbd6647f	shader_recompiler: Reorganize data share operations and implement GDS bit (#3222 ) * shader_recompiler: Reorganize data share operations and implement GDS bit * Review comments	2025-07-10 13:38:50 +03:00
georgemoralis	d6163a6edb	uber fix	2025-07-07 13:37:08 +03:00
Paris Oplopoios	4eaa992aff	Rename 'AddCary' to 'AddCarry' (#3206 )	2025-07-07 13:29:11 +03:00
georgemoralis	4f99f304e6	Revert "Avoid clearing depth on partial HTILE writes (#3167 )" (#3190 ) This reverts commit `59dd73492b`.	2025-07-04 09:57:01 +03:00
TheTurtle	df22c4225e	config: Add toggle for DMA (#3185 ) * config: Add toggle for DMA * config: Log new config	2025-07-03 20:03:06 +03:00
Marcin Mikołajczyk	7431b30005	Support for BUFFER_ATOMIC_S/UMIN_X2 (#3182 ) * Fix BufferAtomicS/UMax64 SPIR-V emitting * Support for BUFFER_ATOMIC_S/UMIN_X2	2025-07-02 18:13:07 -07:00
nickci2002	9eae6b57ce	V_CMP_EQ_U64 support (#3153 ) * Added V_CMP_EQ_U64 shader opcode support and added 64-bit relational operators (<,>,<=,>=) * Fixed clang-format crying because I typed xargs clang-format instead of xargs clang-format-19 * Replaced V_CMP_EQ_U64 code to match V_CMP_U32 to test * Updated V_CMP_U64 for future addons	2025-07-02 19:22:30 +03:00
Marcin Mikołajczyk	1757dfaf5a	buffer_atomic_imax_x2 (#3130 ) * buffer_atomic_imax_x2 * Define Int64Atomics SPIR-V capability	2025-06-29 16:16:47 -07:00
TheTurtle	59dd73492b	Avoid clearing depth on partial HTILE writes (#3167 ) * vk_rasterizer: Avoid full depth clear in case of partial HTILE update * resource_tracking: Mark image as written when its used with atomics	2025-06-29 00:53:14 +03:00
TheTurtle	a49b13fe66	shader_recompiler: Optimize general case of buffer addressing (#3159 ) * shader_recompiler: Simplify dma types Only U32 is needed for S_LOAD_DWORD * shader_recompiler: Perform address shift on IR level Buffer instructions now expect address in the data unit they work on. Doing the shift on IR level will allow us to optimize some operations away on common case * shader_recompiler: Optimize common buffer access pattern * emit_spirv: Use 32-bit integer ops for fault buffer Not many GPUs have 8-bit bitwise or operations so that would probably require some overhead to emulate from the driver * resource_tracking_pass: Fix texel buffer shift	2025-06-26 12:14:36 +03:00
squidbus	669b19c2f3	shader_recompiler: Fix handling unbound depth image. (#3143 ) * shader_recompiler: Fix handling unbound depth image. * shader_recompiler: Consolidate unbound image handling.	2025-06-21 22:18:00 -07:00
Marcin Mikołajczyk	423254692a	Implement buffer atomic fmin/fmax instructions (#3123 )	2025-06-19 17:37:29 -07:00
Marcin Mikołajczyk	efa8f6a154	Handle immediate inline samplers (#3015 ) * Handle immediate inline sampler * Simplify inline sampler handling	2025-06-16 23:42:14 -07:00
squidbus	c71dc740e2	shader_recompiler: Reduce cases where shared memory to buffer pass is needed. (#3082 )	2025-06-11 13:24:41 -07:00
TheTurtle	dedf6de2ac	texture_cache: Implement color<->depth copies (#3079 ) * texture_cache: Implement color to depth copies and vise versa * ir_passes: Adjust shared memory barrier pass to cover more cases * texture_cache: Remove unused code * review comment	2025-06-11 01:34:37 -07:00
squidbus	ca92e72efe	shader_recompiler: Various fixes to shared memory and atomics. (#3075 ) * shader_recompiler: Various fixes to shared memory and atomics. * shader_recompiler: Re-type non-32bit load/stores.	2025-06-10 15:41:58 -07:00
squidbus	e2b726382e	vulkan: Fix two validation errors introduced by shared memory changes. (#3074 )	2025-06-09 19:48:20 -07:00
Marcin Mikołajczyk	217d32b502	Handle DS_READ_U16, DS_WRITE_B16, DS_ADD_U64 (#3007 ) * Handle DS_READ_U16 & DS_WRITE_B16 * Refactor DS translation * Translate DS_ADD_U64 * format * Fix RingAccessElimination after changing WriteShared64 type * Simplify bounds checking in generated SPIR-V	2025-06-09 22:03:38 +03:00
Lander Gallastegi	a71bfb30a2	shader_recompiler: Patch SRT walker on segfault (#2991 ) * Patch srt walker access violations * Fix range * clang-format lolz * Lower log from warning to debug	2025-06-09 13:04:21 +03:00
TheTurtle	c20d02dd40	shader_recompiler: Better handling of geometry shader scenario G (#3064 )	2025-06-08 17:31:51 -07:00
TheTurtle	ce42eccc9d	texture_cache: Handle compressed views of uncompressed images (#3056 ) * pixel_format: Remove unused tables, refactor * host_compatibilty: Cleanup and support uncompressed views of compressed formats * texture_cache: Handle compressed views of uncompressed images * tile_manager: Bump max supported mips to 16 Fixes a crash during start * oops * texture_cache: Fix order of format compat check	2025-06-08 23:09:08 +03:00
TheTurtle	8ffcfc87bd	shader_recompiler: Implement linear interpolation support (#3055 )	2025-06-08 22:46:34 +03:00
Marcin Mikołajczyk	ce84e80f65	BUFFER_ATOMIC_CMPSWAP (#3045 )	2025-06-08 11:43:58 -07:00
Marcin Mikołajczyk	fff3bf9917	s_flbit_i32_b64 (#3033 ) * s_flbit_i32_b64 * Split FindUMsb64 into two 32bit ops	2025-06-05 14:33:25 -07:00
Marcin Mikołajczyk	2091bc5651	Handle R128 bit in MIMG instructions (#3010 )	2025-05-29 16:56:24 -07:00
Lander Gallastegi	f9bbde9c79	video_core: Implement DMA. (#2819 ) * Import memory * 64K pages and fix memory mapping * Queue coverage * Buffer syncing, faulted readback adn BDA in Buffer * Base DMA implementation * Preparations for implementing SPV DMA access * Base impl (pending 16K pages and getbuffersize) * 16K pages and stack overflow fix * clang-format * clang-format but for real this time * Try to fix macOS build * Correct decltype * Add testing log * Fix stride and patch phi node blocks * No need to check if it is a deleted buffer * Clang format once more * Offset in bytes * Removed host buffers (may do it in another PR) Also some random barrier fixes * Add IR dumping from my read-const branch * clang-format * Correct size insteed of end * Fix incorrect assert * Possible fix for NieR deadlock * Copy to avoid deadlock * Use 2 mutexes insteed of copy * Attempt to range sync error * Revert "Attempt to range sync error" This reverts commit dd287b48682b50f215680bb0956e39c2809bf3fe. * Fix size truncated when syncing range And memory barrier * Some fixes (and async testing (doesn't work)) * Use compute to parse fault buffer * Process faults on submit * Only sync in the first time we see a readconst Thsi is partialy wrong. We need to save the state into the submission context itself, not the rasterizer since we can yield and process another sumission (if im not understanding wrong). * Use spec const and 32 bit atomic * 32 bit counter * Fix store_index * Better sync (WIP, breaks PR now) * Fixes for better sync * Better sync * Remove memory coveragte logic * Point sirit to upstream * Less waiting and barriers * Correctly checkout moltenvk * Bring back applying pending operations in wait * Sync the whole buffer insteed of only the range * Implement recursive shared/scoped locks * Iterators * Faster syncing with ranges * Some alignment fixes * fixed clang format * Fix clang-format again * Port page_manager from readbacks-poc * clang-format * Defer memory protect * Remove RENDERER_TRACE * Experiment: only sync on first readconst * Added profiling (will be removed) * Don't sync entire buffers * Added logging for testing * Updated temporary workaround to use 4k pages * clang.-format * Cleanup part 1 * Make ReadConst a SPIR-V function --------- Co-authored-by: georgemoralis <giorgosmrls@gmail.com>	2025-05-22 21:00:15 +03:00
squidbus	3f949d2b6c	amdgpu: Handle 32-bit Unorm formats. (#2974 )	2025-05-22 03:16:20 -07:00
squidbus	b130fe6ed5	vulkan: Handle incompatible depth format using null binding. (#2892 ) Co-authored-by: kalaposfos13 <153381648+kalaposfos13@users.noreply.github.com>	2025-05-09 08:43:20 -07:00
Mahmoud Adel	b0e4e87ff3	Implement SnormNz conversion (#2841 ) * + * + * Unpack Snorm 2x16 * + * SintToSnormNz * all is broken ig.... * review changes * my stupid ass messed all while trying to resolve the conflicts.. * + * + * fix rebase * clang-format fix (1) * clang-format fix (2) --------- Co-authored-by: squidbus <175574877+squidbus@users.noreply.github.com>	2025-05-01 02:12:15 -07:00
squidbus	10b24d04bc	fix: Add new image atomic instructions to relevant lists.	2025-04-30 17:55:50 -07:00
Marcin Mikołajczyk	c08f92aca1	Implement IMAGE_ATOMIC_FMIN and IMAGE_ATOMIC_FMAX for 32bit floats (#2820 ) * Implement IMAGE_ATOMIC_FMIN and IMAGE_ATOMIC_FMAX for 32bit floats * Handle missing VK_EXT_shader_atomic_float2	2025-04-30 11:42:08 -07:00
squidbus	81fa9b7fff	shader_recompiler: Add lowering pass for when 64-bit float is unsupported. (#2858 ) * shader_recompiler: Add lowering pass for when 64-bit float is unsupported. * shader_recompiler: Fix PackDouble2x32/UnpackDouble2x32 type. * shader_recompiler: Remove extra bit cast implementations.	2025-04-28 00:04:16 -07:00
squidbus	b505829e16	lower_buffer_format_to_raw: Fix handling of format remapping. (#2857 )	2025-04-27 16:52:52 -07:00
Dmugetsu	ddc05e8a5f	Implementing DS_SUB_U32, DS_INC_U32, DS_DEC_U32. (#2797 ) * Implementing DS_SUB_U32, DS_INC_U32, DS_DEC_U32, DS_WRITE_SRC2_B32, DS_WRITE_SRC2_B64. * Added ir instructions for new opcodes. Removing Write implementations. Maping operation S_BFE_I32 as it was added in translate but wasnt pointing to anything. * Suggestions	2025-04-16 17:56:27 -07:00
squidbus	52ab1ed04b	shader_recompiler: Implement S_FLBIT_I32_B32 and V_MUL_HI_I32. (#2793 )	2025-04-16 18:08:09 +03:00
squidbus	4bea00135d	resource_tracking_pass: Add heuristic to detect incorrectly tracked buffer sharp. (#2786 )	2025-04-14 20:58:49 -07:00
squidbus	bec1b9056f	shader_recompiler: Misc shader fixes. (#2781 ) * shader_recompiler: Fix frexp exponent type. * shader_recompiler: Implement V_CMP_CLASS_F32 negative class mask. * shader_recompiler: Define operands for DS_ORDERED_COUNT.	2025-04-13 23:46:30 -07:00
squidbus	afd0251dd2	shader_recompiler: Use VK_AMD_shader_trinary_minmax when available. (#2739 ) * shader_recompiler: Use VK_AMD_shader_trinary_minmax when available. * shader_recompiler: Simplify signed/unsigned trinary instruction variants.	2025-04-02 23:36:54 +03:00
TheTurtle	1f9ac53c28	shader_recompiler: Improve divergence handling and readlane elimintation (#2667 ) * control_flow_graph: Improve divergence handling * recompiler: Simplify optimization passes Removes a redudant constant propagation and cleans up the passes a little * ir_passes: Add new readlane elimination pass The algorithm has grown complex enough where it deserves its own pass. The old implementation could only handle a single phi level properly, however this one should be able to eliminate vast majority of lane cases remaining. It first performs a traversal of the phi tree to ensure that all phi sources can be rewritten into an expected value and then performs elimintation by recursively duplicating the phi nodes at each step, in order to preserve control flow. * clang format * control_flow_graph: Remove debug code	2025-03-23 00:35:42 +02:00
baggins183	7a4244ac8b	Misc Cleanups (#2579 ) -dont do trivial phi removal during SRT pass, that's now done in ssa_rewrite -remove unused variable when walking tess attributes -fix some tess comments	2025-03-02 21:52:32 +02:00
TheTurtle	76b4da6212	video_core: Various small improvements and bug fixes (#2525 ) * ir_passes: Add barrier at end of block too * vk_platform: Always assign names to resources * texture_cache: Better overlap handling * liverpool: Avoid resuming ce_task when its finished * spirv_quad_rect: Skip default attributes Fixes some crashes * memory: Improve buffer size clamping * liverpool: Relax binary header validity check * liverpool: Stub SetPredication with a warning * Better than outright crash * emit_spirv: Implement round to zero mode * liverpool: queue::pop takes the front element * image_info: Remove obsolete assert The old code assumed the mip only had 1 layer thus a right overlap could not return mip 0. But with the new path we handle images that are both mip-mapped and multi-layer, thus this can happen * tile_manager: Fix size calculation * spirv_quad_rect: Skip default attributes --------- Co-authored-by: poly <47796739+polybiusproxy@users.noreply.github.com> Co-authored-by: squidbus <175574877+squidbus@users.noreply.github.com>	2025-02-24 14:31:12 +02:00
squidbus	9424047214	shader_recompiler: Proper support for inst-typed buffer format operations. (#2469 )	2025-02-21 03:01:18 -08:00
¥IGA	8447412c77	Bump to Clang 19 (#2434 )	2025-02-18 15:55:13 +02:00


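Commit afd0251dd2 (#2739) uses VK_AMD_shader_trinary_minmax when it is available and simplifies the signed/unsigned trinary variants. When the extension is missing, three-operand min/max/median must be built from two-operand operations. The sketch below shows one such fallback. The function names are hypothetical, and it covers integers only, since float NaN handling is deliberately ignored. It compiles as C++17 or later.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical two-operand fallbacks for trinary min/max/median.
template <typename T>
constexpr T Min3(T a, T b, T c) { return std::min(std::min(a, b), c); }

template <typename T>
constexpr T Max3(T a, T b, T c) { return std::max(std::max(a, b), c); }

// Median of three: max(min(a, b), min(max(a, b), c)).
template <typename T>
constexpr T Med3(T a, T b, T c) {
    return std::max(std::min(a, b), std::min(std::max(a, b), c));
}

// Signedness matters: 0xFFFFFFFF is -1 when signed but the largest value when unsigned.
static_assert(Med3<std::int32_t>(-1, 0, 1) == 0);
static_assert(Med3<std::uint32_t>(0xFFFFFFFFu, 0u, 1u) == 1u);
```

The template parameter plays the role of the signed/unsigned variant. The same bit patterns produce different medians, which is why both variants have to be kept distinct.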
170 Commits