46 Commits

Author SHA1 Message Date
2a7bed5036 Allocate both ioring buffers and fallback buffer memory at the same time + bug fix 2026-06-19 23:16:17 +01:00
4fac135dce Replacing printf, putchar and snprintf in the hot path with custom functions
Replacing snprintf(), localtime() and strftime() with custom formatting
functions

Reworking progress_thread(): instead of composing the progress output
with multiple printf() and putchar() calls, we compose it in a buffer and
write it at once using WriteConsole on Windows and write() on Linux
2026-05-29 13:00:53 +01:00
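A minimal sketch of the "compose in a buffer, write once" idea described in the commit above. The names LineBuf, lb_put_str(), lb_put_u64() and format_progress() are hypothetical and chosen for illustration; the project's real formatting helpers and progress layout are not shown in this log.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-capacity line buffer; not the project's actual type. */
typedef struct {
    char   data[128];
    size_t len;
} LineBuf;

/* Append a string, truncating if the buffer is full. */
static void lb_put_str(LineBuf *b, const char *s)
{
    size_t n    = strlen(s);
    size_t room = sizeof b->data - 1 - b->len;  /* keep space for '\0' */
    if (n > room) n = room;
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
}

/* Append an unsigned integer in decimal without calling snprintf(). */
static void lb_put_u64(LineBuf *b, unsigned long long v)
{
    char   tmp[21];              /* 20 digits for a 64-bit value + '\0' */
    size_t i = sizeof tmp - 1;
    tmp[i] = '\0';
    do {
        tmp[--i] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    lb_put_str(b, tmp + i);
}

/* Compose one progress line (assumed layout: "\r<done>/<total> files"). */
static void format_progress(LineBuf *b, unsigned long long done,
                            unsigned long long total)
{
    b->len = 0;
    b->data[0] = '\0';
    lb_put_str(b, "\r");
    lb_put_u64(b, done);
    lb_put_str(b, "/");
    lb_put_u64(b, total);
    lb_put_str(b, " files");
}
```

Once the line is composed, the whole buffer (b.data, b.len) can be handed to a single write() call on Linux or a single WriteConsole call on Windows, instead of one library call per fragment.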
16c6aeae65 Minor optimisations and bug fixes
Fix bug in mt_mpmc.c: on Linux, mutexes are not recursive.
Add arena_trim_string() to the arena API
Removing arena->path, now paths are pushed to arena->metadata
Replacing fe->owner[128] with char *owner; the owner is now pushed as a
string to arena->metadata and trimmed with arena_trim_string()
Improving cache locality in arena->metadata, the memory layout is now
fe; fe->path; fe->owner.
Cache aligning all arenas except HasherContext->arena to sizeof(void *).
Pushing elements one by one instead of snprintf() in finalize_file() and
hash_worker().
Getting the full path of current directory instead of "."
Fixing bug in path formatting, this allows us to remove normalize_path()
from the hot loop.
2026-05-08 20:04:56 +01:00
7d2a24d0be Experimenting with the restrict keyword 2026-05-06 10:25:47 +01:00
b8104b0fc7 MPMC queues implementation
Now we have 3 different API-compatible MPMC queues that we can swap by
changing the included header.
mt_mpmc.h, a blocking queue that uses a mutex/critical section.
lf_mpmc.h, a lock free queue that uses atomics.
sm_mpmc.h, a hybrid queue that uses atomics and a semaphore to block
when the queue is empty.
In this program, for max performance it is recommended to use sm_mpmc.h
or mt_mpmc.h; they are designed to avoid busy waiting, which frees more
CPU time to do useful work.
2026-05-04 14:06:48 +01:00
759fdfda1e Project reordering and mpmc code 2026-05-04 13:39:49 +01:00
73aa4808f2 Reworking process_completion() function 2026-05-02 16:48:21 +01:00
fb83c3114f Add a build system 2026-05-01 20:59:51 +01:00
5cb47a17a2 Minor fixes after the merge
Deleting some duplicate functions and headers
2026-04-28 22:04:53 +01:00
0faf2bc792 Merge branch 'io_ring' 2026-04-28 17:55:41 +01:00
b4487cd3a6 Finalizing the implementation of file registration
Adding the file system check in Linux (can be enabled from the config
file)
Adding more options to the config file
Writing the README
2026-04-28 17:52:02 +01:00
3393129c5f Implementing registered files in io_uring
The Windows implementation is disabled: currently, registering files in
IO Ring while there are inflight IO operations causes corruption.

Implementing a config file.

Some code cleanup
2026-04-24 15:30:04 +01:00
ab31776658 Reworking IO Ring pipeline to fully support multiple inflight files
Reworking the filequeue, the buffer chaining logic and the error
handling.
Renaming functions.
Fix bug in arena.
2026-04-23 19:53:58 +01:00
43ab4ed1c3 Merge pull request 'I had to make these changes so that it compiles on Linux' (#1) from massinissa/filehasher:fix-build-linux into main
Reviewed-on: #1
v4.5
2026-04-16 19:06:01 +00:00
b8e577b5bb Porting IO Ring to linux by implementing io_uring 2026-04-15 23:15:00 +01:00
657752313e I had to make these changes so that it compiles on Linux 2026-04-14 00:39:00 +01:00
0294498538 Add support for multiple inflight files and one shot hash small files
The IO Ring now supports batching multiple submissions and can handle
multiple files at the same time.

Hashing small files using XXH3_128bits() instead of the streaming
pipeline (XXH3_128bits_reset(), XXH3_128bits_update(),
XXH3_128bits_digest()); this reduces the overhead of creating a state
and digest. Coupled with the IO Ring, it improves the hashing of small
files whose size is smaller than the size of the IO Ring buffers
2026-04-02 14:31:58 +01:00
41ac164881 Updating the IO Ring, Updating the progress printing fn 2026-03-31 19:33:39 +01:00
d4ba121b56 Implementation of IO Ring in Windows
Fixing the two compilation warnings.
2026-03-31 00:26:03 +01:00
81d47fb675 Linux porting
Porting to linux
Reorganising the code
Improving the scan function
2026-03-18 23:38:54 +01:00
ed0326d796 Fixing user prompt parsing 2026-03-13 20:48:00 +01:00
d35858df01 Using xxhash xxh_x86dispatch to select the best SIMD instruction set at runtime
This dispatcher cannot be added to a unity build, and we must remove the
AVX2 or AVX512 compilation flags and link xxh_x86dispatch.c in the
compilation command. The compilation throws two warnings about functions
with internal linkage that are not defined; they are defined in
xxh_x86dispatch.c, so the warnings are harmless
2026-03-13 16:24:31 +01:00
c1abada7ba Updating the LF MPMC queue and replacing DirQueue with it
Making the MPMC queue support producers that are also consumers by
adding a counter, work: mpmc_push_work() increments work and
mpmc_task_done() decrements it, and when work reaches 0 it calls
mpmc_producers_finished(), which pushes poison values to wake up sleeping
threads and make them return NULL

Replacing DirQueue, a queue grown with realloc(), with the MPMC queue
2026-03-12 13:57:09 +01:00
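The outstanding-work counter from the commit above can be sketched as follows. This is an illustration under assumptions, not the project's code: WorkTracker and the tracker_* functions are hypothetical stand-ins for mpmc_push_work() and mpmc_task_done(), and a plain finished flag replaces the poison values that the real mpmc_producers_finished() pushes.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tracker; the real queue keeps this state internally. */
typedef struct {
    atomic_size_t work;      /* tasks queued or currently being processed */
    atomic_bool   finished;  /* stand-in for "poison values were pushed"  */
} WorkTracker;

static void tracker_init(WorkTracker *t)
{
    atomic_init(&t->work, 0);
    atomic_init(&t->finished, false);
}

/* Call once for every task pushed to the queue. */
static void tracker_push_work(WorkTracker *t)
{
    atomic_fetch_add(&t->work, 1);
}

/* Call after a task has been fully processed, including pushing any
 * child tasks it produced. Returns true when this call retired the
 * last outstanding task. */
static bool tracker_task_done(WorkTracker *t)
{
    /* atomic_fetch_sub returns the value held before the decrement */
    if (atomic_fetch_sub(&t->work, 1) == 1) {
        atomic_store(&t->finished, true);
        return true;
    }
    return false;
}
```

The ordering matters: a worker that scans a directory pushes its sub-directories (incrementing the counter) before it reports the parent as done, so the counter cannot reach 0 while work is still being generated.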
0e3ec5b09c Replacing malloc() and strdup() in the scan helper function with FileEntry and path arenas 2026-03-11 16:17:22 +01:00
aef070192f Using FindFirstFileA() instead of CreateFileA() to get the file size
Since we already call FindFirstFileA() and it returns the size, there is
no need to open/close every file to get its size
2026-03-11 09:02:17 +01:00
1fa306643f Align the MPMC queue to pagesize 2026-03-09 18:01:11 +01:00
f3c4cb7b76 plat_sem_destroy(&q->items_sem); 2026-03-09 17:27:45 +01:00
7d8b4addb7 Implementing a semaphore in the MPMC queue to wake up consumers 2026-03-09 16:44:43 +01:00
a299c4a1e1 LF MPMC queue improvements
Small improvements of the LF MPMC queue

Making the LF MPMC queue generic and in a separate header file
2026-03-09 13:21:45 +01:00
b2f444af00 Making the LF MPMC queue generic and isolating its code in a separate header file 2026-03-09 01:14:24 +01:00
75c2592bfe Making the hashing buffer reusable instead of calling malloc() for every file 2026-03-08 21:14:58 +01:00
c846952cbf forcing xxhash to use the stack instead of the heap 2026-03-08 10:47:18 +01:00
dd0797df79 hashers now use thread local arena
Instead of writing directly to file_hashes.txt, hash_workers now are
using a local arena, writing everything once at the end

using #pragma once to ensure that a given header file is included only
once in a single compilation unit
2026-03-08 10:46:05 +01:00
ee02b83094 Updating the changelog 2026-03-07 19:51:17 +01:00
8e8e6fe2b1 Merge branch 'mpmc_queue' 2026-03-07 17:12:09 +01:00
ac78f585d9 Rewriting hash_worker() to export file_hashes.txt 2026-03-07 17:10:35 +01:00
417dbad374 Bug fixes in lock free MPMC queue
Fix bug: slots were used before initialization. Compare-and-swap protects
the update of committed, but it does not protect the memory
initialization. Adding atomic_flag commit_lock to protect against that

Fix bug: multiple threads were committing at the same time. Fixed by
using atomic_flag commit_lock and re-checking committed after acquiring
the lock

Reorder helper functions
2026-03-07 11:36:11 +01:00
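The pattern described in the commit above (take an atomic_flag lock, then re-check committed before initializing) can be sketched like this. Region, REGION_CAPACITY, init_passes and the region_* functions are hypothetical names for illustration; the real queue's slot type and growth policy are not shown in this log.

```c
#include <stdatomic.h>
#include <stddef.h>

#define REGION_CAPACITY 1024  /* assumed fixed capacity for the sketch */

typedef struct {
    atomic_size_t committed;    /* slots that are initialized and usable */
    atomic_flag   commit_lock;  /* spin lock guarding initialization     */
    size_t        init_passes;  /* illustration only: counts init runs   */
    int           slots[REGION_CAPACITY];
} Region;

static void region_init(Region *r)
{
    atomic_init(&r->committed, 0);
    atomic_flag_clear(&r->commit_lock);
    r->init_passes = 0;
}

/* Make sure at least `needed` slots are initialized before use. */
static void region_ensure_committed(Region *r, size_t needed)
{
    if (needed > REGION_CAPACITY) needed = REGION_CAPACITY;

    /* Fast path: enough slots are already published. */
    if (atomic_load_explicit(&r->committed, memory_order_acquire) >= needed)
        return;

    /* Slow path: only one thread at a time may initialize memory. */
    while (atomic_flag_test_and_set_explicit(&r->commit_lock,
                                             memory_order_acquire)) {
        /* spin; a real implementation would yield or back off here */
    }

    /* Re-check: another thread may have committed while we waited. */
    size_t have = atomic_load_explicit(&r->committed, memory_order_relaxed);
    if (have < needed) {
        for (size_t i = have; i < needed; i++)
            r->slots[i] = 0;                 /* initialize the memory... */
        r->init_passes++;
        atomic_store_explicit(&r->committed, needed,
                              memory_order_release); /* ...then publish */
    }

    atomic_flag_clear_explicit(&r->commit_lock, memory_order_release);
}
```

Publishing committed with a release store only after the slots are written is what keeps a reader on the fast path from seeing uninitialized memory.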
4967591ff8 Improving the lock free mpmc queue
Making the queue growable
Add padding to avoid false sharing
Add sleep() and SwitchToThread() to limit spinning
2026-03-07 01:40:31 +01:00
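One common way to get the padding mentioned in the commit above is to align each hot field to its own cache line and let the compiler insert the padding. This is a sketch under assumptions: QueueCursors is a hypothetical name, a 64-byte cache line is assumed, and the real queue may use explicit padding arrays instead.

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

#define CACHE_LINE 64  /* assumed cache-line size; typical on x86-64 */

/* Producers update tail and consumers update head. Placing each on its
 * own cache line keeps a write to one from invalidating the other. */
typedef struct {
    alignas(CACHE_LINE) atomic_size_t head;
    alignas(CACHE_LINE) atomic_size_t tail;
} QueueCursors;

_Static_assert(offsetof(QueueCursors, tail) - offsetof(QueueCursors, head)
                   >= CACHE_LINE,
               "head and tail must not share a cache line");
_Static_assert(sizeof(QueueCursors) % CACHE_LINE == 0,
               "struct size must be a whole number of cache lines");
```

The static assertions turn the layout assumption into a compile-time check, so a later field reordering cannot silently reintroduce false sharing.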
86ad30788a Add changelog and binaries 2026-03-06 20:52:29 +01:00
7099c1ddd6 Implementing lock free MPMC queue 2026-03-06 20:20:28 +01:00
9b327c82a6 Implementing simple MPMC queue
Rewriting the pipeline and progress display
2026-03-06 16:44:37 +01:00
ca1bbefeaf Fix strdup warnings 2026-02-28 19:50:26 +01:00
e591dbf347 Add support for AVX2 instead of SSE2
Must compile with -mavx2 in clang/gcc or
 /arch:AVX2 in clang-cl
2026-02-28 19:44:43 +01:00
b89526d724 Removing -resume functionality 2026-02-28 19:09:28 +01:00
1744309b50 first commit 2026-02-28 10:54:16 +01:00
92aac64cf1 Initial commit 2026-02-23 23:21:13 +00:00