Reworking process_completion() function

2026-05-02 16:17:17 +01:00
parent fb83c3114f
commit 73aa4808f2
5 changed files with 269 additions and 164 deletions


@@ -26,13 +26,38 @@ It is a high performance cross platform Windows and Linux compatible program, it
UCRT64 uses the modern Universal C Runtime (ucrtbase.dll), which supports the newest APIs,
the standard MSYS2 uses the legacy msvcrt.dll and does not support IO Ring.
To install:
pacman -S mingw-w64-ucrt-x86_64-gcc
pacman -S mingw-w64-ucrt-x86_64-clang
pacman -S mingw-w64-ucrt-x86_64-cmake
or:
pacman -S mingw-w64-ucrt-x86_64-gcc
pacman -Syu
And add to path:
C:\msys64\ucrt64\bin
Additionally, to use clang-cl install the latest version of Windows SDK and MSVC, or at least select these in Visual Studio Installer:
* MSVC Build tools for x64/x86.
* C++ Build tools core features.
* MSBuild support for LLVM (clang-cl) toolset.
* Windows Universal C runtime.
* Windows Universal CRT SDK.
* Windows 11 SDK.
Then use the MSVC command prompt, or run a script that adds the MSVC environment variables to the current session.
Ex: for PowerShell Terminal save as .ps1 (not persistent):
```ps1
# Add MS visual studio environment variables
cmd /c '"C:\Program Files (x86)\Microsoft Visual Studio\18\BuildTools\VC\Auxiliary\Build\vcvarsall.bat" x64 && set' |
ForEach-Object {
if ($_ -match "^(.*?)=(.*)$") {
Set-Item -Path "Env:$($matches[1])" -Value $matches[2]
}
}
```
Optional: to use the build system
pacman -S mingw-w64-ucrt-x86_64-cmake
The build system uses Ninja and falls back to make; on Windows it prefers clang-cl > gcc > clang, and on Linux gcc > clang.
### Using a build system
| Command | Description|
| :--- | :--- |
@@ -52,7 +77,7 @@ clang -g -O0 file_hasher.c xxhash.c xxh_x86dispatch.c -o filehasher
clang-cl /Zi /Od file_hasher.c xxhash.c xxh_x86dispatch.c
## Linux
**Requirements**: GCC, CMake and Ninja
**Requirements**: GCC or clang, optional CMake, Ninja or make.
### Using a build system
| Command | Description|
@@ -109,13 +134,16 @@ While both systems share the same core concept, their APIs and management styles
| **Partial Updates** | Supports `IORING_REGISTER_FILES_UPDATE` to swap specific indices. | No partial updates; a new registration replaces the entire table. |
| **Scope of Operations** | Extremely broad (files, sockets, timers, signals, etc.). | Primarily focused on file storage (read, write, flush). |
### Completion Wait count
### Completion Wait count and peek
To avoid busy waiting when receiving CQEs, we can use io_uring_submit_and_wait() in Linux by entering a wait count,
the threads sleeps until the count of CQEs are received, in windows the wait_count is present in SubmitIoRing()
the threads sleep until that many CQEs are received. In Windows the wait_count parameter is present in SubmitIoRing()
but is not implemented yet, so we wait on a completion event for a single completion. Another limitation of the completion
event is that the kernel wakes up the thread only when the first CQE arrives; after that we need to drain the completion
queue completely before sleeping again, or we enter an eternal slumber. And my config, each time the thread wakes up
it receives rarely more than 3 to 5 CQEs and most of the time only one CQE.
queue completely before sleeping again, or we enter an eternal slumber.
On the other hand, in Linux we can batch pop completions with io_uring_peek_batch_cqe() + io_uring_cq_advance(), whereas
in Windows we can only pop one completion at a time with PopIoRingCompletion() (equivalent to io_uring_peek_cqe() + io_uring_cqe_seen()).
To simulate the behavior of the Linux functions we use a double loop: an outer loop to control how long we wait
and an inner loop to drain all the available completions.
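
The double loop described above can be sketched with a mock queue. This is only an illustration, not the actual process_completion() code: `mock_ring`, `mock_pop_completion()`, `mock_wait_for_event()` and `drain_completions()` are hypothetical names, with the mock pop standing in for PopIoRingCompletion() and the mock wait standing in for sleeping on the completion event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the ring state (not a real IO Ring handle). */
typedef struct {
    int pending;   /* CQEs that can be popped right now */
    int in_flight; /* submitted requests that have not completed yet */
} mock_ring;

/* Stand-in for PopIoRingCompletion(): pops one CQE, returns false when the queue is empty. */
static bool mock_pop_completion(mock_ring *r) {
    if (r->pending == 0)
        return false;
    r->pending--;
    return true;
}

/* Stand-in for waiting on the completion event.
   Assumption for the demo: each wake-up delivers up to 3 completions. */
static void mock_wait_for_event(mock_ring *r) {
    int n = r->in_flight < 3 ? r->in_flight : 3;
    r->in_flight -= n;
    r->pending += n;
}

/* Processes at least wait_count completions (or everything that is left)
   and returns how many were processed. */
static int drain_completions(mock_ring *r, int wait_count) {
    assert(r != NULL);
    int done = 0;
    /* Outer loop: controls how long we keep waiting. */
    while (done < wait_count && (r->pending > 0 || r->in_flight > 0)) {
        if (r->pending == 0)
            mock_wait_for_event(r);
        /* Inner loop: drain the queue completely before sleeping again. */
        while (mock_pop_completion(r))
            done++;
    }
    return done;
}
```

Because the inner loop always empties the queue, the function can return more completions than `wait_count`, much like a batch peek can return more CQEs than the count that was waited for.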
### Filtering CQEs
@@ -149,12 +177,15 @@ IO Ring implementation.
"Increase the limit to solve this warning.\n");
```
The Memlock limit in Linux restricts the amount of memory a process can
"lock" into physical RAM using the mlock() family of system calls. This
The Memlock limit in Linux restricts the amount of memory that can be
"locked" into physical RAM using the mlock() family of system calls. This
prevents the operating system from swapping that memory out to disk.
Registering buffers locks the buffers' memory so the hardware
can access it directly without kernel intervention, and prevents the kernel from
swapping it to the SSD or HDD. Increase the limit to be able to register the buffers.
swapping it to the SSD or HDD.
For registered buffers, this limit is not accounted per process but across all of the user's running processes, so in order
to be able to register the buffers, we need to set it to unlimited or increase it to at least:
num_hash_threads * NUM_BUFFERS_PER_THREAD * IORING_BUFFER_SIZE + extra memory reserved for other processes.
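
As a rough sketch of the formula above, the required size can be computed and compared with the current soft limit through getrlimit(RLIMIT_MEMLOCK). The helper names `required_memlock_bytes()` and `memlock_limit_allows()` are hypothetical and not taken from the project, and the extra memory reserved for other processes is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/resource.h>

/* Bytes locked by registered buffers: threads * buffers per thread * buffer size. */
static uint64_t required_memlock_bytes(uint64_t threads,
                                       uint64_t bufs_per_thread,
                                       uint64_t buf_size) {
    assert(threads > 0 && bufs_per_thread > 0 && buf_size > 0);
    return threads * bufs_per_thread * buf_size;
}

/* Returns true when the soft memlock limit is unlimited or at least `needed` bytes. */
static bool memlock_limit_allows(uint64_t needed) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return false; /* could not read the limit, treat it as insufficient */
    if (rl.rlim_cur == RLIM_INFINITY)
        return true;
    return (uint64_t)rl.rlim_cur >= needed;
}
```

For example, with the assumed values of 8 hash threads, 4 buffers per thread and 64 KiB buffers, `required_memlock_bytes(8, 4, 65536)` gives 2097152 bytes, which corresponds to 2048 in the KB unit used by limits.conf.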
#### *Modifying the Limit*
The method for changing the memlock limit depends on whether you are
@@ -166,7 +197,7 @@ the /etc/security/limits.conf file. Add the following lines:
```conf
# Example for a specific user (replace 'username'), unlimited or a custom value in KB
username soft memlock unlimited
username hard memlock unlimited
username hard memlock unlimited
```
```conf
# Example for all users