1) add try-catch around "join" call. It should help with program termination. 2) change the GCC thread patch, add "mx->u.i.acquired++" before return 0. 3) add serial logging to GCC thread patch in EBUSY branches.
Trying this now, will first try native-threading-patch fix with "mx->u.i.acquired++" before return 0.
ps. can show an example with that "add try-catch around "join" call.", never used this (and mostly never use c++ for real:) )
Quote:
For testing, what happens when you run the example on this page:
======== Testing module TestThreading before join after join before join after join before join after join before join after join before join after join [PASS] testStartStopWait - 485ms before join after join before join
So, it does 5 joins. And seems passed the testStartStopWait(). But then, when it starts to test testAtomicSemaphoreThread() from above-mentioned test_threading.cpp, then it seems to fail again.
Question is: do we deal with first issue now, and meeting with another one in semaphores ? For testStartStopWait test i can see in void TestThreading::testStartStopWait() : for (size_t i = 0; i != 5; i++) { ... } , so maybe indeed correctly works now and we need to approve patch to adtools ?
But then second question, why it stops on second joing from void TestThreading::testAtomicSemaphoreThread():
AtomicTestThread *threads[num_threads];
for (auto &thread : threads) {
thread = new AtomicTestThread(val, trigger);
UASSERT(thread->start());
}
trigger.post(num_threads);
for (AtomicTestThread *thread : threads) {
thread->wait();
delete thread;
}
UASSERT(val == num_threads * 0x10000);
}
Does that maybe issue in our semaphores-POSIX implementation? Frederik, maybe you have an idea about the second one, and if we correctly fix now gthread_mutex_trylock() and this one also needs to be added to adtools ?
PS. Was in hope that this fix in thread-native will fix that issue: https://github.com/sba1/adtools/issues/82, but nope, still something else (but also probably close enough to what you find). But our new issue also looks pretty close to that bug report too. Maybe will be just another bug in the native-threading-patch.
PS2. Oh, and what is interesting, the actual game can't start because of the same issue :) I.e. when I click "play game" it shows me in the shell the same "before join, after join, before join" and hang. Just like when I run test_threading and it acts when checking testAtomicSemaphoreThread() :)
At least now I know why the game didn't start (were about to start debugging network code)
Edited by kas1e on 2021/1/5 20:23:27 Edited by kas1e on 2021/1/5 20:30:59 Edited by kas1e on 2021/1/5 20:56:34 Edited by kas1e on 2021/1/5 21:01:31
I browsed through the thread patch and it seems like a potential problem that threadstore is accessed without semaphore in join, detach and cond_wait, at least.
I browsed through the thread patch and it seems like a potential problem that threadstore is accessed without semaphore in join, detach and cond_wait, at least.
Maybe can cook something to test with? I mean is it just a few additional lines that need to be added in those functions for a test, or need some rewriting?
Anyway, I put bunch of printfs to the testAtomicSemaphoreThread(), like this:
we in the testAtomicSemaphoreThread
first for() count = 0
before thread = new AtomicTestThread(val, trigger);
thread->start seems fine too
first for() count = 1
before thread = new AtomicTestThread(val, trigger);
thread->start seems fine too
first for() count = 2
before thread = new AtomicTestThread(val, trigger);
thread->start seems fine too
first for() count = 3
before thread = new AtomicTestThread(val, trigger);
thread->start seems fine too
before trigger.post(num_threads);
after trigger.post(num_threads);
Second for() count = 0
before thread->wait
before join
so we try to join()
after join
before delete thread
Second for() count = 1
before thread->wait
before join
so we try to join()
<output hang there>
So first "For" for 4 threads works ok (created 4 AtomicTestThreads and start them ), then trigger.post(num_threads) also seems ok, and then on the second "For" for 4 threads, it passes the only the first one, and hang on the second thread while doing thread->wait() in the join().
Edited by kas1e on 2021/1/6 8:56:09 Edited by kas1e on 2021/1/6 9:08:33 Edited by kas1e on 2021/1/6 9:12:57
Threadstore protection is a bit complex and would be more than 2 lines of code (missing in various places). In fact I'm not familiar with this code at all so it would take some more time to investigate.
Could you try to add serial debug in __gthread_join near line 966:
+ /* FIXME: This is very ugly, use SIGF_SINGLE */
+ while (!thr->finished)
+ {
+ iexec->DebugPrintF("Waiting for thread\n");
+ idos->Delay (1);
+ }
Added your printf, and have on serial when things hangs never ended loop of our printfs:
Quote:
Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread
It just never ends, and it waits forever and so because of that we hang.
So yeah, seems your suggestion is the correct one. Or at least we close enough :) As i see from SBA's comment at top of that "Fix me" code, he think about use of SIGF_SINGLE , is it of any help in right direction ?
Edited by kas1e on 2021/1/6 16:14:28 Edited by kas1e on 2021/1/6 16:18:11
The finished field in struct threadentry may have to be made volatile. Otherwise the compiler will probably read it only once at the beginning of the loop and then assume that it will never change.
after findtask + strcutprocess_userData , but before Wait for the parent task after while with (SIGBREAKF_CTRL_F) & SIGBREAKF_CTRL_F) Waiting for thread after findtask + strcutprocess_userData , but before Wait for the parent task after while with (SIGBREAKF_CTRL_F) & SIGBREAKF_CTRL_F) Waiting for thread after findtask + strcutprocess_userData , but before Wait for the parent task after while with (SIGBREAKF_CTRL_F) & SIGBREAKF_CTRL_F) Waiting for thread after findtask + strcutprocess_userData , but before Wait for the parent task after while with (SIGBREAKF_CTRL_F) & SIGBREAKF_CTRL_F) Waiting for thread after findtask + strcutprocess_userData , but before Wait for the parent task after while with (SIGBREAKF_CTRL_F) & SIGBREAKF_CTRL_F) Waiting for thread after findtask + strcutprocess_userData , but before Wait for the parent task after while with (SIGBREAKF_CTRL_F) & SIGBREAKF_CTRL_F) after findtask + strcutprocess_userData , but before Wait for the parent task after while with (SIGBREAKF_CTRL_F) & SIGBREAKF_CTRL_F) after findtask + strcutprocess_userData , but before Wait for the parent task after while with (SIGBREAKF_CTRL_F) & SIGBREAKF_CTRL_F) after findtask + strcutprocess_userData , but before Wait for the parent task after while with (SIGBREAKF_CTRL_F) & SIGBREAKF_CTRL_F) Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread
As I can see, the first 4 thread waiting are fine (so our first 4 for() from the test case). Then, when our second test from the test case start (that atomicsemaprhore one), then 1 thread waiting fine, but then on second one when we have a halt, it does 3 additional "findtask/thr=" and seems this "while" with SIGBREAKFs didn't find a shit, and all going to loop "waiting for the thread"?
It's like the second test also does 4 times the test as expected, but seems to wait for answers borks for 2,3, and 4 threads.
EDIT: I also comment out for the sake of more clean output from test-threading code a TEST(testStartStopWait) and keep only our failing one: TEST(testAtomicSemaphoreThread), and clean output with our new debugs are:
Quote:
after findtask + strcutprocess_userData , but before Wait for the parent task after while with (SIGBREAKF_CTRL_F) & SIGBREAKF_CTRL_F) after findtask + strcutprocess_userData , but before Wait for the parent task after while with (SIGBREAKF_CTRL_F) & SIGBREAKF_CTRL_F) after findtask + strcutprocess_userData , but before Wait for the parent task after while with (SIGBREAKF_CTRL_F) & SIGBREAKF_CTRL_F) after findtask + strcutprocess_userData , but before Wait for the parent task after while with (SIGBREAKF_CTRL_F) & SIGBREAKF_CTRL_F) Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread
@Salas00 Will check now, but the line you point out are from __gthread_create(), so i can add printfs in both __gthread_create() around ObtainSemaphore and in __gthread_entry() also around ObtainSemaphore
Before ObtainSemaphore in __gthread_create After ObtainSemaphore in __gthread_create after findtask + strcutprocess_userData , but before Wait for the parent task after while with (SIGBREAKF_CTRL_F) & SIGBREAKF_CTRL_F) Before ObtainSemaphore in __gthread_create after findtask + strcutprocess_userData , but before Wait for the parent task After ObtainSemaphore in __gthread_create after while with (SIGBREAKF_CTRL_F) & SIGBREAKF_CTRL_F) Before ObtainSemaphore in __gthread_create after findtask + strcutprocess_userData , but before Wait for the parent task After ObtainSemaphore in __gthread_create after while with (SIGBREAKF_CTRL_F) & SIGBREAKF_CTRL_F) Before ObtainSemaphore in __gthread_create After ObtainSemaphore in __gthread_create after findtask + strcutprocess_userData , but before Wait for the parent task after while with (SIGBREAKF_CTRL_F) & SIGBREAKF_CTRL_F) Before ObtainSemaphore in __gthread_entry After ObtainSemaphore in __gthread_entry Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread Waiting for thread
As I can see it only 1-time call ObtainSemaphore from the __gthread_entry at the end. Probably that why we have 1 thread from debug output passes, but the other 3 never happens, and waiting forever for a 3 threads to answer. At least how I understand the output.
It also looks like we send it 4 times CTRL_F, but only one is received (right?)
Interesting also, that at first, we have clean ObtainSemaphore() call from __gthread_create, without anything else, but then all next ones mixed with CTRL+F checks, but dunno maybe that expected to be like this.
Unless I've misunderstood your debug output then all four threads have received the CTRL-F signal.
The problem seems to be that only one of them manages to return from "thr->result = thr->entry(thr->args);" as there is only one message "Before ObtainSemaphore in __gthread_entry" in output.
Maybe there is some problem with the atomic type that causes it to deadlock?
Edit: No I think the problem is that sem_post() (trigger.post() in TestThreading::testAtomicSemaphoreThread()) only successfully triggers one of the threads.
Could you maybe add some debug output around trigger.wait() to check this?
I made some changes in the sem_wait()/sem_timedwait() and sem_post() functions that should hopefully fix the problem.
In the previous version it was possible that when sem_post() was called four times in quick succession that only the first thread in the wait list was signaled four times and the others were just left waiting.