Computation error: EXIT_TIME_LIMIT_EXCEEDED

Message boards : Number crunching : Computation error: EXIT_TIME_LIMIT_EXCEEDED

Crystal Pellet

Joined: 29 Sep 20
Posts: 14
Credit: 64,341
RAC: 0
Message 91 - Posted: 13 Oct 2020, 15:51:52 UTC

ID: 91 · Rating: 0
Steve Dodd

Joined: 8 Apr 20
Posts: 13
Credit: 554,155
RAC: 0
Message 92 - Posted: 13 Oct 2020, 16:12:43 UTC

I had 2 WUs that were continuing to run past 6 hours (on different machines). I just aborted them. Hope I didn't do that prematurely. (I've only had one 2_Gaia WU complete successfully so far.)
ID: 92 · Rating: 0
zupa

Joined: 21 Aug 19
Posts: 75
Credit: 163,295
RAC: 0
Message 93 - Posted: 13 Oct 2020, 16:29:28 UTC - in response to Message 92.  

I am waiting for the WUs in progress to finish, and then I will change the 2_Gaia app so that each WU runs for a shorter time.
ID: 93 · Rating: 0
Sergey Kovalchuk

Joined: 21 Aug 19
Posts: 23
Credit: 91,637
RAC: 0
Message 94 - Posted: 13 Oct 2020, 16:45:57 UTC - in response to Message 93.  
Last modified: 13 Oct 2020, 16:48:47 UTC

Increase the difficulty of the task:

<workunit>
<name>######</name>
<app_name>2_Gaia@home[20201013.11]</app_name>
<version_num>100</version_num>
<rsc_fpops_bound>86400000000000.000000</rsc_fpops_bound>


Or decrease the application performance:

<app_version>
<app_name>2_Gaia@home[20201013.11]</app_name>
<version_num>100</version_num>
<flops>424268477.267049</flops>


By changing the size of the task you keep the same proportion, and that proportion is too optimistic.

Changing the difficulty of a task is easiest.
Performance is recalculated by the client after the test and after task completion.
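
For reference, a minimal sketch of how the two values quoted above relate. It assumes the client aborts a task with EXIT_TIME_LIMIT_EXCEEDED once its elapsed time exceeds rsc_fpops_bound divided by the app version's flops. The function name is hypothetical; the numbers are the ones from the fragments above.

```python
def max_elapsed_seconds(rsc_fpops_bound: float, flops: float) -> float:
    """Elapsed-time limit implied by a work unit bound and an app speed.

    Assumption: limit = rsc_fpops_bound / flops (hypothetical helper,
    not part of any BOINC library).
    """
    if flops <= 0:
        raise ValueError("flops must be positive")
    return rsc_fpops_bound / flops


# Values copied from the <workunit> and <app_version> fragments above.
limit = max_elapsed_seconds(86400000000000.0, 424268477.267049)
print(f"limit: {limit:.0f} s, about {limit / 3600:.1f} h")
```

Raising rsc_fpops_bound or lowering flops both lengthen this limit, which is the choice described above.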
ID: 94 · Rating: 0
Rantanplan

Joined: 29 Sep 20
Posts: 4
Credit: 50,008
RAC: 0
Message 95 - Posted: 13 Oct 2020, 18:02:44 UTC
Last modified: 13 Oct 2020, 18:03:14 UTC

@ thread opener:

Holy cow, 60K seconds and a lousy 8.33 points, for real!?
ID: 95 · Rating: 0
Steve Dodd

Joined: 8 Apr 20
Posts: 13
Credit: 554,155
RAC: 0
Message 96 - Posted: 13 Oct 2020, 23:51:32 UTC - in response to Message 95.  
Last modified: 13 Oct 2020, 23:53:23 UTC

My validated 2_Gaia WUs haven't been that stingy on credit. Nothing to holler about either :).
ID: 96 · Rating: 0
Werinbert

Joined: 15 Oct 19
Posts: 11
Credit: 2,848,916
RAC: 0
Message 97 - Posted: 14 Oct 2020, 1:41:34 UTC - in response to Message 96.  

The revised 2_Gaia WUs show a large drop in credits today compared to earlier (yesterday). I assume this change happened after a fix to stop the timeouts, as my machine is now running tasks longer. I am fine with 40-60 cr/hour on this machine; however, the current WUs are at 9.15 credits regardless of run time, which results in 2 to 3 cr/hour.

...something got messed up somewhere....
ID: 97 · Rating: 0
Conan
Joined: 27 Apr 20
Posts: 20
Credit: 668,559
RAC: 0
Message 103 - Posted: 16 Oct 2020, 21:56:35 UTC

I may run into this error soon, as I have 3 work units that are using CPU but have been running for 10, 12 and 15 hours.
The time remaining has run out on two of them and the other shows 3 seconds left, but it will take a long time to do those 3 seconds.

Thanks
Conan
ID: 103 · Rating: 0
Crystal Pellet

Joined: 29 Sep 20
Posts: 14
Credit: 64,341
RAC: 0
Message 118 - Posted: 19 Oct 2020, 6:58:14 UTC
Last modified: 19 Oct 2020, 6:58:50 UTC

Finally the task http://150.254.66.104/gaiaathome/result.php?resultid=2338280 mentioned here

ended with the error EXIT_TIME_LIMIT_EXCEEDED after almost 36 hours runtime.
ID: 118 · Rating: 0
Werinbert

Joined: 15 Oct 19
Posts: 11
Credit: 2,848,916
RAC: 0
Message 119 - Posted: 19 Oct 2020, 7:15:05 UTC

Too many long running tasks that end in error...I have decided to set my machine to no new tasks for the time being until the current kinks are worked out.
ID: 119 · Rating: 0
GLadi

Joined: 9 Oct 20
Posts: 3
Credit: 264,349
RAC: 0
Message 127 - Posted: 19 Oct 2020, 10:11:44 UTC

Should we manually cancel tasks that have been running much longer than 2 hours, let's say those running over 5 hours? Are they in an infinite loop or something? I've got 2 tasks that have been running for over 10 hours and they will probably end with errors.
ID: 127 · Rating: 0
zupa

Joined: 21 Aug 19
Posts: 75
Credit: 163,295
RAC: 0
Message 128 - Posted: 19 Oct 2020, 10:33:51 UTC - in response to Message 127.  

My previous solution limited the calculations to 2 h. It does not work perfectly because I checked the time only once per loop, and sometimes the computation time for a single loop is very long.
I will change to a protection using kernel signals that terminates the process after at most 3 h (2 h is the normal time without problems). It will then send a signal of correct process ending to BOINC, and I will save the temporary results.
I hope that this solution will save your credits.
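
A minimal sketch of the difference between checking the clock once per loop and a signal-based watchdog, assuming a POSIX system and a single-threaded process. This is not the project's code: run_with_watchdog, TimeLimitReached and the step functions are hypothetical names. The timer interrupts a step that is still running, so the results of the steps already completed can be kept.

```python
import signal
import time


class TimeLimitReached(Exception):
    """Raised by the alarm handler when the wall-clock budget is spent."""


def _on_alarm(signum, frame):
    raise TimeLimitReached()


def run_with_watchdog(steps, limit_seconds):
    """Run each callable in `steps` until all finish or the timer fires.

    Returns (results, finished). `results` holds the output of every step
    that completed before the limit; `finished` is False if the timer fired.
    """
    results = []
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, limit_seconds)
    try:
        for step in steps:
            results.append(step())
        finished = True
    except TimeLimitReached:
        finished = False
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel the timer
        signal.signal(signal.SIGALRM, previous)  # restore the old handler
    return results, finished


def quick_step():
    return "ok"


def stuck_step():
    time.sleep(5)  # stands in for one very long integration loop
    return "late"


if __name__ == "__main__":
    partial, done = run_with_watchdog([quick_step, stuck_step, quick_step], 0.2)
    print(partial, done)
```

A per-loop clock check would only notice the overrun after stuck_step returned; the timer cuts it short instead.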
ID: 128 · Rating: 0
Werinbert

Joined: 15 Oct 19
Posts: 11
Credit: 2,848,916
RAC: 0
Message 129 - Posted: 19 Oct 2020, 16:42:05 UTC

I wonder if there is not something else that might be causing problems.
In this WU, http://150.254.66.104/gaiaathome/workunit.php?wuid=1164886, the first two tasks were aborted after long run times. However, the third task seemed to run for the normal length of time and was valid. So why were two tasks screwy and yet the WU still ended up having good results? Is there still some sort of calculation bug in the system that would lead one set of hardware to go into an infinite loop while another set of hardware runs just fine?
ID: 129 · Rating: 0
mikey
Joined: 26 Feb 20
Posts: 17
Credit: 827,821
RAC: 0
Message 130 - Posted: 19 Oct 2020, 16:45:49 UTC - in response to Message 92.  
Last modified: 19 Oct 2020, 16:47:59 UTC

I had 2 WUs that were continuing to run past 6 hours (on different machines). I just aborted them. Hope I didn't do that prematurely. (I've only had 1 2_Gaia WUs complete successfully so far)


Hey that's good for me passing you...ol buddy!!! :-) :-)))
ID: 130 · Rating: 0
zupa

Joined: 21 Aug 19
Posts: 75
Credit: 163,295
RAC: 0
Message 131 - Posted: 19 Oct 2020, 17:15:23 UTC - in response to Message 129.  

It is a problem of numerical integration...
The nominal star passes at a long distance from the Sun (long integration step, short calculation time for it).
The next 10 clones of the star pass at a similar distance (short calculation time).
The following clones pass very close to the Sun (short integration step, long calculation time), and this situation is problematic.
I plan to stop the calculations at that moment and save the previous 10 clones.
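
A toy illustration of why a close passage costs so much more, not the integrator used by the app. It assumes a star moving on a straight line past the Sun at constant speed, with a step size proportional to the local dynamical time sqrt(r^3 / GM). The function name, the speed, the accuracy factor eta and the path length are all made-up values for the sketch.

```python
import math

GM_SUN = 4.0 * math.pi ** 2  # AU^3 / yr^2 (Sun's GM in AU and years)


def count_steps(closest_au, speed_au_per_yr=10.0, eta=0.01, half_path_au=1000.0):
    """Count adaptive steps along a straight flyby (hypothetical toy model).

    closest_au is the minimum distance to the Sun along the path. The step
    shrinks as the star gets closer: dt = eta * sqrt(r^3 / GM).
    """
    x = -half_path_au
    steps = 0
    while x < half_path_au:
        r = math.hypot(x, closest_au)          # current distance to the Sun
        dt = eta * math.sqrt(r ** 3 / GM_SUN)  # smaller r gives a smaller step
        x += speed_au_per_yr * dt
        steps += 1
    return steps


for closest in (100.0, 1.0, 0.01):
    print(f"closest approach {closest:>6} AU: {count_steps(closest)} steps")
```

In this model the number of steps grows quickly as the closest approach shrinks, which matches the pattern described above: distant clones finish fast, close ones take much longer.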
ID: 131 · Rating: 0
Fardringle

Joined: 27 Sep 20
Posts: 10
Credit: 7,196,719
RAC: 0
Message 155 - Posted: 23 Oct 2020, 6:55:07 UTC - in response to Message 131.  

Long running tasks are not necessarily bad, if they are needed to perform the calculations properly. But they should give more credits, based on the longer run times.

And you really should add checkpoints so the tasks don't have to start over at 0% if they are paused or interrupted (computer reboot, for example).
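
A minimal sketch of what per-clone checkpointing could look like, assuming the work can be split into independent clones. The file name, the state layout and the function names are hypothetical, and the squared index is only a stand-in for the real calculation. A real BOINC app would also coordinate with the BOINC API calls boinc_time_to_checkpoint() and boinc_checkpoint_completed().

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then rename it over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_path, path)  # atomic on the same filesystem


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {"next_clone": 0, "results": []}


def process_clones(path, n_clones, stop_after=None):
    """Process clones in order, saving a checkpoint after each one.

    stop_after simulates an interruption after that many clones in this run.
    """
    state = load_checkpoint(path)
    done_this_run = 0
    while state["next_clone"] < n_clones:
        if stop_after is not None and done_this_run >= stop_after:
            return state  # simulated pause or reboot
        index = state["next_clone"]
        state["results"].append(index * index)  # stand-in for the real work
        state["next_clone"] = index + 1
        save_checkpoint(path, state)
        done_this_run += 1
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        checkpoint = os.path.join(workdir, "checkpoint.json")
        process_clones(checkpoint, 6, stop_after=4)  # interrupted run
        print(process_clones(checkpoint, 6))         # resumes at clone 4
```

After a restart the task continues from the last saved clone instead of starting over at 0%.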
ID: 155 · Rating: 0
boboviz

Joined: 19 May 20
Posts: 27
Credit: 9,765
RAC: 0
Message 156 - Posted: 23 Oct 2020, 8:25:03 UTC - in response to Message 155.  

And you really should add checkpoints so the tasks don't have to start over at 0% if they are paused or interrupted (computer reboot, for example).

+1
ID: 156 · Rating: 0
Dataman
Joined: 16 Sep 19
Posts: 4
Credit: 1,256,430
RAC: 0
Message 157 - Posted: 23 Oct 2020, 14:14:40 UTC - in response to Message 156.  

And you really should add checkpoints so the tasks don't have to start over at 0% if they are paused or interrupted (computer reboot, for example).

+1

+1
ID: 157 · Rating: 0

©2024 GAVIP-GC