cancel wu's

Message boards : Number crunching : cancel wu's

fzs600

Joined: 21 Aug 19
Posts: 4
Credit: 1,238,708
RAC: 0
Message 106 - Posted: 18 Oct 2020, 17:04:44 UTC

hello
is it normal to cancel WUs that have already been partially computed, without crediting them?

thank you

Sent | Time reported | Status | Run time (sec) | CPU time (sec) | Credit | Application
18 Oct 2020, 15:32:00 UTC | 18 Oct 2020, 15:58:35 UTC | Cancelled by server | 1,183.18 | 1,171.00 | --- | 2_Gaia@home v1.00 x86_64-pc-linux-gnu
18 Oct 2020, 15:00:51 UTC | 18 Oct 2020, 16:46:25 UTC | Cancelled by server | 6,316.74 | 6,126.78 | --- | 2_Gaia@home v1.00 x86_64-pc-linux-gnu


Sergey Kovalchuk

Joined: 21 Aug 19
Posts: 23
Credit: 91,637
RAC: 0
Message 107 - Posted: 18 Oct 2020, 17:40:08 UTC - in response to Message 106.  

In the next topic, the admin gave this answer:

> When a problem occurs, I try to have the server cancel the computation in order to protect the computing time on your processors.
> I am currently developing a version of the program whose priority will be protecting computation time (the task will finish after a set time, about 2 hours).
zupa

Joined: 21 Aug 19
Posts: 75
Credit: 163,295
RAC: 0
Message 108 - Posted: 18 Oct 2020, 18:14:18 UTC - in response to Message 107.  

The queue is being cleared of tasks for targets where the required number of calculations has already been reached (about 1% of tasks).

I will create an automatic system that adds calculation jobs according to the solutions already obtained, without removing jobs from the queue.
We plan to send about 750,000 WUs for calculation...
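A minimal sketch of the queue logic described above, assuming each queued task is tagged with its target and that the server knows how many results each target needs and how many it has already received. The names `QueuedTask`, `clean_queue` and `jobs_to_add` are hypothetical, not the project's code: `clean_queue` splits the queue into tasks to keep and tasks to cancel, and `jobs_to_add` illustrates the "add jobs instead of removing them" idea by generating only the jobs a target still needs.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class QueuedTask:
    task_id: int
    target: str  # hypothetical: the star/comet target this task computes


def clean_queue(queue: List[QueuedTask], achieved: Dict[str, int],
                required: Dict[str, int]) -> Tuple[List[QueuedTask], List[QueuedTask]]:
    """Return (keep, cancel): cancel queued tasks whose target already has enough results."""
    keep: List[QueuedTask] = []
    cancel: List[QueuedTask] = []
    for task in queue:
        if achieved.get(task.target, 0) >= required[task.target]:
            cancel.append(task)
        else:
            keep.append(task)
    return keep, cancel


def jobs_to_add(achieved: Dict[str, int], required: Dict[str, int],
                in_flight: Dict[str, int]) -> Dict[str, int]:
    """New jobs per target, counting results already received and jobs still running."""
    return {
        target: max(0, need - achieved.get(target, 0) - in_flight.get(target, 0))
        for target, need in required.items()
    }


if __name__ == "__main__":
    queue = [QueuedTask(1, "A"), QueuedTask(2, "A"), QueuedTask(3, "B")]
    keep, cancel = clean_queue(queue, {"A": 10, "B": 3}, {"A": 10, "B": 5})
    print([t.task_id for t in keep], [t.task_id for t in cancel])
    print(jobs_to_add({"A": 10, "B": 3}, {"A": 10, "B": 5}, {"B": 1}))
```

Generating only the missing jobs means a finished target never gets new work queued, so there is nothing left to cancel later.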
zupa

Joined: 21 Aug 19
Posts: 75
Credit: 163,295
RAC: 0
Message 109 - Posted: 18 Oct 2020, 18:28:29 UTC - in response to Message 108.  

queue cleaning complete
Sergey Kovalchuk

Joined: 21 Aug 19
Posts: 23
Credit: 91,637
RAC: 0
Message 110 - Posted: 18 Oct 2020, 18:53:04 UTC - in response to Message 108.  

> We plan to send about 750,000 WUs for calculation...


That's quite a large amount of computation and, accordingly, a lot of credit in the statistics ;o)

I would like to hear a small comment about the structure of the planned calculations

Will this be one app (the current version)? Or will new applications/versions appear as the calculations progress?
Will the statistics be common for the entire project or are you planning to account separately for applications?
If the "reward" for the amount of computation has already been determined, are you planning on introducing a badge system to reward users who reach certain milestones?

It is useful for participants to be able to plan ahead for reaching certain goals, both locally in the project and on external statistics sites.
Crystal Pellet

Joined: 29 Sep 20
Posts: 14
Credit: 64,341
RAC: 0
Message 111 - Posted: 18 Oct 2020, 19:37:12 UTC - in response to Message 107.  

> When a problem occurs, I try to have the server cancel the computation in order to protect the computing time on your processors.
> I am currently developing a version of the program whose priority will be protecting computation time (the task will finish after a set time, about 2 hours).
This is not happening with this task on my machine:
Application 2_Gaia@home 1.00 
Name 2_7281
State Running
Received Sat 17 Oct 2020 13:38:59 CEST
Report deadline Mon 19 Oct 2020 13:38:58 CEST
Estimated computation size 3,600 GFLOPs
CPU time 1d 01:46:33
CPU time since checkpoint 1d 01:46:33
Elapsed time 1d 01:46:35
Estimated time remaining 00:00:00
Fraction done 100.000%
Virtual memory size 11.54 MB
Working set size 8.90 MB
Directory  slots/1
Process ID 4876
Progress rate 3.960% per hour
Executable 2_Gaia@home[20201017.07]_x86_64-pc-linux-gnu

and it has not (yet) been aborted by the server.
zupa

Joined: 21 Aug 19
Posts: 75
Credit: 163,295
RAC: 0
Message 112 - Posted: 18 Oct 2020, 19:59:14 UTC - in response to Message 111.  

2_Gaia@home:
The 2_Gaia@home app checks the system time before starting the main calculation loop.
It then checks the system time at each loop step.
The main loop is exited when the difference between the start time and the current time exceeds 2 h.
2_Gaia@home then finishes its work and prepares the results.

The number of lines in the output file therefore differs between processors.

What surprises me is that in some cases the loop does not end :(
These are sporadic cases.
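The loop described above can be sketched roughly like this. It is a minimal illustration, not the 2_Gaia@home source: `step` is a hypothetical stand-in for one step of the real calculation, and the sketch reads `time.monotonic()` rather than the system wall clock (my substitution, since a monotonic clock is not affected by clock adjustments). Because the clock is only read between steps, a single slow step can overrun the 2 h budget by its full length, which is one way such a loop can appear not to end.

```python
import time
from typing import Callable, List


def run_time_budgeted(step: Callable[[int], str], max_steps: int,
                      budget_s: float) -> List[str]:
    """Run step(0), step(1), ... until max_steps or until the time budget is exceeded.

    The clock is read only *between* steps, so one very slow step can overrun
    the budget by its full duration before the loop notices.
    """
    start = time.monotonic()                      # read once, before the main loop
    lines: List[str] = []
    for i in range(max_steps):
        if time.monotonic() - start > budget_s:   # checked at every loop step
            break                                 # budget used up: keep what we have
        lines.append(step(i))
    return lines                                  # length depends on the host's speed


if __name__ == "__main__":
    two_hours = 2 * 3600.0
    output = run_time_budgeted(lambda i: f"step {i}", max_steps=1_000, budget_s=two_hours)
    print(len(output))
```

A faster processor completes more steps before the budget runs out, which is why the output length varies from host to host.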
zupa

Joined: 21 Aug 19
Posts: 75
Credit: 163,295
RAC: 0
Message 113 - Posted: 18 Oct 2020, 20:16:44 UTC - in response to Message 112.  

I think I found out why this is happening with 2_Gaia@home

We numerically calculate the motion of the star cluster near the Sun in the gravitational field of the Galaxy.
For each star of the cluster, we draw clones using the covariance matrix from the Gaia catalog.
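Drawing clones from a covariance matrix is usually done by sampling a multivariate normal distribution: factor the covariance as C = L·Lᵀ (Cholesky) and add L·z to the catalog values, where z is a vector of independent standard normal numbers. This sketch assumes the mean vector and covariance matrix for one star (for example its astrometric parameters) have already been assembled from the catalog; `cholesky` and `draw_clones` are hypothetical helpers, not the project's code.

```python
import math
import random
from typing import List, Sequence


def cholesky(cov: Sequence[Sequence[float]]) -> List[List[float]]:
    """Lower-triangular L with L * L^T = cov; cov must be symmetric positive definite."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = cov[i][i] - s
                if d <= 0.0:
                    raise ValueError("covariance matrix is not positive definite")
                L[i][i] = math.sqrt(d)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def draw_clones(mean: Sequence[float], cov: Sequence[Sequence[float]],
                n_clones: int, rng: random.Random) -> List[List[float]]:
    """Draw n_clones samples from the multivariate normal N(mean, cov)."""
    L = cholesky(cov)
    dim = len(mean)
    clones: List[List[float]] = []
    for _ in range(n_clones):
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # independent N(0, 1) draws
        # x = mean + L z has covariance L L^T = cov
        clones.append([mean[i] + sum(L[i][k] * z[k] for k in range(i + 1))
                       for i in range(dim)])
    return clones


if __name__ == "__main__":
    clones = draw_clones([10.0, -5.0], [[4.0, 2.0], [2.0, 3.0]], 5, random.Random(2020))
    for c in clones:
        print(c)
```

Each clone is an independent random draw, so occasionally one lands in a region of parameter space that is much harder to integrate than the nominal star.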

Sometimes a random clone requires a very small integration step, which increases the computation time of a single loop step (2_Gaia@home app).
Unfortunately, we are not able to predict such a situation :(
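To illustrate how one clone can make a loop step so slow, here is a toy adaptive-step integrator (a Heun-Euler embedded pair) applied to the hypothetical equation dx/dt = -k·x; the real integrator and equations of motion are not described in this thread. When k is large, the error control forces a much smaller step, so covering the same interval takes many more substeps. The sketch also checks a deadline inside the inner substep loop, which is one possible way (an assumption, not the project's plan) to stop a single slow step from overrunning the time limit.

```python
import math
import time
from typing import Callable, Tuple


def integrate_interval(f: Callable[[float, float], float], t0: float, x0: float,
                       dt: float, tol: float = 1e-6, h0: float = 1e-2,
                       deadline: float = math.inf) -> Tuple[float, int, bool]:
    """Advance dx/dt = f(t, x) from t0 to t0 + dt with an adaptive Heun-Euler pair.

    Returns (x, accepted_substeps, finished). `finished` is False when the
    deadline (a time.monotonic() value) passed before the interval was covered.
    """
    t, x, h = t0, x0, h0
    t_end = t0 + dt
    substeps = 0
    while t < t_end:
        if time.monotonic() > deadline:      # deadline checked inside the inner loop
            return x, substeps, False
        h = min(h, t_end - t)                # do not step past the end of the interval
        k1 = f(t, x)
        k2 = f(t + h, x + h * k1)
        x_euler = x + h * k1                 # first-order estimate
        x_heun = x + 0.5 * h * (k1 + k2)     # second-order estimate
        err = abs(x_heun - x_euler)          # local error estimate
        if err <= tol:                       # accept the substep
            t += h
            x = x_heun
            substeps += 1
        # Grow or shrink h; err scales like h**2, hence the square root.
        factor = 0.9 * math.sqrt(tol / err) if err > 0.0 else 2.0
        h *= min(2.0, max(0.2, factor))
    return x, substeps, True


if __name__ == "__main__":
    for k in (1.0, 1000.0):
        x, n, done = integrate_interval(lambda t, y, k=k: -k * y, 0.0, 1.0, 1.0)
        print(f"k={k:g}: {n} substeps, finished={done}")
```

The number of substeps per outer step is not known in advance; it depends on how demanding the particular trajectory is, which matches the "we are not able to predict such a situation" remark.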
zupa

Joined: 21 Aug 19
Posts: 75
Credit: 163,295
RAC: 0
Message 114 - Posted: 18 Oct 2020, 20:52:40 UTC - in response to Message 110.  
Last modified: 18 Oct 2020, 21:10:54 UTC

Status of apps:
1_Gaia@home - final
2_Gaia@home - final (I hope; we will check the numbers in the first full results obtained today. Random checks were positive.)

Hopefully the new 2_Gaia@home computing strategy will benefit both you and us. (If not, I will look for another calculation strategy.)

> I would like to hear a small comment about the structure of the planned calculations

Currently, all calculations are performed using the Gaia DR2 star catalog.
1_Gaia@home calculates the trajectories of comet clones in the gravitational field of the star cluster and the galaxy
2_Gaia@home calculates the movement of star clones in the gravitational field of the galaxy.

The Gaia DR3 catalog will be available in a few months, so we will then start new calculations with the new data (using 1_Gaia@home and 2_Gaia@home).

We would also like to start very high precision quadrupole calculations (3_Gaia@home).

In the future, the aim of the project is to let scientists who compute with the Gaia catalog use BOINC.
Since each topic has its own specifics, implementing it efficiently requires a lot of work.
I hope that stable versions of the existing calculations will make up for the testing of new ones...

>Will the statistics be common for the entire project or are you planning to account separately for applications?
>If the "reward" for the amount of computation has already been determined, are you planning on introducing a badge system to reward users who reach certain milestones?
>It is useful for participants to be able to plan ahead for reaching certain goals, both locally in the project and on external statistics sites.

Unfortunately, I haven't had time to go deeper into the documentation on this topic.
You know which solution works best for you, so I am asking for your help.
I don't want to build a badge system without understanding it.
Sergey Kovalchuk

Joined: 21 Aug 19
Posts: 23
Credit: 91,637
RAC: 0
Message 116 - Posted: 18 Oct 2020, 22:26:08 UTC - in response to Message 114.  

It would be nice to move the last 2 posts to a separate topic in the "Science" forum.
It would also help to add information about the other applications to the "About Us" page, or to replace it with something more general plus a description in Science.

I have started a separate topic about badges.
Crystal Pellet

Joined: 29 Sep 20
Posts: 14
Credit: 64,341
RAC: 0
Message 120 - Posted: 19 Oct 2020, 8:10:31 UTC - in response to Message 113.  

> 2_Gaia@home:
> The 2_Gaia@home app checks the system time before starting the main calculation loop.
> It then checks the system time at each loop step.
> The main loop is exited when the difference between the start time and the current time exceeds 2 h.
> 2_Gaia@home then finishes its work and prepares the results.
>
> The number of lines in the output file therefore differs between processors.
>
> What surprises me is that in some cases the loop does not end :(
> These are sporadic cases.
> I think I found out why this is happening with 2_Gaia@home
>
> We numerically calculate the motion of the star cluster near the Sun in the gravitational field of the Galaxy.
> For each star of the cluster, we draw clones using the covariance matrix from the Gaia catalog.
>
> Sometimes a random clone requires a very small integration step, which increases the computation time of a single loop step (2_Gaia@home app).
> Unfortunately, we are not able to predict such a situation :(

So, for us crunchers it's OK to abort tasks running longer than ~3 hours, because when they run longer you don't get a valid result and we don't get credit for the wasted time.
The problem is that such a task would then be sent to another 'victim', so maybe a temporary way to reduce wasted time (until you have a better solution within your application) is to reduce rsc_fpops_bound from 86400000000000 to 21600000000000.
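For reference, my understanding is that the BOINC client derives a task's elapsed-time limit by dividing `rsc_fpops_bound` by its speed estimate (in FLOPS) for the app version, and aborts the task once that limit is exceeded. A small worked example of the proposed change, assuming a hypothetical host estimated at 1 GFLOPS; the 3,600 GFLOPs estimate is taken from the task properties posted earlier in this thread, and the helper name is made up for illustration.

```python
def max_elapsed_seconds(rsc_fpops_bound: float, app_flops: float) -> float:
    """Elapsed-time limit = FLOP bound / estimated FLOPS of the app version on this host."""
    return rsc_fpops_bound / app_flops


CURRENT_BOUND = 86_400_000_000_000    # rsc_fpops_bound quoted in this thread
PROPOSED_BOUND = 21_600_000_000_000   # suggested replacement
FPOPS_EST = 3_600e9                   # "Estimated computation size 3,600 GFLOPs"
HOST_FLOPS = 1e9                      # hypothetical host speed estimate: 1 GFLOPS

if __name__ == "__main__":
    for name, bound in (("current", CURRENT_BOUND), ("proposed", PROPOSED_BOUND)):
        hours = max_elapsed_seconds(bound, HOST_FLOPS) / 3600
        print(f"{name}: bound = {bound / FPOPS_EST:g} x estimate, limit = {hours:g} h")
```

Because the limit is proportional to the bound, cutting the bound by a factor of 4 cuts the maximum run time by the same factor on every host, whatever its speed estimate.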
Sergey Kovalchuk

Send message
Joined: 21 Aug 19
Posts: 23
Credit: 91,637
RAC: 0
Message 121 - Posted: 19 Oct 2020, 8:25:56 UTC - in response to Message 120.  

> reduce rsc_fpops_bound from 86400000000000 to 21600000000000

This could potentially reduce the credit per task from ~64 to ~16.
You would need to check how CreditNew behaves.
Werinbert

Joined: 15 Oct 19
Posts: 11
Credit: 2,848,916
RAC: 0
Message 122 - Posted: 19 Oct 2020, 8:30:06 UTC

No project should be using CreditNew... it is so prone to errors and wonky results.
zupa

Joined: 21 Aug 19
Posts: 75
Credit: 163,295
RAC: 0
Message 124 - Posted: 19 Oct 2020, 9:06:12 UTC - in response to Message 121.  

I will try to change 2_Gaia@home like this:
I will use a kernel signal to terminate the program after 3 h, and I will try to save some temporary results so that the program exits properly and you won't lose your credit.
I hope I can do it...

What do you think?
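One way to do what is described above, sketched in Python for Linux; the thread does not say which signal or language 2_Gaia@home uses, so both are assumptions. A one-shot `SIGALRM` timer is armed with `signal.setitimer`, and its handler raises an exception. Unlike a time check between loop steps, the exception also interrupts a step that is already running, and the `finally` block writes the partial results before returning. `time.sleep` stands in for one loop step; `run_with_alarm`, `TimeLimitReached` and the output format are hypothetical. In Python a handler runs only in the main thread, between bytecode instructions; a C program would more typically set a `volatile sig_atomic_t` flag in the handler and poll it inside the inner loop.

```python
import signal
import time
from typing import List, Tuple


class TimeLimitReached(Exception):
    """Raised by the SIGALRM handler when the wall-clock limit is reached."""


def _on_alarm(signum, frame):
    # Raising here interrupts whatever the main thread is doing,
    # including a single slow step (here: time.sleep).
    raise TimeLimitReached()


def run_with_alarm(n_steps: int, step_seconds: float, limit_seconds: float,
                   out_path: str) -> Tuple[bool, List[str]]:
    """Return (completed, results); results so far are always written to out_path."""
    results: List[str] = []
    completed = False
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, limit_seconds)    # one-shot timer
    try:
        for i in range(n_steps):
            time.sleep(step_seconds)       # stand-in for one loop step of the real computation
            results.append(f"step {i}")
        completed = True
    except TimeLimitReached:
        pass                               # stop early and keep the partial results
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)             # disarm the timer
        signal.signal(signal.SIGALRM, old_handler)          # restore the previous handler
        with open(out_path, "w") as fh:                     # always write what we have
            fh.write("".join(line + "\n" for line in results))
    return completed, results


if __name__ == "__main__":
    done, lines = run_with_alarm(100, 0.05, 0.12, "partial_results.txt")
    print(done, lines)
```

A production version would also need to guard the short window between the loop finishing and the timer being disarmed, where a late alarm could still fire.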
Crystal Pellet

Joined: 29 Sep 20
Posts: 14
Credit: 64,341
RAC: 0
Message 125 - Posted: 19 Oct 2020, 9:42:03 UTC - in response to Message 124.  

> I will try to change 2_Gaia@home like this:
> I will use a kernel signal to terminate the program after 3 h, and I will try to save some temporary results so that the program exits properly and you won't lose your credit.
> I hope I can do it...
>
> What do you think?

You could give that a try and I hope those temporary results are still useful for you.
zupa

Joined: 21 Aug 19
Posts: 75
Credit: 163,295
RAC: 0
Message 126 - Posted: 19 Oct 2020, 10:04:39 UTC - in response to Message 125.  

I am starting to clean the queue and will wait for the WUs in progress.
Matthias Lehmkuhl

Joined: 12 Oct 20
Posts: 7
Credit: 1,443,706
RAC: 0
Message 132 - Posted: 19 Oct 2020, 17:58:33 UTC - in response to Message 124.  

Sounds good to me, I'm back to doing work.
zupa

Joined: 21 Aug 19
Posts: 75
Credit: 163,295
RAC: 0
Message 133 - Posted: 19 Oct 2020, 18:57:40 UTC - in response to Message 132.  

3_Gaia@home - test of the new version of 2_Gaia@home (350 WUs) (normal calculation time: 1 h, stop signal at 1.5 h)
Crystal Pellet

Joined: 29 Sep 20
Posts: 14
Credit: 64,341
RAC: 0
Message 134 - Posted: 19 Oct 2020, 19:53:43 UTC - in response to Message 133.  

> 3_Gaia@home - test of the new version of 2_Gaia@home (350 WUs) (normal calculation time: 1 h, stop signal at 1.5 h)
Let's see how these 360 workunits behave.
I've 16 tasks running and 6 ready to start.
Crystal Pellet

Joined: 29 Sep 20
Posts: 14
Credit: 64,341
RAC: 0
Message 135 - Posted: 19 Oct 2020, 20:41:09 UTC - in response to Message 134.  

Results are not reporting the CPU time they used.

©2024 GAVIP-GC