A lot of errors
Author | Message |
---|---|
Send message Joined: 31 Oct 24 Posts: 7 Credit: 21,522,201 RAC: 497,953 |
@ADMINS ANY UPDATES ?? I checked 20 of the successful resends from his 20 most recent errors, and all of them ran on newer systems with more modern instruction sets. The oldest system on which a resend completed successfully was an i5-4250U, which supports SSE4/AVX/AVX2. There's a strong correlation here. That said, very few people are running such old hardware anyway, so it is pretty unlikely that a resend would go to such an old system. |
Send message Joined: 26 Feb 20 Posts: 29 Credit: 8,134,368 RAC: 30,743 |
@ADMINS ANY UPDATES ?? Very nice!!! |
Send message Joined: 21 Aug 19 Posts: 13 Credit: 118,660 RAC: 602 |
|
Send message Joined: 3 Jan 25 Posts: 1 Credit: 259,185 RAC: 7,677 |
Success! I seem to have solved the 195 error. Works for me as well. Thank you. |
Send message Joined: 28 Sep 20 Posts: 21 Credit: 705,606 RAC: 2,820 |
I just restarted Gaia on a Debian host, and many tasks are "not failing" but ending with the status "cancelled by the user", even though I have NOT cancelled anything and the deadline is far away! The status in the task log is "EXIT_ABORTED_VIA_GUI", this is crazy!! The same is happening for versions 4 and 6, and I have other tasks that end normally... Having read about reinstalling a math lib above, I tried it just to see if it makes any difference, but it makes no sense really. |
Send message Joined: 28 Sep 20 Posts: 21 Credit: 705,606 RAC: 2,820 |
The rate of "false / invented" task cancellations is growing: 748 valid tasks and 428 so-called cancelled tasks... |
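To put the two counts above in perspective, the share of cancelled tasks can be worked out directly. This is a minimal sketch using only the figures quoted in the post; the function name `cancelled_share` is made up for illustration.

```python
def cancelled_share(valid: int, cancelled: int) -> float:
    """Return the fraction of reported tasks that were cancelled."""
    total = valid + cancelled
    if total == 0:
        raise ValueError("no tasks reported")
    return cancelled / total


# Figures quoted in the post: 748 valid tasks, 428 cancelled tasks.
share = cancelled_share(748, 428)
print(f"{share:.1%} of reported tasks were cancelled")
```

With these numbers, a little over a third of the reported tasks were cancelled.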
Send message Joined: 28 Sep 20 Posts: 21 Credit: 705,606 RAC: 2,820 |
There were no tasks for several days; now we have some again, and the same thing continues to happen at a high rate: dozens of tasks are "artificially aborted" with a status that claims EXIT_ABORTED_VIA_GUI, which is absolutely false, as I'm not doing anything to these tasks. Luckily, at the same time other tasks (of the same versions 4 and 6 of the application) continue to crunch normally. I have never seen such behavior. |
Send message Joined: 31 Oct 24 Posts: 7 Credit: 21,522,201 RAC: 497,953 |
There were no tasks for several days; now we have some again, and the same thing continues to happen at a high rate: dozens of tasks are "artificially aborted" with a status that claims EXIT_ABORTED_VIA_GUI, which is absolutely false, as I'm not doing anything to these tasks. Ask the cloud provider that rents you this host. Maybe there is some permissions issue. |
Send message Joined: 28 Sep 20 Posts: 21 Credit: 705,606 RAC: 2,820 |
This is the first project where I have ever seen this behavior, on this particular VPS host or on any other machine. How could a "permissions issue" cause, within a given "BOINC work session", some percentage of tasks to complete OK while other tasks of that same project / application are "magically cancelled"? I would understand an issue where "all the tasks for this project are not working anymore" (for instance), but yes/no/yes/no makes no sense... |
Send message Joined: 31 Oct 24 Posts: 7 Credit: 21,522,201 RAC: 497,953 |
This is the first project where I have ever seen this behavior, on this particular VPS host or on any other machine. Rather than pushing back at a suggestion, you could explore that possibility. You won't know until you try. If this is only happening with your cloud host and not with ones hosted locally that are under your full control, then it might be possible that the thing that is different (being a cloud host, or something in that environment) is related to the root cause of the problem. No one else is having this problem. It's very likely something wrong or misconfigured in your VPS environment. Do you have something like low priority or interruptible availability? Maybe when higher-priority work needs to be done by other clients at the VPS, they are aborting your tasks? All kinds of things like this are possible when it's not your own host. |
Send message Joined: 28 Sep 20 Posts: 21 Credit: 705,606 RAC: 2,820 |
Rather than pushing back at a suggestion, you could explore that possibility. You won't know until you try. I don't even know what I could actually ask the provider's support: "are the permissions screwed up on my host"? If this is only happening with your cloud host and not with ones hosted locally that are under your full control I can only crunch Gaia on this host because it is the only Linux machine I have (my only personal machine is a Mac and I have a Windows PC at work, so no Gaia on either). Then it might be possible that the thing that is different (being a cloud host, or something in that environment) is related to the root cause of the problem. In theory, why not, but these symptoms make no sense. No one else is having this problem. It's very likely something wrong or misconfigured in your VPS environment. How do you know that I am the only one? Many people don't even bother going to the project forum; they just stop crunching for that particular project if anything about it bothers them... (I have seen this many times on the AF forum: "this project is stupid, let me crunch something else".) Do you have something like low priority or interruptible availability? Maybe when higher-priority work needs to be done by other clients at the VPS, they are aborting your tasks? All kinds of things like this are possible when it's not your own host. So the provider would have a process that automatically interacts with my running BOINC process and arbitrarily issues a cancel order for "some tasks and not others", just because it wants the CPU I'm paying for 24 hours a day? There is no "low priority or interruptible availability", it is a simple VPS offer for private individuals, not a complex offer for professionals with such options... |
Send message Joined: 31 Oct 24 Posts: 7 Credit: 21,522,201 RAC: 497,953 |
You should ask them. It's clearly not a problem with the app itself; it works fine for everyone else on normal hosts. Tell them you are seeing a lot of tasks failing, and ask them whether they are routinely stopping your VM for other high-priority tasks. The problem is almost certainly some configuration issue or something the VPS host is doing. I couldn't tell you what the issue is specifically, since I don't have any idea which VPS this is or how they configure their access/resources. I was merely giving you some examples of possible things. There is no "low priority or interruptible availability", it is a simple VPS offer for private individuals, not a complex offer for professionals with such options... You have that reasoning backwards. Interruptible cloud instances are FAR more common for "private individuals", as most normal folks just want some compute availability, and it doesn't have to be available 24/7 or meet any kind of latency or uptime requirements. Professionals would be the ones who want dedicated uninterruptible instances, and they pay more for that. |
Send message Joined: 21 Apr 21 Posts: 19 Credit: 1,941,946 RAC: 11,333 |
BTW, for a VPS for crunching, I recommend getting one of Hetzner's servers from their auctions. They are a bit more expensive than a typical low-end VPS, but you get access to a physical machine with which you can do whatever you want. I've been crunching on them for months at a time with no issues. https://www.hetzner.com/sb |
Send message Joined: 28 Sep 20 Posts: 21 Credit: 705,606 RAC: 2,820 |
You should ask them. It's clearly not a problem with the app itself; it works fine for everyone else on normal hosts. Again: your assumption. And even so, the fact that I could be the only one on the planet to see this does not mean it could not be the result of the interaction between this particular project and "this particular version of Debian / machine setup", etc. Tell them you are seeing a lot of tasks failing, and ask them whether they are routinely stopping your VM for other high-priority tasks. I've actually been using an OVH "continuous service availability" VPS only for BOINC for years, and this is the first project where I have seen such behavior. At the moment the same host is also crunching MilkyWay, because I saw there had been no Gaia tasks for some days, and this didn't happen with Milky. Then I got Gaia tasks again, and the "auto cancelled tasks" are growing again, alongside the non-cancelled tasks. The number of cancelled tasks is now bigger than the number of successful tasks, but since they use 0 sec of CPU, this is only a problem for the project itself, not for me. And since I still consider that I have nothing to do with this problem (because it is the one and only project where this happens), let it be :) |
Send message Joined: 31 Oct 24 Posts: 7 Credit: 21,522,201 RAC: 497,953 |
Must be some kind of BOINC misconfiguration that’s causing the client to abort some tasks. Maybe it thinks it won’t make the deadline? Dunno. BOINC has a lot of gremlins. Definitely specific to that host though. |
Send message Joined: 28 Sep 20 Posts: 21 Credit: 705,606 RAC: 2,820 |
I had confirmation from a well-known BOINC project dev that "The status ‘Cancelled by user’ applies to tasks cancelled manually or cancelled automatically by the BOINC client." So in this case it means "by the BOINC client", in conjunction with this particular project / apps, as if it believed that there was not enough time to finish the tasks, when they all have a 9-day deadline, which doesn't make sense. I did a reset of the project but it didn't change anything: loads of tasks are cancelled all the time, and the rest finish normally. I had only Gaia enabled on that host when this issue started; then I added MilkyWay alongside it (when there were no more tasks some time ago, and I have left it enabled in case Gaia runs out of tasks again or the project decides to blacklist me), and I obviously have no problem with Milky, or with any other project that I have crunched on this host in the past years. |
Send message Joined: 6 Aug 21 Posts: 11 Credit: 182,677 RAC: 359 |
I'm not claiming to be BOINC savvy, but your problem is interesting, albeit one of a kind, it seems. In BOINC Manager, under Options, there is Event Log Options. Try turning on a few more things there. It might give you an additional hint. Also, open Notepad while the tasks are crunching and see if any stray characters show up. I've had some faulty computers/keyboards that develop a mind of their own. I can't explain why other tasks are immune, but it might be the CPU/heat load of Gaia... You might also throttle down "Use at most n% of CPU time" and see if working the computer less helps. Good luck, it sounds like you're having some success either way. |
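The suggestion to dig through the event log can be sketched in code. The snippet below is only an illustration: it assumes the event log has been saved as plain text, and it simply keeps the lines that mention an abort. The function name `find_abort_lines` and the sample lines are invented for the example and do not reproduce real BOINC client output.

```python
from typing import Iterable, List


def find_abort_lines(lines: Iterable[str]) -> List[str]:
    """Keep only the log lines that mention an abort (case-insensitive)."""
    return [line.rstrip("\n") for line in lines if "abort" in line.lower()]


# Invented sample lines for illustration, NOT real BOINC client output.
sample_log = [
    "01-Jan-2025 10:00:00 [example_project] Starting task task_001\n",
    "01-Jan-2025 10:00:05 [example_project] Aborting task task_002\n",
    "01-Jan-2025 10:30:00 [example_project] Computation for task task_001 finished\n",
]

for hit in find_abort_lines(sample_log):
    print(hit)
```

The same function accepts an open file object, so a saved log can be passed in directly instead of the sample list. Lines logged just before each hit may hint at what triggered the abort.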
Send message Joined: 28 Sep 20 Posts: 21 Credit: 705,606 RAC: 2,820 |
I found the reason!!!! I'm using an account manager (SAM, not BAM) which has an option "Abandonner les UT non démarrées" (abandon work units that have not started), normally used when you want to stop a project on a host, and it was checked (although the project was not set to "no new tasks")... So that explains it all: the project was not guilty at all, it is the behavior of a *standard* BOINC client applying an instruction from an account manager! I have no memory of when or why I clicked this, probably long ago after I stopped Gaia; I never unchecked it afterwards, and it remained there. You were right, Ian&Steve!! |
©2025 GAVIP-GC