A lot of errors

Message boards : Number crunching : A lot of errors

Ian&Steve C.

Joined: 31 Oct 24
Posts: 7
Credit: 21,520,601
RAC: 497,852
Message 683 - Posted: 24 Dec 2024, 16:22:23 UTC - in response to Message 682.  

@ADMINS ANY UPDATES ??
Still receiving the same "-195 errors" with the _4 tasks.


The "195" error is a generic error from BOINC, not specific to the app.

Your CPU is over 15 years old and lacks modern features such as SSE4 and AVX. This is probably the reason they all fail.

The Gaia_5 and Gaia_6 apps probably do not require these features, which is why they work.


That could be an interesting research project to do here on the returned results.


I checked the successful resends of his 20 most recent errors, and all were on newer systems with more modern instruction sets. The oldest system that successfully completed a resend was an i5-4250U, which supports SSE4/AVX/AVX2.

There's a strong correlation here. But very few people are likely to be running such old hardware anyway, so it's unlikely that a resend would go to a similarly old system.
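A quick way to check whether a Linux host has the instruction sets mentioned above is to read the `flags` line of `/proc/cpuinfo`. Below is a minimal Python sketch under a few assumptions: it targets x86 Linux (ARM kernels report a `Features` line instead), the helper names are made up for illustration, and the flag list only reflects the suspicion in this thread that the _4 app needs SSE4/AVX, which the project has not confirmed.

```python
from pathlib import Path

# Flags discussed in the thread, spelled the way the Linux kernel reports
# them in /proc/cpuinfo on x86. Whether the Gaia _4 app really needs all of
# them is an assumption, not something the project has confirmed.
WANTED_FLAGS = ("sse4_1", "sse4_2", "avx", "avx2")


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no x86 'flags' line found (e.g. ARM reports 'Features')


def missing_flags(cpuinfo_text: str, wanted=WANTED_FLAGS) -> list[str]:
    """List the wanted flags that this CPU does not report, in order."""
    flags = cpu_flags(cpuinfo_text)
    return [flag for flag in wanted if flag not in flags]


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        missing = missing_flags(cpuinfo.read_text())
        print("missing:", ", ".join(missing) if missing else "none")
    else:
        print("/proc/cpuinfo not found; this check assumes a Linux host")
```

On the i5-4250U mentioned above, which supports SSE4/AVX/AVX2, it should print `missing: none`.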
mikey

Joined: 26 Feb 20
Posts: 29
Credit: 8,134,368
RAC: 30,743
Message 690 - Posted: 27 Dec 2024, 15:26:38 UTC - in response to Message 683.  

Very nice!!!
Contact

Joined: 21 Aug 19
Posts: 13
Credit: 118,660
RAC: 602
Message 691 - Posted: 5 Jan 2025, 2:17:11 UTC

Success! I seem to have solved the 195 error.
I now have a valid result again after reinstalling the libquadmath library.

sudo apt install --reinstall libquadmath0
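If you want to confirm that the library is present and loadable before or after reinstalling it, here is a minimal Python sketch using the standard `ctypes` module. It assumes the Gaia app picks up `libquadmath` dynamically from the system, which nobody in the thread has confirmed, and `quadmath_status` is just a hypothetical helper name. Python's lookup is not identical to how the dynamic loader resolves the app's own dependencies, so running `ldd` on the app binary remains the more direct check.

```python
import ctypes
import ctypes.util


def quadmath_status() -> str:
    """Report whether libquadmath can be located and loaded on this host."""
    # find_library returns a soname such as "libquadmath.so.0", or None.
    name = ctypes.util.find_library("quadmath")
    if name is None:
        return "not found"
    try:
        ctypes.CDLL(name)  # raises OSError if the file is missing or broken
    except OSError as exc:
        return f"found {name} but failed to load: {exc}"
    return f"ok: {name}"


if __name__ == "__main__":
    print(quadmath_status())
```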

tito

Joined: 3 Jan 25
Posts: 1
Credit: 259,185
RAC: 7,677
Message 692 - Posted: 5 Jan 2025, 15:38:58 UTC - in response to Message 691.  

Works for me as well.
Thank You.
[AF>Le_Pommier] Jerome_C2005

Joined: 28 Sep 20
Posts: 21
Credit: 705,606
RAC: 2,820
Message 697 - Posted: 8 Jan 2025, 8:09:24 UTC

I just restarted Gaia on a Debian host, and many tasks are not failing as such but end with the status "Cancelled by user", even though I have NOT cancelled anything and the deadline is far away! The status in the task log is "EXIT_ABORTED_VIA_GUI". This is crazy!!

The same is happening for versions 4 and 6, while other tasks end normally...

Since I read about reinstalling the math library above, I did that too to see if it makes any difference, but it doesn't really make sense for this problem.
[AF>Le_Pommier] Jerome_C2005

Joined: 28 Sep 20
Posts: 21
Credit: 705,606
RAC: 2,820
Message 701 - Posted: 9 Jan 2025, 17:18:43 UTC - in response to Message 697.  

The rate of "false / invented" task cancellations is growing: 748 valid tasks and 428 so-called cancelled tasks...
[AF>Le_Pommier] Jerome_C2005

Joined: 28 Sep 20
Posts: 21
Credit: 705,606
RAC: 2,820
Message 703 - Posted: 14 Jan 2025, 10:55:57 UTC - in response to Message 701.  

There were no tasks for several days. Now there are some again, and the same thing continues to happen at a high rate: dozens of tasks are "artificially aborted" with a status claiming EXIT_ABORTED_VIA_GUI, which is absolutely false; I'm not doing anything to these tasks.

Luckily, other tasks (of the same versions 4 and 6 of the application) continue to crunch normally at the same time.

I have never seen such behavior.
Ian&Steve C.

Joined: 31 Oct 24
Posts: 7
Credit: 21,520,601
RAC: 497,852
Message 704 - Posted: 14 Jan 2025, 15:12:54 UTC - in response to Message 703.  

Ask your cloud rental provider about this host. Maybe there is some permissions issue.
[AF>Le_Pommier] Jerome_C2005

Joined: 28 Sep 20
Posts: 21
Credit: 705,606
RAC: 2,820
Message 705 - Posted: 15 Jan 2025, 13:53:48 UTC - in response to Message 704.  

This is the first project where I have ever seen this behavior, on this particular VPS host or on any other machine.

How could a "permissions issue" cause some percentage of tasks in a given BOINC work session to complete OK while other tasks of the same project/application are "magically cancelled"?

I would understand an issue where, for instance, none of the tasks for this project work anymore, but yes/no/yes/no makes no sense...
Ian&Steve C.

Joined: 31 Oct 24
Posts: 7
Credit: 21,520,601
RAC: 497,852
Message 706 - Posted: 15 Jan 2025, 15:39:13 UTC - in response to Message 705.  
Last modified: 15 Jan 2025, 15:58:29 UTC

Rather than pushing back at a suggestion, you could explore that possibility. You won't know until you try.

If this is only happening with your cloud host and not on local ones under your full control, then the thing that is different (being a cloud host, or something in that environment) could be related to the root cause of the problem.

No one else is having this problem. It's very likely something wrong or misconfigured in your VPS environment.

Do you have something like low-priority or interruptible availability? Maybe when other clients at the VPS need higher-priority work done, the provider is aborting your tasks? All kinds of things like this are possible when it's not your own host.
[AF>Le_Pommier] Jerome_C2005

Joined: 28 Sep 20
Posts: 21
Credit: 705,606
RAC: 2,820
Message 707 - Posted: 15 Jan 2025, 16:39:14 UTC - in response to Message 706.  

Rather than pushing back at a suggestion, you could explore that possibility. You won't know until you try.

I don't even know what I could actually ask the provider's support. "Are the permissions screwed up on my host?"

If this is only happening with your cloud host and not on local ones under your full control

I can only crunch Gaia on this host because it's the only Linux machine I have (my only personal machine is a Mac, and I have a Windows PC at work, so no Gaia on either).

then the thing that is different (being a cloud host, or something in that environment) could be related to the root cause of the problem.

In theory, why not, but these symptoms make no sense.

No one else is having this problem. It's very likely something wrong or misconfigured in your VPS environment.

How do you know that I am the only one? Many people don't even bother going to the project forum; they just stop crunching for that project if anything about it bothers them... (I've seen this many times on the AF forum: "this project is stupid, let me crunch something else").

Do you have something like low-priority or interruptible availability? Maybe when other clients at the VPS need higher-priority work done, the provider is aborting your tasks? All kinds of things like this are possible when it's not your own host.

So the provider would have a process that automatically interacts with my running BOINC process and arbitrarily issues cancel orders for "some tasks and not others", just because it wants the CPU I'm paying for 24 hours a day?

There is no "low priority or interruptible availability"; it is a simple VPS offer for private individuals, not a complex offer for professionals with such options...
Ian&Steve C.

Joined: 31 Oct 24
Posts: 7
Credit: 21,520,601
RAC: 497,852
Message 708 - Posted: 15 Jan 2025, 16:56:11 UTC - in response to Message 707.  
Last modified: 15 Jan 2025, 17:01:00 UTC

You should ask them. It's clearly not a problem with the app itself; it works fine for everyone else on normal hosts. Tell them you are seeing a lot of tasks failing and ask them if they are routinely stopping your VM for other high-priority tasks.

The problem is almost certainly some configuration issue or something the VPS host is doing. I couldn't tell you what the issue is specifically, since I have no idea which VPS this is or how they configure their access/resources. I was merely giving you some examples of possible things.

There is no "low priority or interruptible availability"; it is a simple VPS offer for private individuals, not a complex offer for professionals with such options...


You have that reasoning backwards. Interruptible cloud instances are FAR more common for "private individuals", as most normal folks just want some compute availability, and it doesn't have to be available 24/7 or meet any kind of latency or uptime requirements. Professionals would be the ones who want dedicated, uninterruptible instances, and they pay more for that.
stfn

Joined: 21 Apr 21
Posts: 19
Credit: 1,941,946
RAC: 11,333
Message 709 - Posted: 15 Jan 2025, 21:27:31 UTC

BTW, if you want a VPS for crunching, I recommend getting one of the Hetzner servers from their auctions. They are a bit more expensive than a typical low-end VPS, but you get access to a physical machine that you can do whatever you want with. I've been crunching on them for months at a time with no issues.

https://www.hetzner.com/sb
my blog about raspberry pis and astrophotography
[AF>Le_Pommier] Jerome_C2005

Joined: 28 Sep 20
Posts: 21
Credit: 705,606
RAC: 2,820
Message 711 - Posted: 16 Jan 2025, 15:37:26 UTC - in response to Message 708.  

You should ask them. It's clearly not a problem with the app itself; it works fine for everyone else on normal hosts.

Again: that's your assumption. And even so, the fact that I (could) be the only one on the planet seeing this doesn't mean it can't be the result of an interaction between this particular project and "this particular version of Debian / machine setup", etc.

Tell them you are seeing a lot of tasks failing and ask them if they are routinely stopping your VM for other high-priority tasks.

The problem is almost certainly some configuration issue or something the VPS host is doing. I couldn't tell you what the issue is specifically, since I have no idea which VPS this is or how they configure their access/resources. I was merely giving you some examples of possible things.

There is no "low priority or interruptible availability"; it is a simple VPS offer for private individuals, not a complex offer for professionals with such options...


You have that reasoning backwards. Interruptible cloud instances are FAR more common for "private individuals", as most normal folks just want some compute availability, and it doesn't have to be available 24/7 or meet any kind of latency or uptime requirements. Professionals would be the ones who want dedicated, uninterruptible instances, and they pay more for that.

I've actually been using OVH "continuous service availability" VPSes solely for BOINC for years, and this is the first project where I have seen such behavior.
At the moment the same host is also crunching MilkyWay, because I saw there were no Gaia tasks for some days, and this didn't happen with MilkyWay.
Then I got Gaia tasks again, and once more the "auto-cancelled" tasks are growing, alongside the non-cancelled ones.

The number of cancelled tasks is now bigger than the number of successful tasks, but since they use 0 seconds of CPU time, this is only a problem for the project itself, not for me.

And since I still consider that I have nothing to do with this problem (because this is the one and only project where it happens), let it be :)
Ian&Steve C.

Joined: 31 Oct 24
Posts: 7
Credit: 21,520,601
RAC: 497,852
Message 712 - Posted: 16 Jan 2025, 22:59:58 UTC - in response to Message 711.  

Must be some kind of BOINC misconfiguration that’s causing the client to abort some tasks. Maybe it thinks it won’t make the deadline? Dunno. BOINC has a lot of gremlins. Definitely specific to that host though.
[AF>Le_Pommier] Jerome_C2005

Joined: 28 Sep 20
Posts: 21
Credit: 705,606
RAC: 2,820
Message 713 - Posted: 19 Jan 2025, 18:58:53 UTC

I had confirmation from a well-known BOINC project developer that "The status ‘Cancelled by user’ applies to tasks cancelled manually or cancelled automatically by the BOINC client."

So in this case it means "by the BOINC client", in conjunction with this particular project/apps, as if it believed there wasn't enough time to finish the tasks before the deadline, when they all have a 9-day deadline, which doesn't make sense.

I reset the project but it didn't change anything: loads of tasks are cancelled all the time, and the rest finish normally.

Only Gaia was enabled on that host when this issue started. Then I added MilkyWay alongside it (some time ago, when there were no Gaia tasks, and I left it enabled in case Gaia runs out of tasks again or the project decides to blacklist me), and I obviously have no problem with MilkyWay, or with any other project I have crunched on this host in past years.
Gaia01902USA

Joined: 6 Aug 21
Posts: 11
Credit: 182,677
RAC: 359
Message 714 - Posted: 20 Jan 2025, 19:23:50 UTC - in response to Message 713.  

I'm not claiming to be BOINC-savvy, but your problem is interesting, albeit one of a kind, it seems.
In the BOINC Manager, under Options, there is Event Log Options.
Try turning on a few more things there. It might give you an additional hint.

Also, open Notepad while the tasks are crunching and see if any stray characters show up.
I've had some faulty computers/keyboards that do develop a mind of their own.
I can't explain why other tasks are immune, but it might be the CPU/heat load of Gaia...

You might also lower "Use at most n% of CPU time" and see if working the computer less helps.

Good Luck, it sounds like you're having some success either way.
[AF>Le_Pommier] Jerome_C2005

Joined: 28 Sep 20
Posts: 21
Credit: 705,606
RAC: 2,820
Message 715 - Posted: 21 Jan 2025, 15:14:16 UTC
Last modified: 21 Jan 2025, 15:15:26 UTC

I found the reason!!!!

I'm using an account manager (SAM, not BAM) that has an option "Abandonner les UT non démarrées" ("abort unstarted work units"), normally used when you want to stop a project on a host. It was checked (but the project was not set to "no new tasks")... So that explains it all: the project was not at fault at all; it is the behavior of a *standard* BOINC client applying an instruction from the account manager!

I have no memory of when or why I checked this, probably long ago after I stopped Gaia, and I never unchecked it afterwards, so it stayed there.

You were right, Ian&Steve!!
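To see how such an option produces the "some tasks fine, some cancelled with 0 s of CPU time" pattern described earlier in the thread, here is a toy model in Python. It is an illustration under stated assumptions, not BOINC's code: the names `Task` and `apply_abort_not_started` are hypothetical, and it assumes the client applies the instruction each time it syncs with the account manager, so tasks already running carry on while anything still waiting in the queue is aborted.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    cpu_seconds: float      # CPU time used so far; 0.0 means not started yet
    aborted: bool = False   # hypothetical flag; shows up as "Cancelled by user"


def apply_abort_not_started(tasks: list[Task]) -> int:
    """Toy model of an account-manager "abort unstarted tasks" instruction.

    Tasks that have already used CPU time keep running; tasks that have not
    started are aborted with 0 s of CPU time. Returns how many were aborted.
    """
    aborted = 0
    for task in tasks:
        if task.cpu_seconds == 0.0 and not task.aborted:
            task.aborted = True
            aborted += 1
    return aborted


if __name__ == "__main__":
    queue = [Task("wu_1", 3600.0), Task("wu_2", 0.0), Task("wu_3", 120.0), Task("wu_4", 0.0)]
    print(apply_abort_not_started(queue))            # → 2
    print([t.name for t in queue if not t.aborted])  # → ['wu_1', 'wu_3']
```

Under that assumption, new work keeps arriving between syncs and some of it starts before the next sync, so the host ends up with a mix of completed and aborted tasks rather than an all-or-nothing failure.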

©2025 GAVIP-GC