Toni (Volunteer moderator, Project administrator, Project developer, Project tester, Project scientist)
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0
The acemd3 app is again under test. It should work on Windows (including RTX!).
|
|
|
I reactivated my RTX 2080 on an i7 Windows 10 machine and, unfortunately, a NON-acemd3 task downloaded and errored out after 8 seconds. I have now excluded (hopefully correctly) all GPUGrid tasks except acemd3 until conditions change.
|
|
_Ryle_
Joined: 7 Jun 09 Posts: 24 Credit: 1,138,093,416 RAC: 570
Thanks Toni, I'm looking forward to its release. I hope the Linux version will also be released at that time. :)
|
|
eXaPower
Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0
Windows 8.1, RTX 2080 Ti: error at the start of the WU. The WU loops until suspend/resume is used, and the error message appears each time the WU restarts.
http://www.gpugrid.net/result.php?resultid=21341937
http://www.gpugrid.net/result.php?resultid=21341954
Problem signature:
Problem Event Name: BEX64
Application Name: acemd3.exe
Application Version: 0.0.0.0
Application Timestamp: 5d6535ed
Fault Module Name: ucrtbase.DLL
Fault Module Version: 10.0.17134.12
Fault Module Timestamp: 587decd7
Exception Offset: 000000000006e75e
Exception Code: c0000409
Exception Data: 0000000000000007
OS Version: 6.3.9600.2.0.0.768.101
Locale ID: 1033
Additional Information 1: 723f
Additional Information 2: 723ff68f3f17ee5cfa26fbef8ee09749
Additional Information 3: 096f
Additional Information 4: 096f337e301f747985865265c5b96cfe
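The two hex fields in that signature can be decoded with a couple of lookups. Assuming only the standard Windows meanings, 0xC0000409 is STATUS_STACK_BUFFER_OVERRUN (Windows also reports __fastfail through this status), and fast-fail code 7 is FAST_FAIL_FATAL_APP_EXIT, which is what the Universal CRT (ucrtbase.dll) typically raises when abort() is called. Read together, the crash most likely comes from acemd3.exe aborting inside the CRT rather than from a real buffer overrun. A minimal Python sketch; the lookup tables are hand-written and cover only the codes in this report:

```python
# Sketch: decode two fields of a Windows Error Reporting "Problem signature".
# The lookup tables are hand-written and only cover the codes seen in this report.
NTSTATUS_NAMES = {
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN (also reported for __fastfail)",
}
FAST_FAIL_CODES = {
    7: "FAST_FAIL_FATAL_APP_EXIT (typically raised by abort() in the UCRT)",
}


def parse_signature(text: str) -> dict[str, str]:
    """Split 'Key: Value' lines into a dict; lines without a colon are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def explain(fields: dict[str, str]) -> dict[str, str]:
    """Translate the hex exception code and exception data into symbolic names."""
    code = int(fields["Exception Code"], 16)
    data = int(fields["Exception Data"], 16)
    return {
        "module": fields.get("Fault Module Name", "?"),
        "status": NTSTATUS_NAMES.get(code, hex(code)),
        "fast_fail": FAST_FAIL_CODES.get(data, str(data)),
    }


if __name__ == "__main__":
    report = """Fault Module Name: ucrtbase.DLL
Exception Code: c0000409
Exception Data: 0000000000000007"""
    for key, value in explain(parse_signature(report)).items():
        print(f"{key}: {value}")
```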
|
|
|
Hi all,
These are my settings:
ACEMD short runs (2-3 hours on fastest card): yes
ACEMD long runs (8-12 hours on fastest GPU): yes
ACEMD3 Beta: yes
Quantum Chemistry (CPU): no
Quantum Chemistry (CPU, beta): no
Python Runtime: no
Currently I have received only a long-run WU and no acemd3 ones, even with 4 in the queue!
Any suggestions?
Thanks in advance
K.
Edit: now 0 WUs.
____________
Dreams do not always come true. But not because they are too big or impossible. Why did we stop believing?
(Martin Luther King)
|
|
klepel
Joined: 23 Dec 09 Posts: 189 Credit: 4,298,669,293 RAC: 1,629,088
Task http://www.gpugrid.net/result.php?resultid=21342341 errored out immediately:
Stderr output
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -59 (0xffffffc5)</message>
<stderr_txt>
# GPU [GeForce RTX 2070] Platform [Windows] Rev [3212] VERSION [80]
# SWAN Device 0 :
# Name : GeForce RTX 2070
# ECC : Disabled
# Global mem : 8192MB
# Capability : 7.5
# PCI ID : 0000:1F:00.0
# Device clock : 1815MHz
# Memory clock : 7001MHz
# Memory width : 256bit
# Driver version : r430_00 : 43160
#SWAN: FATAL: cannot find image for module [.nonbonded.cu.] for device version 750
</stderr_txt>
]]>
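The last stderr line pairs with the device block above it: "device version 750" appears to be SWAN's encoding of the card's compute capability 7.5 (Turing), which would mean this build contains no GPU code for Turing at all. The Python sketch below checks that reading. The regular expressions and the major*100 + minor*10 encoding are assumptions inferred from this one log, not a documented SWAN format:

```python
import re

# Assumed shape of the SWAN device block: lines such as "# Capability : 7.5".
HEADER_LINE = re.compile(r"^#[ \t]+([^:\n]+?)[ \t]*:[ \t]*(\S.*)$", re.MULTILINE)
# Assumed shape of the fatal message: "... for device version 750".
MISSING_IMAGE = re.compile(r"cannot find image for module \[(.+?)\] for device version (\d+)")


def parse_device_block(stderr: str) -> dict[str, str]:
    """Collect the '# Key : Value' lines SWAN prints before the run starts."""
    return {m.group(1): m.group(2).strip() for m in HEADER_LINE.finditer(stderr)}


def device_version_to_cc(version: int) -> tuple[int, int]:
    """Read e.g. 750 as compute capability (7, 5); the encoding is an assumption."""
    return version // 100, (version % 100) // 10


def missing_kernel(stderr: str):
    """Return (module, compute capability) for a 'cannot find image' failure, else None."""
    m = MISSING_IMAGE.search(stderr)
    if m is None:
        return None
    return m.group(1), device_version_to_cc(int(m.group(2)))
```

On the RTX 2070 log above, missing_kernel() returns ('.nonbonded.cu.', (7, 5)), which agrees with the "Capability : 7.5" line in the same device block.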
|
|
Keith Myers
Joined: 13 Dec 17 Posts: 1298 Credit: 5,474,151,959 RAC: 10,111,480
You need to have the new acemd3 app enabled and the "Run test applications" setting turned on. Did you get the new wrapper app for Windows, acemd3 version 2.05?
|
|
Toni
The app (v206) is out for Linux and Windows.
There has been a problem with units with -1-3- in their name (solved).
The scheduler will need improvements. Right now I've seen some cases of the cuda92 app being sent to RTX cards (such cases error out with "gpu architecture").
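The scheduler problem comes down to a compatibility rule: a build can only run on a card if its CUDA toolkit can target the card's compute capability and the installed driver supports that toolkit. The Python sketch below expresses that rule. It is hypothetical, not GPUGrid's scheduler code; the plan-class names come from this thread, the CUDA 8.0 and 10.0 caps are the public toolkit limits (Pascal and Turing respectively), the CUDA 9.2 entry is conservatively capped at Volta (7.0), and minimum-capability checks are left out:

```python
# Hypothetical plan-class choice, not the real GPUGrid scheduler.
# Each entry: (plan class, CUDA version the driver must support,
#              newest compute capability the build can run on).
PLAN_CLASSES = [
    ("cuda80", (8, 0), (6, 2)),    # CUDA 8.0 tops out at Pascal
    ("cuda92", (9, 2), (7, 0)),    # assumed cap: Volta
    ("cuda100", (10, 0), (7, 5)),  # CUDA 10.0 adds Turing (RTX 20xx)
]


def usable_plan_classes(compute_cap: tuple[int, int], driver_cuda: tuple[int, int]) -> list[str]:
    """Plan classes whose build targets this GPU and whose CUDA version the driver supports."""
    return [
        name
        for name, needs_cuda, max_cc in PLAN_CLASSES
        if compute_cap <= max_cc and driver_cuda >= needs_cuda
    ]


def pick_plan_class(compute_cap: tuple[int, int], driver_cuda: tuple[int, int]):
    """Prefer the newest usable build; None means nothing should be sent to this host."""
    usable = usable_plan_classes(compute_cap, driver_cuda)
    return usable[-1] if usable else None
```

Under this rule an RTX card (7.5) only ever qualifies for cuda100, while for a Pascal card (6.1) the choice depends on the driver alone, which would be consistent with mmonnin's two Pascal hosts getting cuda80 and cuda100.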
|
|
mmonnin
Joined: 2 Jul 16 Posts: 332 Credit: 4,558,131,065 RAC: 15,573,306
One of the bad ones:
https://www.gpugrid.net/result.php?resultid=21342392
Three others completed successfully on Linux.
One of my PCs ran under the cuda80 plan class and another under the cuda100 plan class, both with Pascal cards. I guess the plan class is determined by the driver's capability (the CUDA version it supports).
|
|
|
I had 2 cuda(100) units succeed and 1 fail. I also had 2 cuda(92) fail. The last unit I received was a cuda(92).
http://www.gpugrid.net/results.php?hostid=494023&offset=0&show_names=0&state=0&appid=32
The scheduler is still a problem.
|
|
|
The -2-3- units running under (cuda100) are the combination that finishes successfully. I had one more valid unit; the others failed.
|
|
Toni
I'm fixing things incrementally. Failing tasks may be resent and succeed.
|
|
Azmodes
Joined: 7 Jan 17 Posts: 34 Credit: 1,371,429,518 RAC: 0
Not getting any tasks on Linux, nothing but errors on RTX on Windows so far.
I have a Windows system with both GTX and RTX cards in it. Do I have to exclude non-ACEMD3 tasks for the RTX via cc_config?
EDIT: Looks like I've been getting some new tasks on Linux too, but they're erroring out:
http://www.gpugrid.net/result.php?resultid=21343717
http://www.gpugrid.net/result.php?resultid=21341872
Two more were validated on the same system (they were very short, though?).
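On the cc_config question: the BOINC client's cc_config.xml supports `<exclude_gpu>` entries with `<url>`, `<device_num>`, `<type>` and `<app>` children, which lets one card skip specific applications while other cards keep running them. A Python sketch that writes such entries follows. The device number and the app short names (acemdlong, acemdshort) are assumptions; check the real names in client_state.xml on your host, merge the entries into any existing cc_config.xml rather than overwriting it, and restart the BOINC client afterwards:

```python
import xml.etree.ElementTree as ET

GPUGRID_URL = "http://www.gpugrid.net/"


def exclusion_config(device_num: int, apps: list[str]) -> str:
    """Build a cc_config.xml body that keeps one NVIDIA GPU away from the listed apps."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    for app in apps:
        # One <exclude_gpu> block per application name.
        entry = ET.SubElement(options, "exclude_gpu")
        ET.SubElement(entry, "url").text = GPUGRID_URL
        ET.SubElement(entry, "device_num").text = str(device_num)
        ET.SubElement(entry, "type").text = "NVIDIA"
        ET.SubElement(entry, "app").text = app
    ET.indent(root)  # pretty-printing needs Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical: the RTX card is device 1 and these are the non-acemd3 app names.
    print(exclusion_config(device_num=1, apps=["acemdlong", "acemdshort"]))
```

Listing the apps to exclude (rather than the one to allow) matches how `<exclude_gpu>` works: each entry removes one application from one device.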
|
|
Toni
Errors with "nelems != 1" have been solved and should go away sooner or later. Please ignore them.
All tests are very short (a few minutes) so as not to waste your time. They are, however, very important, because they let me see the behavior across many realistic card/app combinations.
|
|
klepel
Both computers with Windows 10 and latest-generation Nvidia cards receive the following application: Long runs (8-12 hours on fastest card) v9.23 (cuda80). Both fail immediately:
http://www.gpugrid.net/results.php?hostid=504655
http://www.gpugrid.net/results.php?hostid=512242
|
|
biodoc
Joined: 26 Aug 08 Posts: 183 Credit: 6,772,414,375 RAC: 652,897
I completed 27 ACEMD v2.06 (cuda100) tasks without an error on Linux.
http://www.gpugrid.net/results.php?userid=5539
|
|
|
I had 4 WUs today; all 4 are OK.
Well done Toni!
On Arch Linux [5.2.11-zen1-1-zen|libc 2.29 (GNU libc)]
NVIDIA GeForce RTX 2080 Ti (4095MB) driver: 435.21
One of them, a toni_test ;-)
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<stderr_txt>
15:29:34 (15838): wrapper (7.7.26016): starting
15:29:34 (15838): wrapper (7.7.26016): starting
15:29:34 (15838): wrapper: running acemd3 (--boinc input --device 0)
15:31:06 (15838): acemd3 exited; CPU time 63.446112
15:31:06 (15838): called boinc_finish(0)
</stderr_txt>
]]>
|
|
Keith Myers
Still waiting on some new apps to go along with new work to test. No luck so far.
|
|
|
One error:
https://www.gpugrid.net/result.php?resultid=21349753
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
07:54:11 (1654): wrapper (7.7.26016): starting
07:54:11 (1654): wrapper (7.7.26016): starting
07:54:11 (1654): wrapper: running acemd3 (--boinc input --device 0)
EXCEPTIONAL CONDITION: /home/user/conda/conda-bld/acemd3_1566914012210/work/src/mdio/bincoord.c, line 193: "nelems != 1"
07:54:14 (1654): acemd3 exited; CPU time 1.975979
07:54:14 (1654): app exit status: 0x86
07:54:14 (1654): called boinc_finish(195)
</stderr_txt>
]]>
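The exit codes in these reports are the same numbers seen through different integer widths: 195 is 0xc3, which read as a signed 8-bit value is -61, and the 0xffffffc5 in the earlier SWAN failure is -59 as a signed 32-bit value. A Python sketch that pulls the codes out of a wrapper log and converts them follows; the name EXIT_CHILD_FAILED for 195 is, to my knowledge, the constant in BOINC's lib/error_numbers.h, but treat the lookup table as an assumption:

```python
import re

# Assumed from BOINC's lib/error_numbers.h; extend as needed.
BOINC_EXIT_NAMES = {195: "EXIT_CHILD_FAILED"}


def as_signed(value: int, bits: int) -> int:
    """Reinterpret an unsigned value as a two's-complement integer of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= 1 << (bits - 1) else value


def exit_summary(stderr: str) -> dict:
    """Extract the child's exit status and the code passed to boinc_finish()."""
    child = re.search(r"app exit status: (0x[0-9a-fA-F]+)", stderr)
    # Raises AttributeError if the log has no boinc_finish line.
    finish = int(re.search(r"called boinc_finish\((\d+)\)", stderr).group(1))
    return {
        "child_status": int(child.group(1), 16) if child else None,
        "finish_code": finish,
        "finish_name": BOINC_EXIT_NAMES.get(finish, "unknown"),
        "finish_as_int8": as_signed(finish, 8),
    }
```

For the failed log above, exit_summary() reports a child status of 134 (0x86) and a boinc_finish code of 195, which as a signed byte is the -61 shown in the client's message; a successful run such as the toni_test one simply reports boinc_finish(0).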
|
|