When I start a job using clusterduck with Slurm and an error is raised, I only see the following stack trace:
Error executing job with overrides: ['seed=0', '+experiment/deformable_plate=ltsgns_mesh_eval', '+platform=kluster_1_gpu']
submitit ERROR (2023-11-10 13:02:46,649) - Submitted job triggered an exception
Traceback (most recent call last):
File "/home/i53/mitarbeiter/philipp_dahlinger/mambaforge/envs/ltsgns_mp/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/i53/mitarbeiter/philipp_dahlinger/mambaforge/envs/ltsgns_mp/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/i53/mitarbeiter/philipp_dahlinger/mambaforge/envs/ltsgns_mp/lib/python3.10/site-packages/submitit/core/_submit.py", line 11, in <module>
submitit_main()
File "/home/i53/mitarbeiter/philipp_dahlinger/mambaforge/envs/ltsgns_mp/lib/python3.10/site-packages/submitit/core/submission.py", line 76, in submitit_main
process_job(args.folder)
File "/home/i53/mitarbeiter/philipp_dahlinger/mambaforge/envs/ltsgns_mp/lib/python3.10/site-packages/submitit/core/submission.py", line 69, in process_job
raise error
File "/home/i53/mitarbeiter/philipp_dahlinger/mambaforge/envs/ltsgns_mp/lib/python3.10/site-packages/submitit/core/submission.py", line 55, in process_job
result = delayed.result()
File "/home/i53/mitarbeiter/philipp_dahlinger/mambaforge/envs/ltsgns_mp/lib/python3.10/site-packages/submitit/core/utils.py", line 133, in result
self._result = self.function(*self.args, **self.kwargs)
File "/home/i53/mitarbeiter/philipp_dahlinger/mambaforge/envs/ltsgns_mp/lib/python3.10/site-packages/hydra_plugins/clusterduck_launcher/clusterduck_launcher.py", line 116, in run_workers
exceptions = [
File "/home/i53/mitarbeiter/philipp_dahlinger/mambaforge/envs/ltsgns_mp/lib/python3.10/site-packages/hydra_plugins/clusterduck_launcher/clusterduck_launcher.py", line 117, in <listcomp>
result.return_value
File "/home/i53/mitarbeiter/philipp_dahlinger/mambaforge/envs/ltsgns_mp/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value
raise self._return_value
IndexError: too many indices for tensor of dimension 5
srun: error: node2: task 0: Exited with exit code 1
I would like to see the stack trace of my own code, i.e. the place where the IndexError in this example was actually raised, so that I can track down the problem.
Is that possible?
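For illustration, here is a minimal sketch of the kind of workaround I have in mind: logging the full traceback inside the worker before the exception is handed back to the launcher. The decorator `log_full_traceback` and the function `run_experiment` are hypothetical names and not part of clusterduck, Hydra, or submitit; I am assuming the wrapper would sit directly around the task function, and I have not verified how it interacts with the launcher.

```python
import functools
import logging
import traceback

logger = logging.getLogger(__name__)


def log_full_traceback(func):
    """Hypothetical helper: log the complete traceback, then re-raise."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # format_exc() renders the active exception, including the
            # frames of the user code where it was originally raised.
            logger.error("Task failed:\n%s", traceback.format_exc())
            raise  # re-raise so the launcher still sees the failure

    return wrapper


@log_full_traceback
def run_experiment(index):
    # Hypothetical stand-in for the real task body.
    data = [1, 2, 3]
    return data[index]
```

With this in place, the worker's log would contain the frames from `run_experiment` itself, even if the launcher later only reports the bare exception.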