Segmentation fault when train the net

python train.py --batch_size 24 --experiment_name shapenet-ldif \
  --model_directory $models --model_type "ldif" \
  --dataset_directory $dataset
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
INFO: Making dataset...
INFO: Optimized dataset detected at ./shapenet/optimized
INFO: Mapping...
INFO: is_invalid vs lower_coords: [24, 32, 1] vs [24, 32, 3]
INFO: Post-where lower_coords: [24, 32, 3]
INFO: is_invalid vs sdf coords: [24, 32, 1] vs [24, 32, 1]
INFO: In-out image summaries have been removed.
INFO: The 0-th GPU has 22390 MB free.
INFO: TensorFlow can use up to 93.1397945511389% of the total GPU memory.
INFO: Initializing variables...
INFO: No previous checkpoint detected, training from scratch.
Fatal Python error: Segmentation fault

Thread 0x00007fd78cff9700 (most recent call first):
  File "/home/mayo/anaconda3/envs/tf-1.15/lib/python3.8/threading.py", line 302 in wait
  File "/home/mayo/anaconda3/envs/tf-1.15/lib/python3.8/queue.py", line 170 in get
  File "/home/mayo/anaconda3/envs/tf-1.15/lib/python3.8/site-packages/tensorflow_core/python/summary/writer/event_file_writer.py", line 159 in run
  File "/home/mayo/anaconda3/envs/tf-1.15/lib/python3.8/threading.py", line 932 in _bootstrap_inner
  File "/home/mayo/anaconda3/envs/tf-1.15/lib/python3.8/threading.py", line 890 in _bootstrap

Thread 0x00007fd9b5258340 (most recent call first):
  File "/home/mayo/anaconda3/envs/tf-1.15/lib/python3.8/site-packages/tensorflow_core/python/client/session.py", line 1441 in _call_tf_sessionrun
  File "/home/mayo/anaconda3/envs/tf-1.15/lib/python3.8/site-packages/tensorflow_core/python/client/session.py", line 1349 in _run_fn
  File "/home/mayo/anaconda3/envs/tf-1.15/lib/python3.8/site-packages/tensorflow_core/python/client/session.py", line 1365 in _do_call
  File "/home/mayo/anaconda3/envs/tf-1.15/lib/python3.8/site-packages/tensorflow_core/python/client/session.py", line 1358 in _do_run
  File "/home/mayo/anaconda3/envs/tf-1.15/lib/python3.8/site-packages/tensorflow_core/python/client/session.py", line 1179 in _run
  File "/home/mayo/anaconda3/envs/tf-1.15/lib/python3.8/site-packages/tensorflow_core/python/client/session.py", line 955 in run
  File "train.py", line 263 in main
  File "/home/mayo/anaconda3/envs/tf-1.15/lib/python3.8/site-packages/absl/app.py", line 258 in _run_main
  File "/home/mayo/anaconda3/envs/tf-1.15/lib/python3.8/site-packages/absl/app.py", line 312 in run
  File "train.py", line 283 in <module>
./reproduce_shapenet_autoencoder.sh: line 50: 1295263 Segmentation fault      (core dumped) python train.py --batch_size 24 --experiment_name shapenet-ldif --model_directory $models --model_type "ldif" --dataset_directory $dataset


I have generated the dataset from raw ShapnetCoreV1/03001627 models, by converting .obj file to .ply and then generating watertight .ply file using gaps tools. After that I used the command in the script named reproduce_shapenet_autoencoder.sh to make dataset, everything done successfully. But when I tried to train the net with the dataset, it failed and got the log  showed above.

BTW, the enviroment with my computer: ubuntu20.04 with RTX3090, cuda version = 11.1, and I run the code on tensorflow-1.15.
Could you give me some advice for this issue?
Thank you!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Segmentation fault when train the net #17

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Segmentation fault when train the net #17

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions