2023-05-17 06:48:57,396 - testrun_5cafe61a - [INFO] - {'dataset': 'icews14', 'name': 'testrun_5cafe61a', 'gpu': '1', 'train_strategy': 'one_to_n', 'opt': 'adam', 'neg_num': 1000, 'batch_size': 128, 'l2': 0.0, 'lr': 0.0001, 'max_epochs': 500, 'num_workers': 0, 'seed': 42, 'restore': False, 'lbl_smooth': 0.1, 'embed_dim': 400, 'ent_vec_dim': 400, 'rel_vec_dim': 400, 'bias': False, 'form': 'plain', 'k_w': 10, 'k_h': 20, 'num_filt': 96, 'ker_sz': 9, 'perm': 1, 'hid_drop': 0.5, 'feat_drop': 0.2, 'inp_drop': 0.2, 'drop_path': 0.0, 'drop': 0.0, 'in_channels': 1, 'out_channels': 32, 'filt_h': 1, 'filt_w': 9, 'image_h': 128, 'image_w': 128, 'patch_size': 8, 'mixer_dim': 256, 'expansion_factor': 4, 'expansion_factor_token': 0.5, 'mixer_depth': 16, 'mixer_dropout': 0.2, 'log_dir': './log/', 'config_dir': './config/', 'test_only': False, 'grid_search': True}
2023-05-17 06:49:44,802 - concurrent.futures - [ERROR] - exception calling callback for
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/opt/conda/envs/kgs2s/lib/python3.8/site-packages/joblib/externals/loky/process_executor.py", line 391, in _process_worker
    call_item = call_queue.get(block=True, timeout=timeout)
  File "/opt/conda/envs/kgs2s/lib/python3.8/multiprocessing/queues.py", line 116, in get
    return _ForkingPickler.loads(res)
  File "/opt/conda/envs/kgs2s/lib/python3.8/site-packages/torch/storage.py", line 222, in _load_from_bytes
    return torch.load(io.BytesIO(b))
  File "/opt/conda/envs/kgs2s/lib/python3.8/site-packages/torch/serialization.py", line 713, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/opt/conda/envs/kgs2s/lib/python3.8/site-packages/torch/serialization.py", line 930, in _legacy_load
    result = unpickler.load()
  File "/opt/conda/envs/kgs2s/lib/python3.8/site-packages/torch/serialization.py", line 876, in persistent_load
    wrap_storage=restore_location(obj, location),
  File "/opt/conda/envs/kgs2s/lib/python3.8/site-packages/torch/serialization.py", line 175, in default_restore_location
    result = fn(storage, location)
  File "/opt/conda/envs/kgs2s/lib/python3.8/site-packages/torch/serialization.py", line 155, in _cuda_deserialize
    return torch._UntypedStorage(obj.nbytes(), device=torch.device(location))
RuntimeError: CUDA out of memory. Tried to allocate 678.00 MiB (GPU 0; 31.72 GiB total capacity; 0 bytes already allocated; 593.94 MiB free; 0 bytes reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/envs/kgs2s/lib/python3.8/site-packages/joblib/externals/loky/_base.py", line 26, in _invoke_callbacks
    callback(self)
  File "/opt/conda/envs/kgs2s/lib/python3.8/site-packages/joblib/parallel.py", line 385, in __call__
    self.parallel.dispatch_next()
  File "/opt/conda/envs/kgs2s/lib/python3.8/site-packages/joblib/parallel.py", line 834, in dispatch_next
    if not self.dispatch_one_batch(self._original_iterator):
  File "/opt/conda/envs/kgs2s/lib/python3.8/site-packages/joblib/parallel.py", line 901, in dispatch_one_batch
    self._dispatch(tasks)
  File "/opt/conda/envs/kgs2s/lib/python3.8/site-packages/joblib/parallel.py", line 819, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/opt/conda/envs/kgs2s/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 556, in apply_async
    future = self._workers.submit(SafeFunction(func))
  File "/opt/conda/envs/kgs2s/lib/python3.8/site-packages/joblib/externals/loky/reusable_executor.py", line 176, in submit
    return super().submit(fn, *args, **kwargs)
  File "/opt/conda/envs/kgs2s/lib/python3.8/site-packages/joblib/externals/loky/process_executor.py", line 1129, in submit
    raise self._flags.broken
joblib.externals.loky.process_executor.BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.
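The inner traceback shows the OOM is raised while a loky worker unpickles its arguments: torch's pickle support restores CUDA storages onto the device they were saved from (here GPU 0, even though the run config asks for 'gpu': '1'), and with 0 bytes reserved by PyTorch and only ~594 MiB free, the memory appears to be held by other processes rather than lost to fragmentation, so max_split_size_mb is unlikely to help. A common workaround is to hand joblib only CPU tensors (or plain arrays), move them to the GPU inside the worker, and pin the visible device via CUDA_VISIBLE_DEVICES. The sketch below illustrates that pattern; score_candidate, the grid values, and the tensor shapes are hypothetical placeholders, not this project's grid-search code.

# Minimal sketch (hypothetical, not the project's code): keep joblib arguments
# CPU-resident so loky workers never deserialize CUDA storage, which is the
# failure path shown in the traceback above.
import os
import torch
from joblib import Parallel, delayed

# Expose only the intended card before CUDA is initialised ('gpu': '1' in the
# run config above); inside this process it then appears as cuda:0.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "1")

def score_candidate(params, triples_cpu):
    # Hypothetical grid-search worker: the device transfer happens here,
    # inside the worker, instead of through pickle.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    triples = triples_cpu.to(device)
    # ... build and evaluate a model with `params` ...
    return float(triples.shape[0])  # placeholder metric

# Arguments handed to joblib should already be on the CPU (call .detach().cpu()
# first if they came from the GPU); a CUDA tensor here is unpickled straight
# onto the GPU in every worker, which is what ran out of memory above.
triples = torch.randint(0, 10_000, (1000, 4))  # placeholder data, not the real ICEWS14 tensors
grid = [{"lr": 1e-4, "ker_sz": 9}, {"lr": 1e-3, "ker_sz": 7}]

results = Parallel(n_jobs=2)(delayed(score_candidate)(p, triples) for p in grid)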