torch.distributed.init_process_group(backend, init_method=None, timeout=datetime.timedelta(0, 1800), world_size=-1, rank=-1, store=None, group_name='')
Initializes the default distributed process group, and this will also initialize the distributed package.
There are 2 main ways to initialize a process group:

1. Specify store, rank, and world_size explicitly.
2. Specify init_method (a URL string), which indicates where/how to discover peers. Optionally specify rank and world_size, or encode all required parameters in the URL and omit them.

If neither is specified, init_method is assumed to be "env://".
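The two initialization styles above can be sketched as follows. This is a minimal single-process example, assuming a PyTorch build with the gloo backend; the address and port are hypothetical:

```python
import os
import torch.distributed as dist

# Hypothetical single-process setup: the default "env://" init_method
# reads MASTER_ADDR and MASTER_PORT from the environment.
os.environ["MASTER_ADDR"] = "127.0.0.1"
os.environ["MASTER_PORT"] = "29500"

dist.init_process_group(backend="gloo", rank=0, world_size=1)
rank, world = dist.get_rank(), dist.get_world_size()  # 0, 1

dist.destroy_process_group()
```

In a real multi-process job, each process would call init_process_group with its own rank and the shared world_size, and the call blocks until all ranks have joined.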
Parameters:
- backend (str or Backend) – The backend to use. Depending on build-time configurations, valid values include mpi, gloo, and nccl. This field should be given as a lowercase string (e.g., "gloo"), which can also be accessed via Backend attributes (e.g., Backend.GLOO). If using multiple processes per machine with nccl backend, each process must have exclusive access to every GPU it uses, as sharing GPUs between processes can result in deadlocks.
- init_method (str, optional) – URL specifying how to initialize the process group. Default is "env://" if no init_method or store is specified. Mutually exclusive with store.
- world_size (int, optional) – Number of processes participating in the job. Required if store is specified.
- rank (int, optional) – Rank of the current process. Required if store is specified.
- store (Store, optional) – Key/value store accessible to all workers, used to exchange connection/address information. Mutually exclusive with init_method.
- timeout (timedelta, optional) – Timeout for operations executed against the process group. Default value equals 30 minutes. This is applicable for the gloo backend. For nccl, this is applicable only if the environment variable NCCL_BLOCKING_WAIT is set to 1.
- group_name (str, optional, deprecated) – Group name.
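A store-based initialization, exercising the mutual-exclusivity and "required if store is specified" rules above, might look like this sketch (single process, gloo backend assumed; the port is hypothetical):

```python
import datetime
import torch.distributed as dist

# Hypothetical store-based setup: a TCPStore hosted by this process
# serves as the rendezvous `store`. With a store, rank and world_size
# are required, and init_method must NOT be passed (mutually exclusive).
store = dist.TCPStore("127.0.0.1", 29501, 1, True)
dist.init_process_group(
    backend="gloo",
    store=store,
    rank=0,
    world_size=1,
    timeout=datetime.timedelta(minutes=30),
)
ok = dist.is_initialized()

dist.destroy_process_group()
```

Passing both store and init_method, or passing store without rank and world_size, raises an error.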
To enable backend == Backend.MPI, PyTorch needs to be built from source on a system that supports MPI.