ns below for exact dimensions of all variables. You can find more details in https://arxiv.org/abs/1402.1128. Args: input_size: The number of expected features in the input `x` hidden_size: The number of features in the hidden state `h` num_layers: Number of recurrent layers. E.g., setting ``num_layers=2`` would mean stacking two LSTMs together to form a `stacked LSTM`, with the second LSTM taking in outputs of the first LSTM and computing the final results. Default: 1 bias: If ``False``, then the layer does not use bias weights `b_ih` and `b_hh`. Default: ``True`` batch_first: If ``True``, then the input and output tensors are provided as `(batch, seq, feature)` instead of `(seq, batch, feature)`. Note that this does not apply to hidden or cell states. See the Inputs/Outputs sections below for details. Default: ``False`` dropout: If non-zero, introduces a `Dropout` layer on the outputs of each LSTM layer except the last layer, with dropout probability equal to :attr:`dropout`. Default: 0 bidirectional: If ``True``, becomes a bidirectional LSTM. Default: ``False`` proj_size: If ``> 0``, will use LSTM with projections of corresponding size. Default: 0 Inputs: input, (h_0, c_0) * **input**: tensor of shape :math:`(L, H_{in})` for unbatched input, :math:`(L, N, H_{in})` when ``batch_first=False`` or :math:`(N, L, H_{in})` when ``batch_first=True`` containing the features of the input sequence. The input can also be a packed variable length sequence. See :func:`torch.nn.utils.rnn.pack_padded_sequence` or :func:`torch.nn.utils.rnn.pack_sequence` for details. * **h_0**: tensor of shape :math:`(D * \text{num\_layers}, H_{out})` for unbatched input or :math:`(D * \text{num\_layers}, N, H_{out})` containing the initial hidden state for each element in the input sequence. Defaults to zeros if (h_0, c_0) is not provided. * **c_0**: tensor of shape :math:`(D * \text{num\_layers}, H_{cell})` for unbatched input or :math:`(D * \text{num\_layers}, N, H_{cell})` containing the initial cell state for each element in the input sequence. Defaults to zeros if (h_0, c_0) is not provided. where: .. math:: \begin{aligned} N ={} & \text{batch size} \\ L ={} & \text{sequence length} \\ D ={} & 2 \text{ if bidirectional=True otherwise } 1 \\ H_{in} ={} & \text{input\_size} \\ H_{cell} ={} & \text{hidden\_size} \\ H_{out} ={} & \text{proj\_size if } \text{proj\_size}>0 \text{ otherwise hidden\_size} \\ \end{aligned} Outputs: output, (h_n, c_n) * **output**: tensor of shape :math:`(L, D * H_{out})` for unbatched input, :math:`(L, N, D * H_{out})` when ``batch_first=False`` or :math:`(N, L, D * H_{out})` when ``batch_first=True`` containing the output features `(h_t)` from the last layer of the LSTM, for each `t`. If a :class:`torch.nn.utils.rnn.PackedSequence` has been given as the input, the output will also be a packed sequence. When ``bidirectional=True``, `output` will contain a concatenation of the forward and reverse hidden states at each time step in the sequence. * **h_n**: tensor of shape :math:`(D * \text{num\_layers}, H_{out})` for unbatched input or :math:`(D * \text{num\_layers}, N, H_{out})` containing the final hidden state for each element in the sequence. When ``bidirectional=True``, `h_n` will contain a concatenation of the final forward and reverse hidden states, respectively. * **c_n**: tensor of shape :math:`(D * \text{num\_layers}, H_{cell})` for unbatched input or :math:`(D * \text{num\_layers}, N, H_{cell})` containing the final cell state for each element in the sequence. When ``bidirectional=True``, `c_n` will contain a concatenation of the final forward and reverse cell states, respectively. Attributes: weight_ih_l[k] : the learnable input-hidden weights of the :math:`\text{k}^{th}` layer `(W_ii|W_if|W_ig|W_io)`, of shape `(4*hidden_size, input_size)` for `k = 0`. Otherwise, the shape is `(4*hidden_size, num_directions * hidden_size)`. If ``proj_size > 0`` was specified, the shape will be `(4*hidden_size, num_directions * proj_size)` for `k > 0` weight_hh_l[k] : the learnable hidden-hidden weights of the :math:`\text{k}^{th}` layer `(W_hi|W_hf|W_hg|W_ho)`, of shape `(4*hidden_size, hidden_size)`. If ``proj_size > 0`` was specified, the shape will be `(4*hidden_size, proj_size)`. bias_ih_l[k] : the learnable input-hidden bias of the :math:`\text{k}^{th}` layer `(b_ii|b_if|b_ig|b_io)`, of shape `(4*hidden_size)` bias_hh_l[k] : the learnable hidden-hidden bias of the :math:`\text{k}^{th}` layer `(b_hi|b_hf|b_hg|b_ho)`, of shape `(4*hidden_size)` weight_hr_l[k] : the learnable projection weights of the :math:`\text{k}^{th}` layer of shape `(proj_size, hidden_size)`. Only present when ``proj_size > 0`` was specified. weight_ih_l[k]_reverse: Analogous to `weight_ih_l[k]` for the reverse direction. Only present when ``bidirectional=True``. weight_hh_l[k]_reverse: Analogous to `weight_hh_l[k]` for the reverse direction. Only present when ``bidirectional=True``. bias_ih_l[k]_reverse: Analogous to `bias_ih_l[k]` for the reverse direction. Only present when ``bidirectional=True``. bias_hh_l[k]_reverse: Analogous to `bias_hh_l[k]` for the reverse direction. Only present when ``bidirectional=True``. weight_hr_l[k]_reverse: Analogous to `weight_hr_l[k]` for the reverse direction. Only present when ``bidirectional=True`` and ``proj_size > 0`` was specified. .. note:: All the weights and biases are initialized from :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where :math:`k = \frac{1}{\text{hidden\_size}}` .. note:: For bidirectional LSTMs, forward and backward are directions 0 and 1 respectively. Example of splitting the output layers when ``batch_first=False``: ``output.view(seq_len, batch, num_directions, hidden_size)``. .. note:: For bidirectional LSTMs, `h_n` is not equivalent to the last element of `output`; the former contains the final forward and reverse hidden states, while the latter contains the final forward hidden state and the initial reverse hidden state. .. note:: ``batch_first`` argument is ignored for unbatched inputs. .. include:: ../cudnn_rnn_determinism.rst .. include:: ../cudnn_persistent_rnn.rst Examples:: >>> rnn = nn.LSTM(10, 20, 2) >>> input = torch.randn(5, 3, 10) >>> h0 = torch.randn(2, 3, 20) >>> c0 = torch.randn(2, 3, 20) >>> output, (hn, cn) = rnn(input, (h0, c0)) c