as (Tensor): None or attn_bias of the last attention layer residual (Optional[Tensor]): residual value prob (float): dropout probability training (bool): whether in training mode or not Returns: Tensor: dropout(x + bias) + residual N)