attention maps returned by `GroupViTVisionTransformer`
    hw_shape (`tuple(int)`): height and width of the output attention map

Returns:
    `torch.Tensor`: the attention map of shape [batch_size, groups, height, width]