sed in GPT-2).

Basically works like a linear layer but the weights are transposed.

Args:
    nf (`int`):
        The number of output features.
    nx (`int`):
        The number of input features.
    initializer_range (`float`, *optional*, defaults to 0.02):
        The standard deviation to use to initialize the weights.
    kwargs (`Dict[str, Any]`, *optional*):
        Additional keyword arguments passed along to the `__init__` of `tf.keras.layers.Layer`.
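The "transposed weights" remark can be illustrated with a small, dependency-free sketch. It is not the layer's actual implementation. It assumes the weight is stored with shape `[nx, nf]`, that the bias starts at zero, and that "linear layer" means one storing its weight as `(out_features, in_features)`, as `torch.nn.Linear` does. The helper names `conv1d_forward`, `linear_forward`, and `init_weight` are hypothetical.

```python
import random


def init_weight(nx, nf, initializer_range=0.02, seed=0):
    """Draw an [nx][nf] weight matrix from N(0, initializer_range^2)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, initializer_range) for _ in range(nf)] for _ in range(nx)]


def conv1d_forward(x, weight, bias):
    """Assumed Conv1D convention: weight is [nx][nf], output is x @ weight + bias."""
    nx, nf = len(weight), len(weight[0])
    return [
        [sum(row[i] * weight[i][j] for i in range(nx)) + bias[j] for j in range(nf)]
        for row in x
    ]


def linear_forward(x, weight, bias):
    """Linear convention: weight is [nf][nx], output is x @ weight^T + bias."""
    nf, nx = len(weight), len(weight[0])
    return [
        [sum(row[i] * weight[j][i] for i in range(nx)) + bias[j] for j in range(nf)]
        for row in x
    ]


nx, nf = 3, 2                       # input and output feature counts
w = init_weight(nx, nf)             # stored as [nx][nf]
b = [0.0] * nf                      # zero bias (an assumption of this sketch)
x = [[1.0, 2.0, 3.0], [0.5, -1.0, 4.0]]

w_t = [list(col) for col in zip(*w)]  # the same weights, stored as [nf][nx]

out_conv = conv1d_forward(x, w, b)
out_lin = linear_forward(x, w_t, b)
```

Under these assumptions both calls produce the same `[batch][nf]` output, so the two layers differ only in how the weight matrix is laid out in memory.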