yamle.models.transformer module

class yamle.models.transformer.PreNorm(dim, module)

Bases: Sequential

This class implements a pre-normalization layer: the input is normalized first, and module is then applied to the normalized result.

Parameters:
  • dim (int) – The dimension of the input.

  • module (nn.Module) – The module to be applied after the normalization.
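
The pre-normalization pattern can be sketched without any dependencies as follows. This is an illustration of the idea rather than yamle's code: layer_norm and pre_norm are hypothetical helpers, and the choice of layer normalization (without learned scale and shift) is an assumption.

```python
import math
from typing import Callable, List

Vector = List[float]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize a vector to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def pre_norm(module: Callable[[Vector], Vector]) -> Callable[[Vector], Vector]:
    """Return a function that normalizes its input first, then applies `module`."""
    def wrapped(x: Vector) -> Vector:
        return module(layer_norm(x))
    return wrapped


# Hypothetical wrapped module: doubles every feature.
double = pre_norm(lambda x: [2.0 * v for v in x])
out = double([1.0, 2.0, 3.0])
```

The wrapped module only ever sees normalized inputs, which is the property that distinguishes pre-normalization from normalizing the output.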

class yamle.models.transformer.FeedForward(dim, hidden_dim, dropout, dense=torch.nn.Linear)

Bases: Sequential

This class implements the feed-forward layer.

It consists of two linear layers with GELU activation and dropout.

Parameters:
  • dim (int) – The dimension of the input.

  • hidden_dim (int) – The dimension of the hidden layer.

  • dropout (float) – The dropout rate.

  • dense (nn.Module) – The dense layer class to be used. Defaults to nn.Linear.
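
A dependency-free sketch of the computation described above (linear, GELU, linear) is given here. The helper names and toy weights are hypothetical, the exact erf-based GELU is an assumption, and dropout is left out because it acts as the identity at inference time; where yamle places dropout is not stated in this page.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def gelu(v: float) -> float:
    """Exact GELU: v * Phi(v), with Phi the standard normal CDF."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Affine map y = W x + b, with `weight` stored as (out_features, in_features)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def feed_forward(x: Vector, w1: Matrix, b1: Vector, w2: Matrix, b2: Vector) -> Vector:
    """Map dim -> hidden_dim -> dim; dropout is omitted (identity at inference)."""
    hidden = [gelu(h) for h in linear(x, w1, b1)]
    return linear(hidden, w2, b2)


# Toy weights with dim=2 and hidden_dim=3 (illustrative values only).
w1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b1 = [0.0, 0.0, 0.0]
w2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b2 = [0.0, 0.0]
y = feed_forward([1.0, -1.0], w1, b1, w2, b2)
```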

extra_repr()

Set the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

Return type:

str

class yamle.models.transformer.Attention(dim, heads, dim_head, dropout, causal=False)

Bases: Module

This class implements a multi-head attention layer, optionally with causal masking.

Parameters:
  • dim (int) – The dimension of the input.

  • heads (int) – The number of heads.

  • dim_head (int) – The dimension of each head.

  • dropout (float) – The dropout rate.

  • causal (bool) – Whether to use causal attention. Defaults to False.
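
The effect of the causal flag can be illustrated for a single head with plain Python lists. This is a sketch of standard scaled dot-product attention, not yamle's implementation; the function names are hypothetical, and a multi-head layer would run this computation once per head on dim_head-sized projections.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(q: Matrix, k: Matrix, causal: bool = False) -> Matrix:
    """Attention weights for one head; q and k have shape (seq_len, dim_head)."""
    scale = 1.0 / math.sqrt(len(q[0]))
    weights = []
    for i, q_i in enumerate(q):
        scores = []
        for j, k_j in enumerate(k):
            if causal and j > i:
                scores.append(float("-inf"))  # position i may not see the future
            else:
                scores.append(scale * sum(a * b for a, b in zip(q_i, k_j)))
        weights.append(softmax(scores))
    return weights


def attend(q: Matrix, k: Matrix, v: Matrix, causal: bool = False) -> Matrix:
    """Weighted sum of the value rows using the attention weights."""
    weights = attention_weights(q, k, causal)
    dim = len(v[0])
    return [
        [sum(w * v_j[d] for w, v_j in zip(row, v)) for d in range(dim)]
        for row in weights
    ]


q = k = v = [[1.0, 0.0], [0.0, 1.0]]
causal_weights = attention_weights(q, k, causal=True)
causal_out = attend(q, k, v, causal=True)
```

With causal=True the first position attends only to itself, so its weight on the second position is exactly zero.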

forward(x)

Forward pass of the attention layer.

Return type:

Tensor

extra_repr()

Set the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

Return type:

str

training: bool

class yamle.models.transformer.TransformerEncoderLayer(dim, heads, dim_head, mlp_dim, dropout, causal=False)

Bases: Sequential

This class implements the transformer encoder layer.

It consists of a multi-head attention layer and a feed-forward layer. It also implements the residual connection and layer normalization.

Parameters:
  • dim (int) – The dimension of the input.

  • heads (int) – The number of heads.

  • dim_head (int) – The dimension of each head.

  • mlp_dim (int) – The dimension of the hidden layer in the feed-forward layer.

  • dropout (float) – The dropout rate.

  • causal (bool) – Whether to use causal attention. Defaults to False.
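
How the two sublayers, the residual connections, and the normalization fit together can be sketched as below. This assumes a pre-normalization ordering (normalize, apply the sublayer, add the input back), which this page does not confirm; residual and encoder_layer are hypothetical helpers and the sublayers are passed in as plain callables.

```python
import math
from typing import Callable, List

Vector = List[float]
Sublayer = Callable[[Vector], Vector]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize a vector to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def residual(sublayer: Sublayer) -> Sublayer:
    """Wrap `sublayer` as x + sublayer(layer_norm(x))."""
    def wrapped(x: Vector) -> Vector:
        return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]
    return wrapped


def encoder_layer(attention: Sublayer, feed_forward: Sublayer) -> Sublayer:
    """Chain the attention block and the feed-forward block, each with a residual."""
    attn_block = residual(attention)
    ff_block = residual(feed_forward)
    return lambda x: ff_block(attn_block(x))


# With sublayers that output zeros, the residual path makes the layer an identity.
zeros = lambda x: [0.0] * len(x)
identity_layer = encoder_layer(zeros, zeros)
result = identity_layer([1.0, 2.0])
```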

extra_repr()

Set the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

Return type:

str

class yamle.models.transformer.PositionalEncoding(inputs_dim, embedding_dim, dropout, max_len=5000)

Bases: Module

This module implements positional encoding.

Parameters:
  • inputs_dim (int) – The total size of token embeddings.

  • embedding_dim (int) – The number of expected features in the input.

  • dropout (float) – The dropout value.

  • max_len (int) – The maximum length of the expected input. Defaults to 5000.
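
This page does not say which positional encoding scheme the class uses. As one common possibility, the fixed sinusoidal scheme is sketched below; treat it as an assumption rather than a description of yamle. Both function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def sinusoidal_encoding(max_len: int, embedding_dim: int) -> Matrix:
    """Return a (max_len, embedding_dim) table of fixed sinusoidal position codes."""
    table = []
    for pos in range(max_len):
        row = []
        for i in range(embedding_dim):
            # Each sin/cos pair shares one frequency; even slots are sin, odd are cos.
            angle = pos / (10000.0 ** ((i - i % 2) / embedding_dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def add_positions(embeddings: Matrix, table: Matrix) -> Matrix:
    """Add the position code for index t to the embedding at index t."""
    return [[e + p for e, p in zip(emb, table[t])] for t, emb in enumerate(embeddings)]


table = sinusoidal_encoding(max_len=4, embedding_dim=6)
encoded = add_positions([[0.0] * 6, [0.0] * 6], table)
```

The table is computed once up to max_len, so inputs longer than max_len cannot be encoded with it.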

forward(x)

Forward pass of the module.

Return type:

Tensor

reset_parameters()

Initializes the parameters of the module.

Return type:

None

training: bool

class yamle.models.transformer.TransformerModel(embedding_dim, num_heads, num_decoder_layers, hidden_dim, dropout, *args, **kwargs)

Bases: BaseModel

This class implements a Transformer decoder model.

It is based on the PyTorch implementation of the Transformer model.

Parameters:
  • embedding_dim (int) – The embedding dimensions of the model.

  • num_heads (int) – The number of heads in the multi-head attention layers.

  • num_decoder_layers (int) – The number of sub-decoder-layers in the decoder.

  • hidden_dim (int) – The dimension of the feedforward network model.

  • dropout (float) – The dropout value.

tasks = ['text_classification']

reset_parameters()

Initializes the parameters of the model.

Return type:

None

forward(x, staged_output=False, input_kwargs={}, output_kwargs={})

Forward pass of the model.

Note that the input has a shape of (batch_size, seq_len).

Parameters:
  • x (torch.Tensor) – The input tensor.

  • staged_output (bool) – Whether to return the output of each layer.

  • input_kwargs (Dict[str, Any]) – The kwargs for the input layer.

  • output_kwargs (Dict[str, Any]) – The kwargs for the output layer.

Return type:

Tensor

final_layer(x, **output_kwargs)

Returns the output of the final layer.

Return type:

Tensor

generate(input, max_len, temperature=1.0, **kwargs)

Generates output by passing the input through the model.

Return type:

Tensor
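
The role of temperature in generation can be illustrated with a small autoregressive sampling loop. This is a generic sketch, not yamle's generate: the helper names and the toy model are hypothetical, and treating max_len as the total sequence length (prompt included) is an assumption.

```python
import math
import random
from typing import Callable, List, Optional


def sample_with_temperature(
    logits: List[float], temperature: float = 1.0, rng: Optional[random.Random] = None
) -> int:
    """Sample an index from softmax(logits / temperature)."""
    if rng is None:
        rng = random.Random()
    scaled = [value / temperature for value in logits]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]  # unnormalized probabilities
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate(
    next_logits: Callable[[List[int]], List[float]],
    prompt: List[int],
    max_len: int,
    temperature: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Append sampled tokens until the sequence holds `max_len` tokens."""
    tokens = list(prompt)
    while len(tokens) < max_len:
        tokens.append(sample_with_temperature(next_logits(tokens), temperature, rng))
    return tokens


def toy_model(tokens: List[int]) -> List[float]:
    """Hypothetical stand-in for a model: strongly prefers (last token + 1) mod 3."""
    preferred = (tokens[-1] + 1) % 3
    return [100.0 if t == preferred else 0.0 for t in range(3)]


sequence = generate(toy_model, [0], max_len=5, temperature=0.1, rng=random.Random(0))
```

Lower temperatures sharpen the distribution toward the highest-scoring token, while higher temperatures flatten it and make the samples more varied.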

add_method_specific_layers(method, **kwargs)

Adds method-specific layers to the model.

Parameters:
  • method (str) – The method to use.

Return type:

None

static add_specific_args(parser)

Adds model-specific arguments to the parser.

Return type:

ArgumentParser
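
The pattern of a static method that registers arguments and hands the parser back looks roughly like this. ToyModel and both flag names are hypothetical and chosen only for illustration; the arguments that yamle actually registers are not listed on this page.

```python
import argparse


class ToyModel:
    """Hypothetical model; only the argument-registration pattern matters here."""

    @staticmethod
    def add_specific_args(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
        # Illustrative flag names; they are not taken from yamle.
        parser.add_argument("--model_embedding_dim", type=int, default=128)
        parser.add_argument("--model_dropout", type=float, default=0.1)
        return parser


parser = ToyModel.add_specific_args(argparse.ArgumentParser())
args = parser.parse_args(["--model_embedding_dim", "64"])
```

Returning the parser lets several components add their own arguments to the same parser in sequence.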

training: bool