yamle.models.transformer module

class yamle.models.transformer.PreNorm(dim, module)

Bases: Sequential

Implements a pre-normalization layer: the input is normalized first, and module is then applied to the normalized result.

Parameters:

dim (int) – The dimension of the input.
module (nn.Module) – The module to be applied after the normalization.
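
The pre-normalization pattern described above can be sketched roughly as follows. This is an illustration only: it assumes `nn.LayerNorm` is the normalization used, which this page does not confirm, and `PreNormSketch` is a hypothetical name rather than the library class.

```python
import torch
from torch import nn


class PreNormSketch(nn.Sequential):
    """Hypothetical stand-in: normalize first, then apply `module`."""

    def __init__(self, dim: int, module: nn.Module) -> None:
        # Assumption: LayerNorm over the last dimension of size `dim`.
        super().__init__(nn.LayerNorm(dim), module)


layer = PreNormSketch(dim=16, module=nn.Linear(16, 16))
x = torch.randn(2, 5, 16)  # (batch, seq_len, dim)
y = layer(x)               # same shape as the input here
```

Because the sketch subclasses `nn.Sequential`, the normalization and the wrapped module simply run in order, which matches the `Bases: Sequential` entry above.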

class yamle.models.transformer.FeedForward(dim, hidden_dim, dropout, dense=torch.nn.Linear)

Bases: Sequential

Implements the feed-forward layer.

It consists of two linear layers with GELU activation and dropout.

Parameters:

dim (int) – The dimension of the input.
hidden_dim (int) – The dimension of the hidden layer.
dropout (float) – The dropout rate.
dense (nn.Module) – The dense layer class to use. Defaults to nn.Linear.
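
A minimal sketch of such a feed-forward block is shown below. The exact ordering of the dropout layers and the projection back to `dim` are assumptions, and `FeedForwardSketch` is a hypothetical name, not the library class.

```python
import torch
from torch import nn


class FeedForwardSketch(nn.Sequential):
    """Hypothetical stand-in: two dense layers with GELU and dropout."""

    def __init__(self, dim, hidden_dim, dropout, dense=nn.Linear):
        super().__init__(
            dense(dim, hidden_dim),  # expand to the hidden width
            nn.GELU(),
            nn.Dropout(dropout),
            dense(hidden_dim, dim),  # assumption: project back to `dim`
            nn.Dropout(dropout),
        )


ff = FeedForwardSketch(dim=16, hidden_dim=32, dropout=0.1)
y = ff(torch.randn(2, 5, 16))  # (batch, seq_len, dim) in and out
```

Passing the layer class through `dense` lets a caller swap in any drop-in replacement that accepts `(in_features, out_features)`.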

extra_repr()

Set the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

Return type: str

class yamle.models.transformer.Attention(dim, heads, dim_head, dropout, causal=False)

Bases: Module

Implements a multi-head attention layer, optionally with causal masking.

Parameters:

dim (int) – The dimension of the input.
heads (int) – The number of heads.
dim_head (int) – The dimension of each head.
dropout (float) – The dropout rate.
causal (bool) – Whether to use causal attention. Defaults to False.
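
To make the roles of `heads`, `dim_head`, and `causal` concrete, here is a sketch of scaled dot-product multi-head self-attention. It is not the library's implementation: the fused query/key/value projection, the output projection, and the name `AttentionSketch` are all assumptions.

```python
import torch
from torch import nn


class AttentionSketch(nn.Module):
    """Hypothetical multi-head self-attention with an optional causal mask."""

    def __init__(self, dim, heads, dim_head, dropout, causal=False):
        super().__init__()
        inner_dim = heads * dim_head
        self.heads = heads
        self.scale = dim_head ** -0.5
        self.causal = causal
        self.to_qkv = nn.Linear(dim, inner_dim * 3, bias=False)
        self.attn_dropout = nn.Dropout(dropout)
        self.to_out = nn.Sequential(nn.Linear(inner_dim, dim), nn.Dropout(dropout))

    def forward(self, x):
        b, n, _ = x.shape
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        # (b, n, heads * dim_head) -> (b, heads, n, dim_head)
        q, k, v = (t.reshape(b, n, self.heads, -1).transpose(1, 2) for t in (q, k, v))
        scores = q @ k.transpose(-2, -1) * self.scale  # (b, heads, n, n)
        if self.causal:
            # True above the diagonal: position i may not attend to j > i.
            mask = torch.triu(torch.ones(n, n, device=x.device), diagonal=1).bool()
            scores = scores.masked_fill(mask, float("-inf"))
        attn = self.attn_dropout(scores.softmax(dim=-1))
        out = (attn @ v).transpose(1, 2).reshape(b, n, -1)
        return self.to_out(out)


attn = AttentionSketch(dim=16, heads=4, dim_head=8, dropout=0.0, causal=True)
x = torch.randn(2, 6, 16)  # (batch, seq_len, dim)
y = attn(x)                # (batch, seq_len, dim)
```

With `causal=True`, the output at each position depends only on that position and earlier ones, so changing a later token leaves earlier outputs unchanged.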

forward(x)

Forward pass of the attention layer.

Return type: Tensor

extra_repr()

Set the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

Return type: str

training: bool

class yamle.models.transformer.TransformerEncoderLayer(dim, heads, dim_head, mlp_dim, dropout, causal=False)

Bases: Sequential

Implements the transformer encoder layer.

It consists of a multi-head attention layer and a feed-forward layer, together with residual connections and layer normalization.

Parameters:

dim (int) – The dimension of the input.
heads (int) – The number of heads.
dim_head (int) – The dimension of each head.
mlp_dim (int) – The dimension of the hidden layer in the feed-forward layer.
dropout (float) – The dropout rate.
causal (bool) – Whether to use causal attention. Defaults to False.
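
One way such a layer could be assembled from attention, feed-forward, residual, and normalization pieces is sketched below. Everything here is an assumption for illustration: `Residual`, `SelfAttention`, and `encoder_layer_sketch` are hypothetical names, PyTorch's built-in `nn.MultiheadAttention` stands in for the module's own attention class, and `dim_head` and `causal` are left out for brevity.

```python
import torch
from torch import nn


class Residual(nn.Module):
    """Hypothetical helper: adds the input back to the wrapped module's output."""

    def __init__(self, module):
        super().__init__()
        self.module = module

    def forward(self, x):
        return x + self.module(x)


class SelfAttention(nn.Module):
    """Thin wrapper so nn.MultiheadAttention takes and returns one tensor."""

    def __init__(self, dim, heads, dropout):
        super().__init__()
        self.mha = nn.MultiheadAttention(dim, heads, dropout=dropout, batch_first=True)

    def forward(self, x):
        out, _ = self.mha(x, x, x, need_weights=False)
        return out


def encoder_layer_sketch(dim, heads, mlp_dim, dropout):
    feed_forward = nn.Sequential(
        nn.Linear(dim, mlp_dim), nn.GELU(), nn.Dropout(dropout),
        nn.Linear(mlp_dim, dim), nn.Dropout(dropout),
    )
    # Assumption: pre-normalization, i.e. LayerNorm before each sub-layer.
    return nn.Sequential(
        Residual(nn.Sequential(nn.LayerNorm(dim), SelfAttention(dim, heads, dropout))),
        Residual(nn.Sequential(nn.LayerNorm(dim), feed_forward)),
    )


layer = encoder_layer_sketch(dim=16, heads=4, mlp_dim=32, dropout=0.1)
y = layer(torch.randn(2, 5, 16))  # (batch, seq_len, dim) in and out
```

Since each residual block preserves the input shape, layers built this way can be stacked in a plain `nn.Sequential`.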

extra_repr()

Set the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

Return type: str

class yamle.models.transformer.PositionalEncoding(inputs_dim, embedding_dim, dropout, max_len=5000)

Bases: Module

Implements the positional encoding module.

Parameters:

inputs_dim (int) – The total size of token embeddings.
embedding_dim (int) – The number of expected features in the input.
dropout (float) – The dropout value.
max_len (int) – The max length of the expected input.
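
This page does not say which positional encoding scheme the class uses. As one common possibility, the sketch below shows fixed sinusoidal encodings added to already-embedded inputs. The sinusoidal scheme and the name `PositionalEncodingSketch` are assumptions, and `inputs_dim` is omitted because its role is not described here in enough detail.

```python
import math

import torch
from torch import nn


class PositionalEncodingSketch(nn.Module):
    """Hypothetical sinusoidal encoding; assumes an even `embedding_dim`."""

    def __init__(self, embedding_dim, dropout, max_len=5000):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        position = torch.arange(max_len).unsqueeze(1)  # (max_len, 1)
        div_term = torch.exp(
            torch.arange(0, embedding_dim, 2) * (-math.log(10000.0) / embedding_dim)
        )
        pe = torch.zeros(max_len, embedding_dim)
        pe[:, 0::2] = torch.sin(position * div_term)  # even feature indices
        pe[:, 1::2] = torch.cos(position * div_term)  # odd feature indices
        # A buffer moves with the module across devices but is not trained.
        self.register_buffer("pe", pe)

    def forward(self, x):
        # x: (batch, seq_len, embedding_dim), with seq_len <= max_len
        return self.dropout(x + self.pe[: x.size(1)])


pe_layer = PositionalEncodingSketch(embedding_dim=8, dropout=0.0)
out = pe_layer(torch.zeros(1, 3, 8))  # (1, 3, 8)
```

In this sketch `max_len` bounds the longest sequence that can be encoded, since the table of encodings is precomputed once in the constructor.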

forward(x)

Forward pass of the positional encoding module.

Return type: Tensor

reset_parameters()

Initializes the parameters of the module.

Return type: None

training: bool

class yamle.models.transformer.TransformerModel(embedding_dim, num_heads, num_decoder_layers, hidden_dim, dropout, *args, **kwargs)

Bases: BaseModel

Implements a Transformer decoder model.

It is based on the PyTorch implementation of the Transformer model.

Parameters:

embedding_dim (int) – The embedding dimension of the model.
num_heads (int) – The number of heads in the multi-head attention modules.
num_decoder_layers (int) – The number of sub-decoder-layers in the decoder.
hidden_dim (int) – The dimension of the feed-forward network.
dropout (float) – The dropout value.

tasks = ['text_classification']

reset_parameters()

Initializes the parameters of the model.

Return type: None

forward(x, staged_output=False, input_kwargs={}, output_kwargs={})

Forward pass of the model.

Note that the input has a shape of (batch_size, seq_len).

Parameters:

x (torch.Tensor) – The input tensor.
staged_output (bool) – Whether to return the output of each layer.
input_kwargs (Dict[str, Any]) – The kwargs for the input layer.
output_kwargs (Dict[str, Any]) – The kwargs for the output layer.

Return type: Tensor

final_layer(x, **output_kwargs)

Returns the output of the final layer.

Return type: Tensor

generate(input, max_len, temperature=1.0, **kwargs)

Generates output by passing the input through the model.

Return type: Tensor
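
The entry above does not describe how generation proceeds. As a hedged illustration of what the `temperature` and `max_len` arguments usually mean, the sketch below samples tokens autoregressively. It assumes the model returns per-position logits of shape (batch, seq_len, vocab) and that `max_len` counts newly generated tokens; neither is confirmed by this page, and `generate_sketch` and `toy_model` are hypothetical.

```python
import torch


@torch.no_grad()
def generate_sketch(model, input, max_len, temperature=1.0):
    """Hypothetical sampling loop: append one sampled token per step."""
    tokens = input  # (batch, seq_len) of token ids
    for _ in range(max_len):
        # Keep only the logits for the last position, scaled by temperature.
        logits = model(tokens)[:, -1, :] / temperature
        probs = torch.softmax(logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)  # (batch, 1)
        tokens = torch.cat([tokens, next_token], dim=1)
    return tokens


vocab = 10
# Toy stand-in for a language model: embedding followed by a linear head.
toy_model = torch.nn.Sequential(torch.nn.Embedding(vocab, 8), torch.nn.Linear(8, vocab))
prompt = torch.randint(0, vocab, (2, 3))
out = generate_sketch(toy_model, prompt, max_len=4)  # (2, 3 + 4)
```

Lower temperatures sharpen the distribution toward the highest-scoring token, while higher temperatures flatten it and make the samples more varied.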

add_method_specific_layers(method, **kwargs)

Adds method-specific layers to the model.

Parameters:

method (str) – The method to use.

Return type: None

static add_specific_args(parser)

Adds model-specific arguments to the parser.

Return type: ArgumentParser
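
The general pattern of a static method that extends and returns an `ArgumentParser` might look like the sketch below. The flag names and default values are invented for illustration and merely mirror the constructor parameters; the real flags are not listed on this page.

```python
from argparse import ArgumentParser


def add_specific_args_sketch(parser: ArgumentParser) -> ArgumentParser:
    """Hypothetical example: register model options and return the parser."""
    # Flag names and defaults below are assumptions, not the library's.
    parser.add_argument("--model_embedding_dim", type=int, default=128)
    parser.add_argument("--model_num_heads", type=int, default=8)
    parser.add_argument("--model_num_decoder_layers", type=int, default=2)
    parser.add_argument("--model_hidden_dim", type=int, default=256)
    parser.add_argument("--model_dropout", type=float, default=0.1)
    return parser


parser = add_specific_args_sketch(ArgumentParser())
args = parser.parse_args(["--model_num_heads", "4"])
```

Returning the parser lets several components register their options on the same parser in sequence.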

training: bool