yamle.models.transformer module
- class yamle.models.transformer.PreNorm(dim, module)
Bases: Sequential
This class implements the pre-normalization layer.
- class yamle.models.transformer.FeedForward(dim, hidden_dim, dropout, dense=<class 'torch.nn.modules.linear.Linear'>)
Bases: Sequential
This class implements the feed-forward layer.
It consists of two linear layers with GELU activation and dropout.
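The description above can be illustrated with a minimal plain-Python sketch of the computation, not the library's code. It assumes the conventional ordering (linear, GELU, linear), weights laid out as (out_features, in_features) as in torch.nn.Linear, and it leaves dropout out because dropout is inactive at inference time. The helper names `gelu`, `linear` and `feed_forward` are hypothetical.

```python
import math


def gelu(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(x, weight, bias):
    # weight has shape (out_features, in_features); x is one feature vector.
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def feed_forward(x, w1, b1, w2, b2):
    # Assumed ordering: dim -> hidden_dim, GELU, hidden_dim -> dim.
    hidden = [gelu(h) for h in linear(x, w1, b1)]
    return linear(hidden, w2, b2)
```

In this sketch `w1` plays the role of the first layer (dim to hidden_dim) and `w2` the second (hidden_dim back to dim); where exactly the real class applies dropout is not stated in the source.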
- class yamle.models.transformer.Attention(dim, heads, dim_head, dropout, causal=False)
Bases: Module
This class implements the attention layer.
It computes multi-head attention.
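As a rough illustration of what one head computes, and of what a causal flag conventionally means, here is a plain-Python sketch of scaled dot-product attention. It is an assumption about the general technique, not the class's actual code; the function names are hypothetical. In the usual multi-head setup this computation runs once per head on vectors of size dim_head, and the head outputs are concatenated.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v, causal=False):
    """Single-head attention; q, k, v are lists of seq_len vectors."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        scores = []
        for j, kj in enumerate(k):
            if causal and j > i:
                # Future positions get zero weight after the softmax.
                scores.append(float("-inf"))
            else:
                scores.append(scale * sum(a * b for a, b in zip(qi, kj)))
        weights = softmax(scores)
        out.append(
            [sum(w * vj[d] for w, vj in zip(weights, v)) for d in range(len(v[0]))]
        )
    return out
```

With `causal=True`, position i only attends to positions 0..i, so the first output row equals the first value vector.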
- extra_repr()
Set the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- Return type:
str
- training: bool
- class yamle.models.transformer.TransformerEncoderLayer(dim, heads, dim_head, mlp_dim, dropout, causal=False)
Bases: Sequential
This class implements the transformer encoder layer.
It consists of a multi-head attention layer and a feed-forward layer. It also implements the residual connection and layer normalization.
- Parameters:
dim (int) – The dimension of the input.
heads (int) – The number of heads.
dim_head (int) – The dimension of each head.
mlp_dim (int) – The dimension of the hidden layer in the feed-forward layer.
dropout (float) – The dropout rate.
causal (bool) – Whether to use causal attention. Defaults to False.
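The combination of sub-layers, residual connections and layer normalization can be sketched as follows. This is a plain-Python illustration that assumes the pre-norm ordering `x + sublayer(norm(x))`, which is inferred only from the presence of the PreNorm class in this module; the source does not spell out the ordering. The learnable scale and shift of layer normalization are omitted, and all function names are hypothetical.

```python
import math


def layer_norm(vec, eps=1e-5):
    # Normalize one feature vector to zero mean and unit variance.
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]


def pre_norm_residual(seq, sublayer):
    # seq is a list of feature vectors; sublayer maps a sequence to a sequence.
    normed = [layer_norm(vec) for vec in seq]
    update = sublayer(normed)
    return [[a + b for a, b in zip(vec, upd)] for vec, upd in zip(seq, update)]


def encoder_layer(seq, attention, feed_forward):
    # Assumed structure: attention block, then feed-forward block,
    # each wrapped in normalization and a residual connection.
    seq = pre_norm_residual(seq, attention)
    return pre_norm_residual(seq, feed_forward)
```

Passing sub-layers that return zeros leaves the input unchanged, which shows the role of the residual path.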
- class yamle.models.transformer.PositionalEncoding(inputs_dim, embedding_dim, dropout, max_len=5000)
Bases: Module
This class implements the positional encoding.
- reset_parameters()
Initializes the parameters of the module.
- Return type:
None
- training: bool
- class yamle.models.transformer.TransformerModel(embedding_dim, num_heads, num_decoder_layers, hidden_dim, dropout, *args, **kwargs)
Bases: BaseModel
This class implements a Transformer decoder model.
It is based on the PyTorch implementation of the Transformer model.
- Parameters:
embedding_dim (int) – The embedding dimension of the model.
num_heads (int) – The number of heads in the multi-head attention layers.
num_decoder_layers (int) – The number of sub-decoder layers in the decoder.
hidden_dim (int) – The dimension of the feed-forward network.
dropout (float) – The dropout value.
- tasks = ['text_classification']
- reset_parameters()
Initializes the parameters of the model.
- Return type:
None
- forward(x, staged_output=False, input_kwargs={}, output_kwargs={})
Forward pass of the model.
Note that the input has a shape of (batch_size, seq_len).
- final_layer(x, **output_kwargs)
Returns the output of the final layer.
- Return type:
Tensor
- generate(input, max_len, temperature=1.0, **kwargs)
Generates output by passing the input through the model.
- Return type:
Tensor
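The source does not describe how `max_len` and `temperature` are used. As one plausible reading, the sketch below shows ordinary autoregressive sampling in plain Python: logits are divided by the temperature before the softmax, and tokens are appended until the sequence holds `max_len` tokens. Both the stopping rule and the helper names (`sample_next_token`, `generate`, `next_logits`) are assumptions made for illustration, not the method's documented behavior.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, rng=random):
    # Lower temperature sharpens the distribution; higher flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


def generate(next_logits, tokens, max_len, temperature=1.0, rng=random):
    # next_logits is a hypothetical callable: token list -> logits for the
    # next position. Assumption: max_len bounds the total sequence length.
    tokens = list(tokens)
    while len(tokens) < max_len:
        tokens.append(sample_next_token(next_logits(tokens), temperature, rng))
    return tokens
```

A temperature of 1.0 samples from the unmodified softmax distribution, which matches the default in the signature above.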
- add_method_specific_layers(method, **kwargs)
Adds method-specific layers to the model.
- Parameters:
method (str) – The method to use.
- Return type:
None
- static add_specific_args(parser)
Adds model-specific arguments to the parser.
- Return type:
ArgumentParser
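The pattern of a static method that extends an ArgumentParser and returns it can be sketched as below. The class name, flag names and default values are hypothetical and chosen only for illustration; the real flags registered by this model are not listed in the source.

```python
import argparse


class SketchModel:
    """Hypothetical stand-in showing the add_specific_args pattern."""

    @staticmethod
    def add_specific_args(parser):
        # Flag names and defaults are illustrative assumptions.
        parser.add_argument("--sketch_embedding_dim", type=int, default=128)
        parser.add_argument("--sketch_num_heads", type=int, default=8)
        parser.add_argument("--sketch_dropout", type=float, default=0.1)
        return parser


parser = SketchModel.add_specific_args(argparse.ArgumentParser())
args = parser.parse_args(["--sketch_num_heads", "4"])
```

Returning the parser lets several components add their own arguments to the same parser in sequence.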
- training: bool