yamle.models.visual_transformer module

class yamle.models.visual_transformer.SpatialPositionalEmbedding(inputs_dim, patch_size, embedding_dim, dropout=0.0, num_cls_tokens=1, positional_embedding=True)

Bases: Module

Creates a spatial positional embedding for 2D images, for use in the visual transformer.

Parameters:
  • inputs_dim (Tuple[int, int, int]) – The dimensions of the input, given as a tuple of three integers.

  • patch_size (int) – The size of the patch.

  • embedding_dim (int) – The dimension of the embedding.

  • dropout (float) – The dropout rate. Defaults to 0.0.

  • num_cls_tokens (int) – The number of class tokens. Defaults to 1.

  • positional_embedding (bool) – Whether to use positional embedding. Defaults to True.
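As a rough illustration of how these parameters interact, the sketch below computes the token sequence length that a patch-based embedding would produce. It assumes that inputs_dim is ordered (channels, height, width), that patches are square and non-overlapping, and that the class tokens are counted in addition to the patch tokens. The helper name sequence_length is hypothetical and not part of yamle.

```python
from typing import Tuple


def sequence_length(
    inputs_dim: Tuple[int, int, int], patch_size: int, num_cls_tokens: int = 1
) -> int:
    """Number of tokens for square, non-overlapping patches (assumed layout)."""
    # Assumption: inputs_dim is (channels, height, width).
    _, height, width = inputs_dim
    if height % patch_size or width % patch_size:
        raise ValueError("height and width must be divisible by patch_size")
    num_patches = (height // patch_size) * (width // patch_size)
    # Class tokens are counted in addition to the patch tokens.
    return num_patches + num_cls_tokens


# A 3x32x32 image with 4x4 patches gives 8 * 8 = 64 patches plus 1 class token.
print(sequence_length((3, 32, 32), patch_size=4))  # 65
```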

forward(x)

Runs the forward pass of the module.

Return type:

Tensor

get_cls_token_indices()

Returns the indices of the class tokens.

The class tokens are added as the first tokens in the sequence.

Return type:

Tensor
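Because the class tokens occupy the first positions of the sequence, their indices run from 0 to num_cls_tokens - 1. The sketch below illustrates this with plain Python lists; the real method returns a Tensor, and the helper name cls_token_indices is hypothetical.

```python
from typing import List


def cls_token_indices(num_cls_tokens: int) -> List[int]:
    """Indices of class tokens when they are the first tokens in the sequence."""
    return list(range(num_cls_tokens))


# Illustrative sequence: two class tokens followed by three patch tokens.
tokens = ["cls0", "cls1", "patch0", "patch1", "patch2"]
indices = cls_token_indices(2)
print(indices)  # [0, 1]
print([tokens[i] for i in indices])  # ['cls0', 'cls1']
```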

training: bool

class yamle.models.visual_transformer.VisualTransformerModel(patch_size=4, pooling='mean', embedding_dim=128, num_heads=6, depth=4, num_cls_tokens=1, hidden_dim=512, width_multiplier=1, dropout=0.0, *args, **kwargs)

Bases: BaseModel

Creates a visual transformer model.

Parameters:
  • patch_size (int) – The size of the patch to use. Defaults to 4.

  • pooling (str) – The pooling to use, either 'mean' or 'cls'. Defaults to 'mean'.

  • embedding_dim (int) – The number of expected features in the input. Defaults to 128.

  • num_heads (int) – The number of heads in the multi-head attention layers. Defaults to 6.

  • depth (int) – The number of sub-encoder layers in the encoder. Defaults to 4.

  • num_cls_tokens (int) – The number of class tokens. Defaults to 1.

  • hidden_dim (int) – The dimension of the feedforward network. Defaults to 512.

  • width_multiplier (int) – The width multiplier for the hidden dimension. Defaults to 1.

  • dropout (float) – The dropout rate. Defaults to 0.0.
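The two pooling options can be pictured as follows. This is a minimal sketch on plain Python lists, assuming that 'mean' averages the token embeddings and that 'cls' reads out the class tokens at the start of the sequence. Whether 'mean' includes the class tokens, and how several class tokens are combined, is not stated in this reference, so both choices below are assumptions. The function pool is hypothetical.

```python
from typing import List


def pool(tokens: List[List[float]], pooling: str, num_cls_tokens: int = 1) -> List[float]:
    """Reduce a sequence of token embeddings to a single embedding."""
    if pooling == "cls":
        # Assumption: class tokens come first; average them if there are several.
        selected = tokens[:num_cls_tokens]
    elif pooling == "mean":
        # Assumption: average over every token in the sequence.
        selected = tokens
    else:
        raise ValueError(f"unknown pooling: {pooling!r}")
    dim = len(selected[0])
    return [sum(tok[d] for tok in selected) / len(selected) for d in range(dim)]


sequence = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]  # first row plays the class token
print(pool(sequence, "cls"))   # [1.0, 0.0]
print(pool(sequence, "mean"))  # [3.0, 2.0]
```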

tasks = ['classification', 'regression']

add_method_specific_layers(method, **kwargs)

Adds method-specific layers to the model.

Parameters:

method (str) – The method to use.

Return type:

None

forward(x, staged_output=False, input_kwargs={}, output_kwargs={})

Forward pass of the model.

Parameters:
  • x (torch.Tensor) – The input tensor.

  • staged_output (bool, optional) – Whether to return the output of each layer. Defaults to False.

  • input_kwargs (Dict[str, Any], optional) – The input kwargs. Defaults to {}.

  • output_kwargs (Dict[str, Any], optional) – The output kwargs. Defaults to {}.

Return type:

Union[Tensor, Tuple[Tensor, List[Tensor]]]
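Since the return type depends on staged_output, calling code may want to normalise both cases to a single shape. The sketch below does this with a hypothetical helper, split_forward_result, and uses strings as stand-ins for tensors so that it runs without PyTorch; it relies only on the documented return type.

```python
from typing import Any, List, Tuple


def split_forward_result(result: Any) -> Tuple[Any, List[Any]]:
    """Normalise a forward() result to (output, stages)."""
    if isinstance(result, tuple):
        # staged_output=True: (final output, list of per-layer outputs).
        output, stages = result
        return output, list(stages)
    # staged_output=False: only the final output is returned.
    return result, []


# Stand-in values; with the real model these would be torch.Tensor objects.
print(split_forward_result("logits"))                  # ('logits', [])
print(split_forward_result(("logits", ["h1", "h2"])))  # ('logits', ['h1', 'h2'])
```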

final_layer(x, **output_kwargs)

Computes the output of the final layer.

Return type:

Tensor

static add_specific_args(parser)

Adds model-specific arguments to the parser.

Return type:

ArgumentParser
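A static method of this shape typically registers options on the parser it receives and then returns it. The sketch below shows that pattern with the standard argparse module. The class ExampleModel and the flag names are illustrative assumptions, not the flags yamle actually registers; the defaults mirror the constructor signature documented above.

```python
from argparse import ArgumentParser


class ExampleModel:
    """Hypothetical stand-in; not the yamle implementation."""

    @staticmethod
    def add_specific_args(parser: ArgumentParser) -> ArgumentParser:
        # Flag names are illustrative assumptions, not yamle's actual flags.
        parser.add_argument("--patch_size", type=int, default=4)
        parser.add_argument("--pooling", type=str, default="mean", choices=["mean", "cls"])
        parser.add_argument("--depth", type=int, default=4)
        return parser


parser = ExampleModel.add_specific_args(ArgumentParser())
args = parser.parse_args(["--pooling", "cls"])
print(args.patch_size, args.pooling, args.depth)  # 4 cls 4
```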

training: bool