yamle.models.visual_transformer module

class yamle.models.visual_transformer.SpatialPositionalEmbedding(inputs_dim, patch_size, embedding_dim, dropout=0.0, num_cls_tokens=1, positional_embedding=True)

Bases: Module

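Creates the spatial positional embedding used by the visual transformer for 2D images. The input is split into patches of size patch_size, each patch is embedded into an embedding_dim-dimensional token, and num_cls_tokens class tokens are added to the token sequence.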

Parameters:

inputs_dim (Tuple[int, int, int]) – The dimensions of the input image.
patch_size (int) – The size of each patch.
embedding_dim (int) – The dimension of the embedding.
dropout (float) – The dropout rate. Defaults to 0.0.
num_cls_tokens (int) – The number of class tokens. Defaults to 1.
positional_embedding (bool) – Whether to add a positional embedding. Defaults to True.
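In a standard visual transformer, the length of the token sequence is the number of non-overlapping patches plus the number of class tokens. The sketch below computes it under two assumptions not stated above: inputs_dim is ordered (channels, height, width), and both spatial sizes are divisible by patch_size. The helper num_tokens is hypothetical and not part of yamle.

```python
from typing import Tuple


def num_tokens(inputs_dim: Tuple[int, int, int], patch_size: int, num_cls_tokens: int = 1) -> int:
    """Hypothetical helper: sequence length after patching.

    Assumes inputs_dim = (channels, height, width).
    """
    _, height, width = inputs_dim
    if height % patch_size or width % patch_size:
        raise ValueError("height and width must be divisible by patch_size")
    # One token per non-overlapping patch_size x patch_size patch.
    num_patches = (height // patch_size) * (width // patch_size)
    return num_patches + num_cls_tokens


# A 3x32x32 image with 4x4 patches: 8 * 8 = 64 patch tokens plus 1 class token.
print(num_tokens((3, 32, 32), patch_size=4))  # 65
```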

forward(x)

Computes the forward pass of the embedding.

Return type: Tensor
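A patch embedding of this kind is commonly built as sketched below. The sketch assumes (C, H, W) input ordering, a strided convolution as the patch projection, zero-initialised learned class tokens and a learned positional embedding covering every token. These choices, and the class name PatchEmbeddingSketch, are illustrative assumptions rather than yamle's implementation.

```python
import torch
from torch import nn


class PatchEmbeddingSketch(nn.Module):
    """Hypothetical stand-in for SpatialPositionalEmbedding (not the yamle code)."""

    def __init__(self, inputs_dim, patch_size, embedding_dim, dropout=0.0,
                 num_cls_tokens=1, positional_embedding=True):
        super().__init__()
        channels, height, width = inputs_dim  # assumed (C, H, W) ordering
        num_patches = (height // patch_size) * (width // patch_size)
        # A convolution with kernel_size == stride == patch_size maps each
        # non-overlapping patch to one embedding_dim-dimensional vector.
        self.proj = nn.Conv2d(channels, embedding_dim, kernel_size=patch_size, stride=patch_size)
        self.cls_tokens = nn.Parameter(torch.zeros(1, num_cls_tokens, embedding_dim))
        # Learned positional embedding, one vector per token (class tokens included).
        self.pos = (
            nn.Parameter(torch.zeros(1, num_cls_tokens + num_patches, embedding_dim))
            if positional_embedding
            else None
        )
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (B, C, H, W) -> (B, D, H/p, W/p) -> (B, num_patches, D)
        x = self.proj(x).flatten(2).transpose(1, 2)
        cls = self.cls_tokens.expand(x.shape[0], -1, -1)
        x = torch.cat([cls, x], dim=1)  # class tokens come first in the sequence
        if self.pos is not None:
            x = x + self.pos
        return self.dropout(x)


emb = PatchEmbeddingSketch((3, 32, 32), patch_size=4, embedding_dim=128)
print(emb(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 65, 128])
```

Prepending the class tokens keeps their positions fixed regardless of image size, which is what makes a simple index-based lookup of the class tokens possible.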

get_cls_token_indices()

Returns the indices of the class tokens.

The class tokens are prepended to the sequence, so they occupy its first num_cls_tokens positions.

Return type: Tensor
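Because the class tokens sit at the front of the sequence, their indices would be 0 through num_cls_tokens - 1. A minimal sketch of computing them and using them to read the class-token outputs back out of a (batch, tokens, embedding_dim) tensor; the helper cls_token_indices is hypothetical.

```python
import torch


def cls_token_indices(num_cls_tokens: int) -> torch.Tensor:
    # Hypothetical helper: prepended class tokens occupy positions 0 .. num_cls_tokens - 1.
    return torch.arange(num_cls_tokens)


tokens = torch.randn(2, 66, 128)  # (batch, 2 class tokens + 64 patches, embedding_dim)
idx = cls_token_indices(2)
cls_out = tokens[:, idx]  # select the class-token outputs along the sequence dimension
print(idx.tolist(), cls_out.shape)  # [0, 1] torch.Size([2, 2, 128])
```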

training: bool

class yamle.models.visual_transformer.VisualTransformerModel(patch_size=4, pooling='mean', embedding_dim=128, num_heads=6, depth=4, num_cls_tokens=1, hidden_dim=512, width_multiplier=1, dropout=0.0, *args, **kwargs)

Bases: BaseModel

A visual transformer (ViT) model.

Parameters:

patch_size (int) – The size of each patch. Defaults to 4.
pooling (str) – The pooling applied to the encoder output, either 'mean' or 'cls'. Defaults to 'mean'.
embedding_dim (int) – The number of expected features in the encoder input, i.e. the token embedding dimension. Defaults to 128.
num_heads (int) – The number of heads in each multi-head attention layer. Defaults to 6.
depth (int) – The number of encoder layers. Defaults to 4.
num_cls_tokens (int) – The number of class tokens. Defaults to 1.
hidden_dim (int) – The hidden dimension of the feedforward network in each encoder layer. Defaults to 512.
width_multiplier (int) – The width multiplier for the hidden dimension. Defaults to 1.
dropout (float) – The dropout rate. Defaults to 0.0.
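The two pooling modes reduce the encoder's output sequence to one vector per example. The sketch below shows one plausible reading; whether 'mean' averages over the class tokens too, and how several class tokens are combined under 'cls', are assumptions not stated in this documentation. The function pool is hypothetical.

```python
import torch


def pool(tokens: torch.Tensor, pooling: str, num_cls_tokens: int = 1) -> torch.Tensor:
    """Hypothetical pooling: reduce (batch, tokens, dim) to (batch, dim)."""
    if pooling == "mean":
        # Assumption: average over every token, class tokens included.
        return tokens.mean(dim=1)
    if pooling == "cls":
        # Class tokens are first in the sequence; assumption: average them if there are several.
        return tokens[:, :num_cls_tokens].mean(dim=1)
    raise ValueError(f"Unknown pooling: {pooling!r}")


tokens = torch.randn(2, 65, 128)
print(pool(tokens, "mean").shape, pool(tokens, "cls").shape)
```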

tasks = ['classification', 'regression']

add_method_specific_layers(method, **kwargs)

Adds method-specific layers to the model.

Parameters: method (str) – The method to use.
Return type: None

forward(x, staged_output=False, input_kwargs={}, output_kwargs={})

Forward pass of the model.

Parameters:

x (torch.Tensor) – The input tensor.
staged_output (bool, optional) – Whether to also return the output of each layer. Defaults to False.
input_kwargs (Dict[str, Any], optional) – The input keyword arguments. Defaults to {}.
output_kwargs (Dict[str, Any], optional) – The output keyword arguments. Defaults to {}.

Return type: Union[Tensor, Tuple[Tensor, List[Tensor]]]
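The return type reflects staged_output: presumably a single tensor by default, or a tuple of the final output and a list of per-layer outputs. A minimal sketch of that pattern, built on PyTorch's nn.TransformerEncoderLayer as an assumption (yamle may use its own layers); the class EncoderSketch is hypothetical. PyTorch's multi-head attention requires embedding_dim to be divisible by num_heads, so the sketch uses 4 heads.

```python
from typing import List, Tuple, Union

import torch
from torch import nn


class EncoderSketch(nn.Module):
    """Hypothetical encoder stack illustrating the staged_output pattern."""

    def __init__(self, embedding_dim=128, num_heads=4, depth=4, hidden_dim=512, dropout=0.0):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(embedding_dim, num_heads, hidden_dim, dropout, batch_first=True)
            for _ in range(depth)
        ])

    def forward(
        self, x: torch.Tensor, staged_output: bool = False
    ) -> Union[torch.Tensor, Tuple[torch.Tensor, List[torch.Tensor]]]:
        stages: List[torch.Tensor] = []
        for layer in self.layers:
            x = layer(x)
            stages.append(x)  # keep every layer's output
        return (x, stages) if staged_output else x


enc = EncoderSketch()
out, stages = enc(torch.randn(2, 65, 128), staged_output=True)
print(out.shape, len(stages))  # torch.Size([2, 65, 128]) 4
```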

final_layer(x, **output_kwargs)

Computes the output of the final layer.

Return type: Tensor

static add_specific_args(parser)

Adds the model-specific arguments to the parser.

Return type: ArgumentParser
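A static method of this kind typically registers the constructor's hyperparameters on an argparse parser and returns it. The sketch below mirrors a few of the constructor defaults listed above; the --model_* flag names are hypothetical, and yamle's actual flags may differ.

```python
import argparse


def add_specific_args(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Hypothetical sketch; the real yamle flag names may differ."""
    group = parser.add_argument_group("VisualTransformerModel")
    # Defaults mirror the VisualTransformerModel constructor signature.
    group.add_argument("--model_patch_size", type=int, default=4)
    group.add_argument("--model_pooling", type=str, default="mean", choices=["mean", "cls"])
    group.add_argument("--model_embedding_dim", type=int, default=128)
    group.add_argument("--model_depth", type=int, default=4)
    return parser


parser = add_specific_args(argparse.ArgumentParser())
args = parser.parse_args(["--model_pooling", "cls"])
print(args.model_pooling, args.model_patch_size)  # cls 4
```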

training: bool