yamle.models.visual_transformer module
- class yamle.models.visual_transformer.SpatialPositionalEmbedding(inputs_dim, patch_size, embedding_dim, dropout=0.0, num_cls_tokens=1, positional_embedding=True)
Bases: Module
This class creates a spatial positional embedding for use in the visual transformer on 2D images.
- Parameters:
inputs_dim (Tuple[int, int, int]) – The dimension of the input.
patch_size (int) – The size of the patch.
embedding_dim (int) – The dimension of the embedding.
dropout (float) – The dropout rate. Defaults to 0.0.
num_cls_tokens (int) – The number of class tokens. Defaults to 1.
positional_embedding (bool) – Whether to use positional embedding. Defaults to True.
- get_cls_token_indices()
Returns the indices of the class tokens.
The class tokens are added as the first tokens in the sequence.
- Return type:
Tensor
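To illustrate how the parameters above could determine the token sequence, here is a minimal, dependency-free sketch. It assumes that `inputs_dim` is ordered `(channels, height, width)`, that the image is split into non-overlapping square patches, and that height and width are divisible by `patch_size`; the documentation does not confirm any of this. The helper `sequence_layout` is hypothetical and not part of yamle, and it returns plain Python lists where the real method returns a Tensor.

```python
from typing import List, Tuple


def sequence_layout(
    inputs_dim: Tuple[int, int, int],
    patch_size: int,
    num_cls_tokens: int = 1,
) -> Tuple[int, int, List[int]]:
    """Return (num_patches, sequence_length, cls_token_indices).

    Hypothetical helper for illustration only; not part of yamle.
    """
    _, height, width = inputs_dim  # assumed order: (channels, height, width)
    if height % patch_size or width % patch_size:
        raise ValueError("height and width must be divisible by patch_size")
    # Non-overlapping square patches (an assumption).
    num_patches = (height // patch_size) * (width // patch_size)
    # Class tokens are the first tokens in the sequence.
    cls_indices = list(range(num_cls_tokens))
    return num_patches, num_cls_tokens + num_patches, cls_indices


num_patches, seq_len, cls_indices = sequence_layout((3, 32, 32), patch_size=4)
```

Under these assumptions, a 32x32 input with `patch_size=4` yields 64 patch tokens preceded by one class token at index 0.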
- training: bool
- class yamle.models.visual_transformer.VisualTransformerModel(patch_size=4, pooling='mean', embedding_dim=128, num_heads=6, depth=4, num_cls_tokens=1, hidden_dim=512, width_multiplier=1, dropout=0.0, *args, **kwargs)
Bases: BaseModel
This class creates a visual transformer model.
- Parameters:
patch_size (int) – The size of the patch to be used. Defaults to 4.
pooling (str) – The pooling to be used, either 'mean' or 'cls'. Defaults to 'mean'.
embedding_dim (int) – The dimension of the embedding (the number of expected features in the encoder input). Defaults to 128.
num_heads (int) – The number of heads in the multi-head attention layers. Defaults to 6.
depth (int) – The number of sub-encoder layers in the encoder. Defaults to 4.
num_cls_tokens (int) – The number of class tokens. Defaults to 1.
hidden_dim (int) – The dimension of the feedforward network. Defaults to 512.
width_multiplier (int) – The width multiplier for the hidden dimension. Defaults to 1.
dropout (float) – The dropout rate. Defaults to 0.0.
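The two `pooling` options can be sketched as follows. This is a dependency-free illustration, not yamle's implementation: it assumes that 'mean' averages over every token in the sequence, and that 'cls' averages the leading class tokens when there is more than one. The function `pool_tokens` is hypothetical.

```python
from typing import List

Vector = List[float]


def pool_tokens(
    tokens: List[Vector], pooling: str = "mean", num_cls_tokens: int = 1
) -> Vector:
    """Reduce a token sequence of shape [seq_len][dim] to one vector [dim].

    Hypothetical helper for illustration only; not part of yamle.
    """
    if pooling == "cls":
        # Class tokens are assumed to be the first tokens in the sequence.
        selected = tokens[:num_cls_tokens]
    elif pooling == "mean":
        # Assumption: the mean is taken over all tokens.
        selected = tokens
    else:
        raise ValueError(f"pooling must be 'mean' or 'cls', got {pooling!r}")
    # Average each feature column over the selected tokens.
    return [sum(column) / len(selected) for column in zip(*selected)]


tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
mean_pooled = pool_tokens(tokens, pooling="mean")  # [3.0, 4.0]
cls_pooled = pool_tokens(tokens, pooling="cls")    # [1.0, 2.0]
```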
- tasks = ['classification', 'regression']
- add_method_specific_layers(method, **kwargs)
Adds method-specific layers to the model.
- Parameters:
method (str) – The method to use.
- Return type:
None
- forward(x, staged_output=False, input_kwargs={}, output_kwargs={})
Forward pass of the model.
- Return type:
Union[Tensor,Tuple[Tensor,List[Tensor]]]
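The return type suggests that `staged_output=True` yields the final output together with a list of intermediate outputs. This is an inference from the signature, not documented behavior. The sketch below shows that general pattern with a hypothetical `run_stages` function and plain callables standing in for layers; whether yamle's list includes the final stage is also an assumption.

```python
from typing import Any, Callable, List, Sequence, Tuple, Union


def run_stages(
    x: Any,
    stages: Sequence[Callable[[Any], Any]],
    staged_output: bool = False,
) -> Union[Any, Tuple[Any, List[Any]]]:
    """Apply stages in order, optionally returning every intermediate output.

    Hypothetical helper for illustration only; not part of yamle.
    """
    intermediates: List[Any] = []
    for stage in stages:
        x = stage(x)
        intermediates.append(x)  # assumption: the final stage is included
    if staged_output:
        return x, intermediates
    return x


stages = [lambda v: v + 1, lambda v: v * 2]
final_only = run_stages(1, stages)
final_and_stages = run_stages(1, stages, staged_output=True)
```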
- final_layer(x, **output_kwargs)
Returns the output of the final layer.
- Return type:
Tensor
- static add_specific_args(parser)
Adds model-specific arguments to the parser.
- Return type:
ArgumentParser
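A static method with this shape typically registers options on an `argparse.ArgumentParser` and returns it. The sketch below uses a hypothetical stand-in class, `ExampleModel`; the flag names are illustrative assumptions and may differ from the ones yamle registers. Only the default values are taken from the constructor signature above.

```python
from argparse import ArgumentParser


class ExampleModel:
    """Hypothetical stand-in for illustration; not the yamle class."""

    @staticmethod
    def add_specific_args(parser: ArgumentParser) -> ArgumentParser:
        # Flag names are assumptions; defaults mirror the documented signature.
        parser.add_argument("--patch_size", type=int, default=4)
        parser.add_argument(
            "--pooling", type=str, default="mean", choices=["mean", "cls"]
        )
        parser.add_argument("--embedding_dim", type=int, default=128)
        return parser


parser = ExampleModel.add_specific_args(ArgumentParser())
args = parser.parse_args(["--pooling", "cls"])
```

Returning the parser lets the caller chain several `add_specific_args` calls before parsing.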
- training: bool