yamle.methods.moe module

class yamle.methods.moe.MultiHeadEnsembleMethod(head_expansion_factor, head_depth, *args, **kwargs)

Bases: MemberMethod

This class extends the base method with a model that accepts a single input and has multiple heads. The prediction is obtained by averaging the predictions of the heads.

Parameters:
  • head_expansion_factor (float) – The hidden size expansion factor for the heads. Default: 1.

  • head_depth (int) – The depth of the heads. Default: 1.
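
As an illustration of the averaging step described above, the following is a minimal sketch in plain Python. It is not the yamle implementation: the function name `average_head_predictions`, the uniform weighting, and the example probabilities are assumptions chosen only to show how the outputs of several heads for one input could be combined into a single prediction.

```python
from typing import List, Sequence


def average_head_predictions(head_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class probabilities over heads with uniform weights.

    Hypothetical helper for illustration; not part of yamle.
    """
    num_heads = len(head_probs)
    num_classes = len(head_probs[0])
    return [
        sum(head[c] for head in head_probs) / num_heads
        for c in range(num_classes)
    ]


# Three hypothetical heads, each predicting over two classes for one input.
heads = [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]]
averaged = average_head_predictions(heads)
```

Because each head outputs a probability vector that sums to one, the uniform average is again a valid probability vector.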

static add_specific_args(parent_parser)

Adds the arguments specific to this method to the parser.

Return type:

ArgumentParser
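
The pattern of a static method that receives a parent parser and returns an `ArgumentParser` can be sketched with the standard `argparse` module. The class `ExampleMethod` and the flag names `--head_expansion_factor` and `--head_depth` below are hypothetical; the actual flag names and defaults registered by yamle may differ.

```python
from argparse import ArgumentParser


class ExampleMethod:
    """Hypothetical stand-in used for illustration; not the yamle class."""

    @staticmethod
    def add_specific_args(parent_parser: ArgumentParser) -> ArgumentParser:
        # Build a new parser that inherits every argument of the parent.
        parser = ArgumentParser(parents=[parent_parser], add_help=False)
        # Flag names and defaults here are assumptions, not yamle's own.
        parser.add_argument("--head_expansion_factor", type=float, default=1.0)
        parser.add_argument("--head_depth", type=int, default=1)
        return parser


# The parent parser must be created with add_help=False so that it can be
# passed through `parents` without a conflicting -h/--help option.
base = ArgumentParser(add_help=False)
parser = ExampleMethod.add_specific_args(base)
args = parser.parse_args(["--head_depth", "2"])
```

Returning the extended parser lets each subclass chain its own arguments onto whatever its parent class already registered.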

test_name: Optional[str]
prepare_data_per_node: bool
allow_zero_length_dataloader_with_multiple_devices: bool
training: bool

class yamle.methods.moe.MixtureOfExpertsMethod(gating_expansion_factor, gating_depth, alpha=1.0, beta=1.0, k=1, noisy_gate=False, *args, **kwargs)

Bases: MultiHeadEnsembleMethod

This class extends the multi-head method, which accepts a single input and has multiple heads. The prediction is performed by averaging the predictions of the heads, with the weight of each head determined by a gating network, and the model is trained as a mixture-of-experts model.

Parameters:
  • gating_expansion_factor (float) – The hidden size expansion factor for the gating network. Default: 1.

  • gating_depth (int) – The depth of the gating network. Default: 1.

  • alpha (float) – The alpha parameter for weighting the importance loss. Default: 1.0.

  • beta (float) – The beta parameter for weighting the load-balancing loss. Default: 1.0.

  • k (int) – How many experts to sample from. Default: 1.

  • noisy_gate (bool) – Whether to use a noisy gate. Default: False.
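
To make the roles of `k` and `alpha` more concrete, here is a minimal sketch in plain Python of top-k gating and an importance loss. It assumes the common sparsely-gated mixture-of-experts formulation (softmax over the k largest gate logits, and an importance loss equal to `alpha` times the squared coefficient of variation of the per-expert gate weight summed over the batch). The source does not confirm that yamle uses exactly these formulas, all function names are hypothetical, and the noisy gate and the load-balancing loss weighted by `beta` are omitted.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(logits: Sequence[float], k: int) -> List[float]:
    """Softmax over the k largest logits; all other experts get weight 0."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    kept = softmax([logits[i] for i in top])
    weights = [0.0] * len(logits)
    for i, w in zip(top, kept):
        weights[i] = w
    return weights


def combine(expert_outputs: Sequence[Sequence[float]],
            weights: Sequence[float]) -> List[float]:
    """Gate-weighted sum of the expert outputs for one input."""
    dim = len(expert_outputs[0])
    return [
        sum(w * out[d] for w, out in zip(weights, expert_outputs))
        for d in range(dim)
    ]


def importance_loss(batch_weights: Sequence[Sequence[float]],
                    alpha: float) -> float:
    """alpha * squared coefficient of variation of per-expert importance."""
    num_experts = len(batch_weights[0])
    importance = [sum(row[e] for row in batch_weights) for e in range(num_experts)]
    mean = sum(importance) / num_experts
    var = sum((v - mean) ** 2 for v in importance) / num_experts
    return alpha * var / (mean ** 2 + 1e-10)


# Hypothetical gate logits for a batch of two inputs and three experts.
gate_logits = [[2.0, 1.0, 0.0], [0.0, 3.0, 1.0]]
batch_weights = [top_k_gate(row, k=1) for row in gate_logits]

# Hypothetical outputs of the three experts for the first input.
expert_outputs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
prediction = combine(expert_outputs, batch_weights[0])
loss = importance_loss(batch_weights, alpha=1.0)
```

With `k=1` each input is routed to a single expert. In this example the third expert receives no weight from either input, so the importance loss is non-zero and would push the gate toward a more even use of the experts.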

test_name: Optional[str]
prepare_data_per_node: bool
allow_zero_length_dataloader_with_multiple_devices: bool
training: bool

static add_specific_args(parent_parser)

Adds the arguments specific to this method to the parser.

Return type:

ArgumentParser