yamle.methods.moe module
- class yamle.methods.moe.MultiHeadEnsembleMethod(head_expansion_factor, head_depth, *args, **kwargs)
Bases: MemberMethod
An extension of the base method that accepts a single input and has multiple heads. The prediction is obtained by averaging the predictions of the heads.
- Parameters:
- static add_specific_args(parent_parser)
Adds the arguments specific to this method.
- Return type:
ArgumentParser
- test_name: Optional[str]
- prepare_data_per_node: bool
- allow_zero_length_dataloader_with_multiple_devices: bool
- training: bool
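The head averaging described for MultiHeadEnsembleMethod can be illustrated with a minimal, framework-free sketch. It is not taken from the yamle source: the function name `average_head_predictions` and the example values are hypothetical, and it assumes each head emits a probability vector for the same input.

```python
from typing import List


def average_head_predictions(head_probs: List[List[float]]) -> List[float]:
    """Average per-class probabilities across heads for a single input.

    Hypothetical helper for illustration only, not part of yamle.
    """
    if not head_probs:
        raise ValueError("at least one head is required")
    num_heads = len(head_probs)
    num_classes = len(head_probs[0])
    # Unweighted mean over heads, computed class by class.
    return [
        sum(head[c] for head in head_probs) / num_heads
        for c in range(num_classes)
    ]


# Three hypothetical heads, each producing a distribution over two classes.
heads = [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]]
averaged = average_head_predictions(heads)
```

Because every head contributes with equal weight, the result remains a valid probability distribution whenever each head's output is one.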
- class yamle.methods.moe.MixtureOfExpertsMethod(gating_expansion_factor, gating_depth, alpha=1.0, beta=1.0, k=1, noisy_gate=False, *args, **kwargs)
Bases: MultiHeadEnsembleMethod
An extension of the multi-head method that accepts a single input and has multiple heads. The prediction is obtained by averaging the predictions of the heads, and the model is trained as a mixture-of-experts model. A gating network determines the weights of the heads.
- Parameters:
gating_expansion_factor (float) – The hidden size expansion factor for the gating network. Default: 1.
gating_depth (int) – The depth of the gating network. Default: 1.
alpha (float) – The weight of the importance loss. Default: 1.0.
beta (float) – The weight of the load-balancing loss. Default: 1.0.
k (int) – The number of experts to sample from. Default: 1.
noisy_gate (bool) – Whether to use a noisy gate. Default: False.
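To make the roles of k and noisy_gate concrete, the following is a rough sketch of top-k gating in the style commonly used for sparsely gated mixture-of-experts models. It is an assumption about the general technique, not a copy of yamle's gating network: `top_k_gate`, `mix_experts`, `noise_scales` and the example logits are hypothetical names and values.

```python
import math
import random
from typing import List, Optional


def softplus(x: float) -> float:
    """Numerically stable softplus, used to keep the noise scale positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def top_k_gate(
    logits: List[float],
    k: int,
    noise_scales: Optional[List[float]] = None,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return gate weights that are non-zero only for the k largest scores.

    Hypothetical helper: if noise_scales is given, Gaussian noise scaled by
    softplus(noise_scale) is added to each logit before the top-k selection.
    """
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    scores = list(logits)
    if noise_scales is not None:
        rng = rng or random.Random()
        scores = [
            s + rng.gauss(0.0, 1.0) * softplus(n)
            for s, n in zip(scores, noise_scales)
        ]
    # Indices of the k highest-scoring experts.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax restricted to the selected experts; all others get weight 0.
    max_score = max(scores[i] for i in top)
    exps = {i: math.exp(scores[i] - max_score) for i in top}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(scores))]


def mix_experts(gates: List[float], expert_outputs: List[List[float]]) -> List[float]:
    """Combine expert outputs using the gate weights."""
    num_outputs = len(expert_outputs[0])
    return [
        sum(g * out[j] for g, out in zip(gates, expert_outputs))
        for j in range(num_outputs)
    ]


# Four hypothetical experts; only the two with the largest logits stay active.
gates = top_k_gate([2.0, 0.5, 1.0, -1.0], k=2)
noisy_gates = top_k_gate(
    [2.0, 0.5, 1.0, -1.0], k=2, noise_scales=[0.0] * 4, rng=random.Random(0)
)
mixed = mix_experts(top_k_gate([0.1, 3.0], k=1), [[1.0, 0.0], [0.2, 0.8]])
```

With k=1 the gate selects a single expert, so the mixed output is exactly that expert's output; larger k spreads the weight over several experts.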
- test_name: Optional[str]
- prepare_data_per_node: bool
- allow_zero_length_dataloader_with_multiple_devices: bool
- training: bool
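The alpha and beta parameters of MixtureOfExpertsMethod weight an importance loss and a load-balancing loss. One common way to define such terms is the squared coefficient of variation of per-expert statistics over a batch; the sketch below assumes that definition and is not confirmed to match yamle's implementation. The names `cv_squared` and `auxiliary_loss` are hypothetical, and the load is a hard count of routed examples, which is a simplification of the smooth estimators used in practice.

```python
from typing import List


def cv_squared(values: List[float], eps: float = 1e-10) -> float:
    """Squared coefficient of variation: population variance / mean**2."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return variance / (mean ** 2 + eps)


def auxiliary_loss(
    batch_gates: List[List[float]], alpha: float = 1.0, beta: float = 1.0
) -> float:
    """Weighted sum of an importance term and a load term (illustrative).

    batch_gates[b][e] is the gate weight of expert e for example b.
    """
    num_experts = len(batch_gates[0])
    # Importance: total gate weight each expert receives over the batch.
    importance = [sum(row[e] for row in batch_gates) for e in range(num_experts)]
    # Load: number of examples routed to each expert (hard count, an assumption).
    load = [
        float(sum(1 for row in batch_gates if row[e] > 0.0))
        for e in range(num_experts)
    ]
    return alpha * cv_squared(importance) + beta * cv_squared(load)


# Two experts, two examples: evenly shared versus collapsed onto one expert.
balanced_loss = auxiliary_loss([[1.0, 0.0], [0.0, 1.0]])
collapsed_loss = auxiliary_loss([[1.0, 0.0], [1.0, 0.0]])
```

Both terms are zero when the experts are used evenly and grow as routing collapses onto fewer experts, which is why they are typically added to the task loss during training.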