yamle.methods.moe module

class yamle.methods.moe.MultiHeadEnsembleMethod(head_expansion_factor, head_depth, *args, **kwargs)

Bases: MemberMethod

This class extends the base method to a model that accepts a single input and has multiple heads. The prediction is the average of the heads' predictions.

Parameters:

head_expansion_factor (float) – The hidden size expansion factor for the heads. Default: 1.
head_depth (int) – The depth of the heads. Default: 1.
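As a rough illustration of the two hyperparameters, the pure-Python sketch below assumes that a head of depth head_depth has head_depth - 1 hidden layers of width hidden_size * head_expansion_factor followed by an output layer, and that predictions are averaged element-wise across heads. The function names and this layer layout are hypothetical and are not taken from yamle's implementation.

```python
from typing import List, Sequence


def head_layer_sizes(hidden_size: int, output_size: int,
                     head_expansion_factor: float = 1.0,
                     head_depth: int = 1) -> List[int]:
    """Hypothetical layer widths of one head.

    Assumes (head_depth - 1) hidden layers of width
    hidden_size * head_expansion_factor, followed by the output layer.
    """
    width = int(hidden_size * head_expansion_factor)
    return [width] * (head_depth - 1) + [output_size]


def average_heads(head_predictions: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of the per-head predictions (e.g. class probabilities)."""
    num_heads = len(head_predictions)
    return [sum(values) / num_heads for values in zip(*head_predictions)]


# Usage: a 3-layer head with doubled width, and two heads over two classes.
print(head_layer_sizes(64, 10, head_expansion_factor=2.0, head_depth=3))  # [128, 128, 10]
print(average_heads([[0.25, 0.75], [0.75, 0.25]]))  # [0.5, 0.5]
```

With the defaults (factor 1, depth 1) each head in this sketch is a single output layer on top of the shared representation.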

static add_specific_args(parent_parser)

Adds the arguments specific to this method to the parser.

Return type: ArgumentParser
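A minimal sketch of the add_specific_args pattern using the standard-library argparse module follows. The flag names (--method_head_expansion_factor, --method_head_depth) are hypothetical; only the defaults come from the parameter list above.

```python
import argparse


def add_specific_args(parent_parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Hypothetical sketch: return a parser that inherits parent_parser's
    arguments and adds the multi-head hyperparameters."""
    parser = argparse.ArgumentParser(parents=[parent_parser], add_help=False)
    parser.add_argument("--method_head_expansion_factor", type=float, default=1.0,
                        help="Hidden size expansion factor for the heads.")
    parser.add_argument("--method_head_depth", type=int, default=1,
                        help="Depth of the heads.")
    return parser


# Usage: flags that are not given fall back to their defaults.
args = add_specific_args(argparse.ArgumentParser()).parse_args(["--method_head_depth", "2"])
print(args.method_head_depth, args.method_head_expansion_factor)  # 2 1.0
```

Building a new parser with parents=[...] leaves the caller's parser untouched; registering the arguments directly on parent_parser would work equally well.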

test_name: Optional[str]

prepare_data_per_node: bool

allow_zero_length_dataloader_with_multiple_devices: bool

training: bool

class yamle.methods.moe.MixtureOfExpertsMethod(gating_expansion_factor, gating_depth, alpha=1.0, beta=1.0, k=1, noisy_gate=False, *args, **kwargs)

Bases: MultiHeadEnsembleMethod

This class extends the multi-head method, which accepts a single input and has multiple heads, and trains the model as a mixture of experts. A gating network determines the weight of each head, and the prediction is the average of the heads' predictions weighted by these gating weights.
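To make the role of the gating network concrete, the sketch below assumes that the gate produces one logit per head and that a softmax turns these logits into weights; the prediction is then the weighted average of the heads' predictions. With equal logits it reduces to the plain average used by MultiHeadEnsembleMethod. The softmax gate and the function names are assumptions, not yamle's code.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the gating logits (assumed gate form)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_prediction(gate_logits: Sequence[float],
                       head_predictions: Sequence[Sequence[float]]) -> List[float]:
    """Weight each head's prediction by its gate weight and sum over heads."""
    weights = softmax(gate_logits)
    return [sum(w * p for w, p in zip(weights, column))
            for column in zip(*head_predictions)]


# Usage: equal gate logits give the plain average of the two heads.
print(mixture_prediction([0.0, 0.0], [[0.25, 0.75], [0.75, 0.25]]))  # [0.5, 0.5]
```

A strongly positive logit for one head pushes the output toward that head's prediction, which is how the gate lets different experts specialize on different inputs.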

Parameters:

gating_expansion_factor (float) – The hidden size expansion factor for the gating network. Default: 1.
gating_depth (int) – The depth of the gating network. Default: 1.
alpha (float) – The weight of the importance loss. Default: 1.0.
beta (float) – The weight of the load-balancing loss. Default: 1.0.
k (int) – The number of experts to sample. Default: 1.
noisy_gate (bool) – Whether to use a noisy gate. Default: False.
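The alpha, beta, k and noisy_gate parameters resemble the noisy top-k gating and auxiliary losses of the sparsely-gated mixture-of-experts layer (Shazeer et al., 2017). The sketch below follows that scheme under simplifying assumptions: a fixed noise standard deviation instead of a learned one, the squared coefficient of variation (CV^2) for both auxiliary losses, and a hard count of routed inputs as the load instead of the paper's smooth estimator. Whether yamle computes these terms this way is not stated in this reference, and all function names are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def top_k_gate(logits: Sequence[float], k: int = 1, noisy_gate: bool = False,
               noise_std: float = 1.0,
               rng: Optional[random.Random] = None) -> List[float]:
    """Softmax over the k largest (optionally noise-perturbed) gate logits;
    every other expert gets weight 0."""
    rng = rng or random.Random()
    # Assumption: Gaussian noise with a fixed std (learned in Shazeer et al.).
    scores = [x + rng.gauss(0.0, noise_std) if noisy_gate else x for x in logits]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    m = max(scores[i] for i in top)
    exps = {i: math.exp(scores[i] - m) for i in top}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(scores))]


def cv_squared(values: Sequence[float]) -> float:
    """Squared coefficient of variation: population variance / mean^2."""
    mean = sum(values) / len(values)
    if mean == 0:
        return 0.0
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return var / mean ** 2


def auxiliary_loss(batch_gates: Sequence[Sequence[float]],
                   alpha: float = 1.0, beta: float = 1.0) -> float:
    """alpha * CV^2(importance) + beta * CV^2(load) over a batch of gate vectors."""
    num_experts = len(batch_gates[0])
    # Importance: total gate weight each expert receives over the batch.
    importance = [sum(g[e] for g in batch_gates) for e in range(num_experts)]
    # Load (simplified): number of inputs routed to each expert.
    load = [sum(1.0 for g in batch_gates if g[e] > 0) for e in range(num_experts)]
    return alpha * cv_squared(importance) + beta * cv_squared(load)


# Usage: both inputs go to expert 0, so the batch is maximally unbalanced.
gates = [top_k_gate(logits, k=1) for logits in ([2.0, 0.1], [1.5, 0.3])]
print(gates)  # [[1.0, 0.0], [1.0, 0.0]]
print(auxiliary_loss(gates, alpha=1.0, beta=1.0))  # 2.0
```

Both penalties are zero when every expert receives the same total weight and the same number of inputs, so increasing alpha or beta pushes the gate toward spreading inputs across experts.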

test_name: Optional[str]

prepare_data_per_node: bool

allow_zero_length_dataloader_with_multiple_devices: bool

training: bool

static add_specific_args(parent_parser)

Adds the arguments specific to this method to the parser.

Return type: ArgumentParser
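Because MixtureOfExpertsMethod subclasses MultiHeadEnsembleMethod, its add_specific_args plausibly extends the parent's arguments rather than replacing them. The sketch below shows such chaining with argparse; both classes are hypothetical stand-ins, the flag names are assumptions, and store_true for noisy_gate is only one way to expose a boolean flag. Only the defaults come from the parameter lists above.

```python
import argparse


class HypotheticalMultiHead:
    @staticmethod
    def add_specific_args(parent_parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
        parser = argparse.ArgumentParser(parents=[parent_parser], add_help=False)
        parser.add_argument("--method_head_expansion_factor", type=float, default=1.0)
        parser.add_argument("--method_head_depth", type=int, default=1)
        return parser


class HypotheticalMixtureOfExperts(HypotheticalMultiHead):
    @staticmethod
    def add_specific_args(parent_parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
        # Start from the multi-head arguments, then add the gating-specific ones.
        parser = HypotheticalMultiHead.add_specific_args(parent_parser)
        parser.add_argument("--method_gating_expansion_factor", type=float, default=1.0)
        parser.add_argument("--method_gating_depth", type=int, default=1)
        parser.add_argument("--method_alpha", type=float, default=1.0)
        parser.add_argument("--method_beta", type=float, default=1.0)
        parser.add_argument("--method_k", type=int, default=1)
        parser.add_argument("--method_noisy_gate", action="store_true")
        return parser


# Usage: the subclass parser also accepts the inherited head arguments.
args = HypotheticalMixtureOfExperts.add_specific_args(argparse.ArgumentParser()).parse_args(
    ["--method_k", "2", "--method_noisy_gate"])
print(args.method_k, args.method_noisy_gate, args.method_head_depth)  # 2 True 1
```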