API Doc¶

Experiment¶

class xnmt.experiments.ExpGlobal(model_file='{EXP_DIR}/models/{EXP}.mod', log_file='{EXP_DIR}/logs/{EXP}.log', dropout=0.3, weight_noise=0.0, default_layer_dim=512, param_init=bare(GlorotInitializer), bias_init=bare(ZeroInitializer), truncate_dec_batches=False, save_num_checkpoints=1, loss_comb_method='sum', commandline_args={}, placeholders={})[source]¶

Bases: xnmt.persistence.Serializable

An object that holds global settings that can be referenced by components wherever appropriate.

Parameters:

model_file (str) – Location to write model file to
log_file (str) – Location to write log file to
dropout (Real) – Default dropout probability that should be used by supporting components but can be overwritten
weight_noise (Real) – Default weight noise level that should be used by supporting components but can be overwritten
default_layer_dim (Integral) – Default layer dimension that should be used by supporting components but can be overwritten
param_init (ParamInitializer) – Default parameter initializer that should be used by supporting components but can be overwritten
bias_init (ParamInitializer) – Default initializer for bias parameters that should be used by supporting components but can be overwritten
truncate_dec_batches (bool) – whether the decoder drops batch elements as soon as these are masked at some time step.
save_num_checkpoints (Integral) – save DyNet parameters for the most recent n checkpoints, useful for model averaging/ensembling
loss_comb_method (str) – method for combining loss across batch elements (‘sum’ or ‘avg’).
commandline_args (dict) – Holds commandline arguments with which XNMT was launched
placeholders (Dict[str, Any]) – these will be used as arguments for a format() call applied to every string in the config. For example, placeholders: {"PATH":"/some/path"} will cause each occurrence of "{PATH}" in a string to be replaced by "/some/path". As a special variable, EXP_DIR can be specified to overwrite the default location for writing models, logs, and other files.
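
The placeholder substitution described for placeholders can be illustrated in plain Python. This is a minimal sketch, not XNMT's own code: the helper apply_placeholders and the sample config are hypothetical, and only the idea of calling format() on every string in the config comes from the description above.

```python
from typing import Any, Dict


def apply_placeholders(node: Any, placeholders: Dict[str, Any]) -> Any:
    """Recursively apply str.format(**placeholders) to every string in a nested config."""
    if isinstance(node, str):
        return node.format(**placeholders)
    if isinstance(node, dict):
        return {key: apply_placeholders(val, placeholders) for key, val in node.items()}
    if isinstance(node, list):
        return [apply_placeholders(val, placeholders) for val in node]
    return node  # numbers, booleans and None are left untouched


# Hypothetical config fragment; EXP_DIR and EXP mirror the defaults in the signature.
config = {
    "model_file": "{EXP_DIR}/models/{EXP}.mod",
    "train_files": ["{PATH}/train.src", "{PATH}/train.trg"],
    "dropout": 0.3,
}
resolved = apply_placeholders(
    config, {"EXP_DIR": "/tmp/run", "EXP": "exp1", "PATH": "/some/path"}
)
print(resolved["model_file"])
```

Non-string values pass through unchanged, so numeric settings such as dropout are not affected by the substitution.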

class xnmt.experiments.Experiment(name, exp_global=bare(ExpGlobal), preproc=None, model=None, train=None, evaluate=None, random_search_report=None, status=None)[source]¶

Bases: xnmt.persistence.Serializable

A default experiment that performs preprocessing, training, and evaluation.

The initializer calls ParamManager.populate(), meaning that model construction should be finalized at this point. __call__() runs the individual steps.

Parameters:

name (str) – name of experiment
exp_global (Optional[ExpGlobal]) – global experiment settings
preproc (Optional[PreprocRunner]) – carry out preprocessing if specified
model (Optional[TrainableModel]) – The main model. In the case of multitask training, several models must be specified, in which case the models will live not here but inside the training task objects.
train (Optional[TrainingRegimen]) – The training regimen defines the training loop.
evaluate (Optional[List[EvalTask]]) – list of tasks to evaluate the model after training finishes.
random_search_report (Optional[dict]) – When random search is used, this holds the settings that were randomly drawn for documentary purposes.
status (Optional[str]) – Status of the experiment, will be automatically set to “done” in saved model if the experiment has finished running.

Model¶

Model Base Classes¶

class xnmt.models.base.TrainableModel[source]¶

Bases: object

A template class for a basic trainable model, implementing a loss function.

calc_nll(*args, **kwargs)[source]¶

Calculate loss based on input-output pairs.

Losses are accumulated only across unmasked timesteps in each batch element.

Arguments are to be defined by subclasses

Return type:	`Expression`
Returns:	A (possibly batched) expression representing the loss.
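
The masking rule above, together with the loss_comb_method setting of ExpGlobal, can be sketched without DyNet. The functions sentence_nll and combine_losses are hypothetical, and the convention that a mask value of 1 marks a masked timestep is an assumption made for this example.

```python
from typing import List


def sentence_nll(step_losses: List[float], mask: List[int]) -> float:
    """Sum per-timestep losses, skipping timesteps whose mask entry is 1.

    Assumption: 1 means 'masked', 0 means 'unmasked'.
    """
    return sum(loss for loss, m in zip(step_losses, mask) if m == 0)


def combine_losses(batch_losses: List[float], method: str = "sum") -> float:
    """Combine per-sentence losses across the batch ('sum' or 'avg')."""
    if method == "sum":
        return sum(batch_losses)
    if method == "avg":
        return sum(batch_losses) / len(batch_losses)
    raise ValueError(f"unknown loss combination method: {method}")


# Two sentences of three timesteps; the last step of the second one is masked.
step_losses = [[0.5, 1.0, 2.0], [1.5, 0.5, 4.0]]
masks = [[0, 0, 0], [0, 0, 1]]
per_sentence = [sentence_nll(l, m) for l, m in zip(step_losses, masks)]
total = combine_losses(per_sentence, "sum")
average = combine_losses(per_sentence, "avg")
```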

class xnmt.models.base.UnconditionedModel(trg_reader)[source]¶

Bases: xnmt.models.base.TrainableModel

A template class for a trainable model that computes target losses without conditioning on other inputs.

Parameters:	trg_reader (`InputReader`) – target reader

calc_nll(trg)[source]¶

Calculate loss based on target inputs.

Losses are accumulated only across unmasked timesteps in each batch element.

Parameters:	trg (`Union`[`Batch`, `Sentence`]) – The target, a sentence or a batch of sentences.
Return type:	`Expression`
Returns:	A (possibly batched) expression representing the loss.

class xnmt.models.base.ConditionedModel(src_reader, trg_reader)[source]¶

Bases: xnmt.models.base.TrainableModel

A template class for a trainable model that computes target losses conditioned on a source input.

Parameters:

src_reader (InputReader) – source reader
trg_reader (InputReader) – target reader

calc_nll(src, trg)[source]¶

Calculate loss based on input-output pairs.

Losses are accumulated only across unmasked timesteps in each batch element.

Parameters:

src (Union[Batch, Sentence]) – The source, a sentence or a batch of sentences.
trg (Union[Batch, Sentence]) – The target, a sentence or a batch of sentences.

Return type:	`Expression`
Returns:	A (possibly batched) expression representing the loss.

class xnmt.models.base.GeneratorModel(src_reader, trg_reader=None)[source]¶

Bases: object

A template class for models that can perform inference to generate some kind of output.

Parameters:

src_reader (InputReader) – source input reader
trg_reader (Optional[InputReader]) – an optional target input reader, needed in some cases such as n-best scoring

generate(src, *args, **kwargs)[source]¶

Generate outputs.

Parameters:

src (Batch) – batch of source-side inputs
*args, **kwargs – Further arguments to be specified by subclasses

Return type:	`Sequence`[`ReadableSentence`]
Returns:	output objects

class xnmt.models.base.CascadeGenerator(generators)[source]¶

Bases: xnmt.models.base.GeneratorModel, xnmt.persistence.Serializable

A cascade that chains several generator models.

This generator does not support calling generate() directly. Instead, its sub-generators should be accessed and used to generate outputs one by one.

Parameters:	generators (`Sequence`[`GeneratorModel`]) – list of generators

generate(*args, **kwargs)[source]¶

Generate outputs.

Parameters:

src – batch of source-side inputs
*args, **kwargs – Further arguments to be specified by subclasses

Return type:	`Sequence`[`ReadableSentence`]
Returns:	output objects

Translator¶

Embedder¶

class xnmt.modelparts.embedders.Embedder[source]¶

Bases: object

An embedder takes in word IDs and outputs continuous vectors.

This can be done on a word-by-word basis, or over a sequence.

embed(word)[source]¶

Embed a single word.

Parameters:	word (`Any`) – This will generally be an integer word ID, but could also be something like a string. It could also be batched, in which case the input will be a `xnmt.batcher.Batch` of integers or other things.
Return type:	`Expression`
Returns:	Expression corresponding to the embedding of the word(s).

embed_sent(x)[source]¶

Embed a full sentence worth of words. By default, just do a for loop.

Parameters:	x (`Any`) – This will generally be a list of word IDs, but could also be a list of strings or some other format. It could also be batched, in which case it will be a (possibly masked) `xnmt.batcher.Batch` object
Return type:	`ExpressionSequence`
Returns:	An expression sequence representing vectors of each word in the input.
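
The relationship between embed() and the default embed_sent() can be sketched as follows. ToyEmbedder is a hypothetical stand-in that uses plain Python lists in place of DyNet expressions; it only illustrates the "loop over words" default described above.

```python
from typing import Dict, List, Sequence


class ToyEmbedder:
    """Hypothetical stand-in for an Embedder: a table from word ID to vector."""

    def __init__(self, table: Dict[int, List[float]]) -> None:
        self.table = table

    def embed(self, word: int) -> List[float]:
        # Look up the vector of a single word ID.
        return self.table[word]

    def embed_sent(self, x: Sequence[int]) -> List[List[float]]:
        # Default behavior: embed the sentence word by word in a loop.
        return [self.embed(word) for word in x]


embedder = ToyEmbedder({0: [0.0, 0.0], 1: [0.1, 0.2], 2: [0.3, 0.4]})
vectors = embedder.embed_sent([1, 2, 1])
```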

choose_vocab(vocab, yaml_path, src_reader, trg_reader)[source]¶

Choose the vocab for the embedder based on the passed arguments.

This is done in order of priority of vocab, model+yaml_path

Parameters:

vocab (Vocab) – If None, try to obtain from src_reader or trg_reader, depending on the yaml_path
yaml_path (Path) – Path of this embedder in the component hierarchy. Automatically determined when deserializing the YAML model.
src_reader (InputReader) – Model’s src_reader, if exists and unambiguous.
trg_reader (InputReader) – Model’s trg_reader, if exists and unambiguous.

Return type:	`Vocab`
Returns:	chosen vocab

choose_vocab_size(vocab_size, vocab, yaml_path, src_reader, trg_reader)[source]¶

Choose the vocab size for the embedder based on the passed arguments

This is done in order of priority of vocab_size, vocab, model+yaml_path

Parameters:

vocab_size (Integral) – vocab size or None
vocab (Vocab) – vocab or None
yaml_path (Path) – Path of this embedder in the component hierarchy. Automatically determined when YAML-deserializing.
src_reader (InputReader) – Model’s src_reader, if exists and unambiguous.
trg_reader (InputReader) – Model’s trg_reader, if exists and unambiguous.

Return type:	`int`
Returns:	chosen vocab size
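
The priority order stated for choose_vocab_size (vocab_size, then vocab, then model+yaml_path) can be sketched like this. The function pick_vocab_size is hypothetical, vocabularies are represented as plain lists, and deciding between source and target side by looking for "src" or "trg" in the path is an assumption made for illustration only.

```python
from typing import Optional, Sequence


def pick_vocab_size(vocab_size: Optional[int],
                    vocab: Optional[Sequence[str]],
                    yaml_path: str,
                    src_vocab: Optional[Sequence[str]],
                    trg_vocab: Optional[Sequence[str]]) -> int:
    """Resolve the vocab size in order of priority."""
    # 1. An explicitly given size wins.
    if vocab_size is not None:
        return vocab_size
    # 2. Otherwise use the size of an explicitly given vocab.
    if vocab is not None:
        return len(vocab)
    # 3. Otherwise fall back to the reader that matches the component path.
    #    Assumption: the path reveals whether this is a source or target embedder.
    if "src" in yaml_path and src_vocab is not None:
        return len(src_vocab)
    if "trg" in yaml_path and trg_vocab is not None:
        return len(trg_vocab)
    raise ValueError("could not determine the vocab size")


size_explicit = pick_vocab_size(100, ["a"], "model.src_embedder", ["a", "b"], None)
size_from_vocab = pick_vocab_size(None, ["a", "b", "c"], "model.src_embedder", ["a"], None)
size_from_reader = pick_vocab_size(None, None, "model.trg_embedder", ["a"], ["x", "y"])
```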

class xnmt.modelparts.embedders.DenseWordEmbedder(emb_dim=Ref(path=exp_global.default_layer_dim), weight_noise=Ref(path=exp_global.weight_noise, default=0.0), word_dropout=0.0, fix_norm=None, param_init=Ref(path=exp_global.param_init, default=bare(GlorotInitializer)), bias_init=Ref(path=exp_global.bias_init, default=bare(ZeroInitializer)), vocab_size=None, vocab=None, yaml_path='', src_reader=Ref(path=model.src_reader, default=None), trg_reader=Ref(path=model.trg_reader, default=None))[source]¶

Bases: xnmt.modelparts.embedders.Embedder, xnmt.modelparts.transforms.Linear, xnmt.persistence.Serializable

Word embeddings via full matrix.

Parameters:

emb_dim (Integral) – embedding dimension
weight_noise (Real) – apply Gaussian noise with given standard deviation to embeddings
word_dropout (Real) – drop out word types with a certain probability, sampling word types on a per-sentence level, see https://arxiv.org/abs/1512.05287
fix_norm (Optional[Real]) – fix the norm of word vectors to be radius r, see https://arxiv.org/abs/1710.01329
param_init (ParamInitializer) – how to initialize weight matrices
bias_init (ParamInitializer) – how to initialize bias vectors
vocab_size (Optional[Integral]) – vocab size or None
vocab (Optional[Vocab]) – vocab or None
yaml_path (Path) – Path of this embedder in the component hierarchy. Automatically set by the YAML deserializer.
src_reader (Optional[InputReader]) – A reader for the source side. Automatically set by the YAML deserializer.
trg_reader (Optional[InputReader]) – A reader for the target side. Automatically set by the YAML deserializer.

embed(x)[source]¶

Embed a single word.

Parameters:	x – This will generally be an integer word ID, but could also be something like a string. It could also be batched, in which case the input will be a `xnmt.batcher.Batch` of integers or other things.
Return type:	`Expression`
Returns:	Expression corresponding to the embedding of the word(s).

class xnmt.modelparts.embedders.SimpleWordEmbedder(emb_dim=Ref(path=exp_global.default_layer_dim), weight_noise=Ref(path=exp_global.weight_noise, default=0.0), word_dropout=0.0, fix_norm=None, param_init=Ref(path=exp_global.param_init, default=bare(GlorotInitializer)), vocab_size=None, vocab=None, yaml_path='', src_reader=Ref(path=model.src_reader, default=None), trg_reader=Ref(path=model.trg_reader, default=None))[source]¶

Bases: xnmt.modelparts.embedders.Embedder, xnmt.persistence.Serializable

Simple word embeddings via lookup.

Parameters:

emb_dim (Integral) – embedding dimension
weight_noise (Real) – apply Gaussian noise with given standard deviation to embeddings
word_dropout (Real) – drop out word types with a certain probability, sampling word types on a per-sentence level, see https://arxiv.org/abs/1512.05287
fix_norm (Optional[Real]) – fix the norm of word vectors to be radius r, see https://arxiv.org/abs/1710.01329
param_init (ParamInitializer) – how to initialize lookup matrices
vocab_size (Optional[Integral]) – vocab size or None
vocab (Optional[Vocab]) – vocab or None
yaml_path (Path) – Path of this embedder in the component hierarchy. Automatically set by the YAML deserializer.
src_reader (Optional[InputReader]) – A reader for the source side. Automatically set by the YAML deserializer.
trg_reader (Optional[InputReader]) – A reader for the target side. Automatically set by the YAML deserializer.

embed(x)[source]¶

Embed a single word.

Parameters:	x – This will generally be an integer word ID, but could also be something like a string. It could also be batched, in which case the input will be a `xnmt.batcher.Batch` of integers or other things.
Return type:	`Expression`
Returns:	Expression corresponding to the embedding of the word(s).
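
The word_dropout and fix_norm parameters can be illustrated with a small pure-Python sketch. All names here are hypothetical. Two points are assumptions rather than documented behavior: a dropped word type is replaced by a zero vector, and fix_norm rescales each vector to L2 norm r. What follows the description above is that the set of dropped word types is sampled once per sentence, so every occurrence of a type is treated the same way.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Set


def sample_dropped_types(vocab_size: int, word_dropout: float,
                         rng: random.Random) -> Set[int]:
    """Sample, once per sentence, which word types are dropped."""
    return {w for w in range(vocab_size) if rng.random() < word_dropout}


def rescale_to_norm(vec: List[float], r: float) -> List[float]:
    """Rescale a non-zero vector so that its L2 norm equals r."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v * r / norm for v in vec]


def embed_sentence(sent: Sequence[int], table: Dict[int, List[float]],
                   dropped: Set[int],
                   fix_norm: Optional[float] = None) -> List[List[float]]:
    out = []
    for w in sent:
        if w in dropped:
            # Assumption: dropped word types are embedded as zero vectors.
            out.append([0.0] * len(table[w]))
        elif fix_norm is not None:
            out.append(rescale_to_norm(table[w], fix_norm))
        else:
            out.append(list(table[w]))
    return out


table = {0: [3.0, 4.0], 1: [1.0, 0.0], 2: [0.0, 2.0]}
dropped = sample_dropped_types(vocab_size=3, word_dropout=0.5, rng=random.Random(0))
vectors = embed_sentence([0, 1, 0, 2], table, dropped, fix_norm=1.0)
```

Because the dropped set is drawn per sentence rather than per token, positions 0 and 2 (both word ID 0) always receive the same vector.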

class xnmt.modelparts.embedders.NoopEmbedder(emb_dim)[source]¶

Bases: xnmt.modelparts.embedders.Embedder, xnmt.persistence.Serializable

This embedder performs no lookups but only passes through the inputs.

Normally, the input is a Sentence object, which is converted to an expression.

Parameters:	emb_dim (`Optional`[`Integral`]) – Size of the inputs

embed(x)[source]¶

Embed a single word.

Parameters:	x – This will generally be an integer word ID, but could also be something like a string. It could also be batched, in which case the input will be a `xnmt.batcher.Batch` of integers or other things.
Return type:	`Expression`
Returns:	Expression corresponding to the embedding of the word(s).

embed_sent(x)[source]¶

Embed a full sentence worth of words. By default, just do a for loop.

Parameters:	x (`Sentence`) – This will generally be a list of word IDs, but could also be a list of strings or some other format. It could also be batched, in which case it will be a (possibly masked) `xnmt.batcher.Batch` object
Return type:	`ExpressionSequence`
Returns:	An expression sequence representing vectors of each word in the input.

class xnmt.modelparts.embedders.PretrainedSimpleWordEmbedder(filename, emb_dim=Ref(path=exp_global.default_layer_dim), weight_noise=Ref(path=exp_global.weight_noise, default=0.0), word_dropout=0.0, fix_norm=None, vocab=None, yaml_path='', src_reader=Ref(path=model.src_reader, default=None), trg_reader=Ref(path=model.trg_reader, default=None))[source]¶

Bases: xnmt.modelparts.embedders.SimpleWordEmbedder, xnmt.persistence.Serializable

Simple word embeddings via lookup. Initial pretrained embeddings must be supplied in FastText text format.

Parameters:

filename (str) – Filename for the pretrained embeddings
emb_dim (Integral) – embedding dimension; if None, use exp_global.default_layer_dim
weight_noise (Real) – apply Gaussian noise with given standard deviation to embeddings; if None, use exp_global.weight_noise
word_dropout (Real) – drop out word types with a certain probability, sampling word types on a per-sentence level, see https://arxiv.org/abs/1512.05287
fix_norm (Optional[Real]) – fix the norm of word vectors to be radius r, see https://arxiv.org/abs/1710.01329
vocab (Optional[Vocab]) – vocab or None
yaml_path (Path) – Path of this embedder in the component hierarchy. Automatically set by the YAML deserializer.
src_reader (Optional[InputReader]) – A reader for the source side. Automatically set by the YAML deserializer.
trg_reader (Optional[InputReader]) – A reader for the target side. Automatically set by the YAML deserializer.
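
For orientation, the FastText text format mentioned above starts with a header line giving the number of words and the dimension, followed by one line per word with its vector components. The parser below is a minimal sketch with a hypothetical function name; it is not the loader used by PretrainedSimpleWordEmbedder, and skipping malformed lines is a choice made for this example.

```python
import io
from typing import Dict, List, TextIO


def read_fasttext_text(stream: TextIO) -> Dict[str, List[float]]:
    """Parse embeddings in FastText text (.vec) format into a dict."""
    # Header line: "<number of words> <dimension>"
    _num_words, dim = (int(tok) for tok in stream.readline().split())
    embeddings: Dict[str, List[float]] = {}
    for line in stream:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed or empty lines
        embeddings[parts[0]] = [float(v) for v in parts[1:]]
    return embeddings


sample = io.StringIO("2 3\nhello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n")
pretrained = read_fasttext_text(sample)
```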

class xnmt.modelparts.embedders.PositionEmbedder(max_pos, emb_dim=Ref(path=exp_global.default_layer_dim), param_init=Ref(path=exp_global.param_init, default=bare(GlorotInitializer)))[source]¶

Bases: xnmt.modelparts.embedders.Embedder, xnmt.persistence.Serializable

embed(word)[source]¶

Embed a single word.

Parameters:	word – This will generally be an integer word ID, but could also be something like a string. It could also be batched, in which case the input will be a `xnmt.batcher.Batch` of integers or other things.
Returns:	Expression corresponding to the embedding of the word(s).

embed_sent(sent_len)[source]¶

Embed a full sentence worth of words. By default, just do a for loop.

Parameters:	x – This will generally be a list of word IDs, but could also be a list of strings or some other format. It could also be batched, in which case it will be a (possibly masked) `xnmt.batcher.Batch` object
Return type:	`ExpressionSequence`
Returns:	An expression sequence representing vectors of each word in the input.

Transducer¶

class xnmt.transducers.base.FinalTransducerState(main_expr, cell_expr=None)[source]¶

Bases: object

Represents the final encoder state. Currently handles a main (hidden) state and a cell state. If the cell state is not provided, it is created as tanh^{-1}(hidden state). Could in the future be extended to handle dimensions other than h and c.

Parameters:

main_expr (Expression) – expression for hidden state
cell_expr (Optional[Expression]) – expression for cell state, if exists

cell_expr()[source]¶

Returns: cell state; if not given, it is inferred as the inverse tanh of the main expression

Return type:	`Expression`
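
The fallback for a missing cell state can be sketched with plain floats instead of DyNet expressions. ToyFinalState is a hypothetical class; the only behavior taken from the description is that the cell state defaults to the element-wise inverse tanh of the hidden state. Note that the inverse tanh is only defined for values strictly between -1 and 1.

```python
import math
from typing import List, Optional


class ToyFinalState:
    """Hypothetical stand-in for FinalTransducerState using lists of floats."""

    def __init__(self, main_expr: List[float],
                 cell_expr: Optional[List[float]] = None) -> None:
        self._main_expr = main_expr
        self._cell_expr = cell_expr

    def main_expr(self) -> List[float]:
        return self._main_expr

    def cell_expr(self) -> List[float]:
        # If no cell state was given, infer it as atanh(hidden state).
        if self._cell_expr is None:
            self._cell_expr = [math.atanh(h) for h in self._main_expr]
        return self._cell_expr


inferred = ToyFinalState([0.0, 0.5]).cell_expr()
given = ToyFinalState([0.0, 0.5], cell_expr=[1.0, 2.0]).cell_expr()
```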

class xnmt.transducers.base.SeqTransducer[source]¶

Bases: object

A class that transforms one sequence of vectors into another, using expression_seqs.ExpressionSequence objects as inputs and outputs.

transduce(seq)[source]¶

Parameters should be expression_seqs.ExpressionSequence objects wherever appropriate

Parameters:	seq (`ExpressionSequence`) – An expression sequence representing the input to the transduction
Return type:	`ExpressionSequence`
Returns:	result of transduction, an expression sequence

get_final_states()[source]¶

Returns: A list of FinalTransducerState objects corresponding to a fixed-dimension representation of the input, after having invoked transduce()

Return type:	`List`[`FinalTransducerState`]

class xnmt.transducers.base.ModularSeqTransducer(input_dim, modules)[source]¶

Bases: xnmt.transducers.base.SeqTransducer, xnmt.persistence.Serializable

A sequence transducer that stacks several xnmt.transducers.base.SeqTransducer objects, all of which must accept exactly one argument (an expression_seqs.ExpressionSequence) in their transduce method.

Parameters:

input_dim (Integral) – input dimension (not required)
modules (List[SeqTransducer]) – list of SeqTransducer modules

shared_params()[source]¶

Return the shared parameters of this Serializable class.

This can be overridden to specify what parameters of this component and its subcomponents are shared. Parameter sharing is performed before any components are initialized, and can therefore only include basic data types that are already present in the YAML file (e.g. number of dimensions). Sharing is performed if at least one parameter is specified and multiple shared parameters don’t conflict. In case of conflict, a warning is printed and no sharing is performed. The ordering of shared parameters is irrelevant. Note also that if a submodule is replaced by a reference, its shared parameters are ignored.

Returns:	objects referencing params of this component or a subcomponent, e.g.: return [set([".input_dim", ".sub_module.input_dim", ".submodules_list.0.input_dim"])]

transduce(seq)[source]¶

Parameters should be expression_seqs.ExpressionSequence objects wherever appropriate

Parameters:	seq (`ExpressionSequence`) – An expression sequence representing the input to the transduction
Return type:	`ExpressionSequence`
Returns:	result of transduction, an expression sequence

get_final_states()[source]¶

Returns: A list of FinalTransducerState objects corresponding to a fixed-dimension representation of the input, after having invoked transduce()

Return type:	`List`[`FinalTransducerState`]
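
Stacking transducers amounts to feeding the output of each module into the next. The sketch below uses hypothetical toy modules (Scale, Shift, ToyModular) that operate on lists of float vectors rather than ExpressionSequence objects.

```python
from typing import List

Seq = List[List[float]]


class Scale:
    """Toy transducer: multiplies every component by a constant."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def transduce(self, seq: Seq) -> Seq:
        return [[v * self.factor for v in vec] for vec in seq]


class Shift:
    """Toy transducer: adds a constant to every component."""

    def __init__(self, offset: float) -> None:
        self.offset = offset

    def transduce(self, seq: Seq) -> Seq:
        return [[v + self.offset for v in vec] for vec in seq]


class ToyModular:
    """Applies the given modules one after another."""

    def __init__(self, modules: list) -> None:
        self.modules = modules

    def transduce(self, seq: Seq) -> Seq:
        for module in self.modules:
            seq = module.transduce(seq)  # each module takes exactly one argument
        return seq


stacked = ToyModular([Scale(2.0), Shift(1.0)])
result = stacked.transduce([[1.0], [2.0]])
```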

class xnmt.transducers.base.IdentitySeqTransducer[source]¶

Bases: xnmt.transducers.base.SeqTransducer, xnmt.persistence.Serializable

A transducer that simply returns the input.

transduce(seq)[source]¶

Parameters should be expression_seqs.ExpressionSequence objects wherever appropriate

Parameters:	seq (`ExpressionSequence`) – An expression sequence representing the input to the transduction
Return type:	`ExpressionSequence`
Returns:	result of transduction, an expression sequence

class xnmt.transducers.base.TransformSeqTransducer(transform, downsample_by=1)[source]¶

Bases: xnmt.transducers.base.SeqTransducer, xnmt.persistence.Serializable

A sequence transducer that applies a given transformation to the sequence’s tensor representation

Parameters:

transform (Transform) – the Transform to apply to the sequence
downsample_by (Integral) – if > 1, downsample the sequence via appropriate reshapes. The transform must accept a respectively larger hidden dimension.

get_final_states()[source]¶

Returns: A list of FinalTransducerState objects corresponding to a fixed-dimension representation of the input, after having invoked transduce()

Return type:	`List`[`FinalTransducerState`]

transduce(src)[source]¶

Parameters should be expression_seqs.ExpressionSequence objects wherever appropriate

Parameters:	src – An expression sequence representing the input to the transduction
Return type:	`ExpressionSequence`
Returns:	result of transduction, an expression sequence
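
The effect of downsample_by on shapes can be sketched as follows: grouping every downsample_by adjacent vectors into one shortens the sequence and enlarges the hidden dimension by the same factor, which is why the transform must accept a larger input. The function below is hypothetical and assumes the sequence length is divisible by the factor; how the actual implementation treats other lengths is not covered here.

```python
from typing import List


def downsample_by_reshape(seq: List[List[float]], factor: int) -> List[List[float]]:
    """Concatenate every `factor` adjacent vectors into a single longer vector."""
    if len(seq) % factor != 0:
        # Assumption for this sketch: only divisible lengths are handled.
        raise ValueError("sequence length must be divisible by the factor")
    return [
        [v for vec in seq[i:i + factor] for v in vec]
        for i in range(0, len(seq), factor)
    ]


# 4 timesteps of dimension 2 become 2 timesteps of dimension 4.
reshaped = downsample_by_reshape([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]], 2)
```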

RNN¶

class xnmt.transducers.recurrent.UniLSTMState(network, prev=None, c=None, h=None)[source]¶

Bases: object

State object for UniLSTMSeqTransducer.

class xnmt.transducers.recurrent.UniLSTMSeqTransducer(layers=1, input_dim=Ref(path=exp_global.default_layer_dim), hidden_dim=Ref(path=exp_global.default_layer_dim), dropout=Ref(path=exp_global.dropout, default=0.0), weightnoise_std=Ref(path=exp_global.weight_noise, default=0.0), param_init=Ref(path=exp_global.param_init, default=bare(GlorotInitializer)), bias_init=Ref(path=exp_global.bias_init, default=bare(ZeroInitializer)), yaml_path='', decoder_input_dim=Ref(path=exp_global.default_layer_dim, default=None), decoder_input_feeding=True)[source]¶

Bases: xnmt.transducers.base.SeqTransducer, xnmt.persistence.Serializable

This implements a single LSTM layer based on the memory-friendly dedicated DyNet nodes. It works similarly to DyNet’s CompactVanillaLSTMBuilder, but in addition supports taking multiple inputs that are concatenated on-the-fly.

Parameters:

layers (int) – number of layers
input_dim (int) – input dimension
hidden_dim (int) – hidden dimension
dropout (float) – dropout probability
weightnoise_std (float) – weight noise standard deviation
param_init (ParamInitializer) – how to initialize weight matrices
bias_init (ParamInitializer) – how to initialize bias vectors
yaml_path (str) – path of this component in the component hierarchy; automatically set by the YAML deserializer
decoder_input_dim (int) – input dimension of the decoder; if yaml_path contains ‘decoder’ and decoder_input_feeding is True, this will be added to input_dim
decoder_input_feeding (bool) – whether this transducer is part of an input-feeding decoder; cf. decoder_input_dim

get_final_states()[source]¶

Returns: A list of FinalTransducerState objects corresponding to a fixed-dimension representation of the input, after having invoked transduce()

Return type:	`List`[`FinalTransducerState`]

transduce(expr_seq)[source]¶

transduce the sequence, applying masks if given (masked timesteps simply copy previous h / c)

Parameters:	expr_seq (`ExpressionSequence`) – expression sequence or list of expression sequences (where each inner list will be concatenated)
Return type:	`ExpressionSequence`
Returns:	expression sequence
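
The masking behavior of transduce (masked timesteps copy the previous state) can be shown with a toy recurrence. This is a sketch only: run_with_mask is hypothetical, the state is a single float rather than an LSTM's h and c, and a mask value of 1 is assumed to mean "masked".

```python
from typing import Callable, List


def run_with_mask(inputs: List[float], mask: List[int],
                  step: Callable[[float, float], float]) -> List[float]:
    """Run a recurrence, carrying the previous state over at masked timesteps."""
    state = 0.0
    outputs = []
    for x, m in zip(inputs, mask):
        if m == 0:
            state = step(state, x)  # regular update
        # if m == 1, the previous state is simply copied
        outputs.append(state)
    return outputs


# Toy update rule standing in for the LSTM cell.
outputs = run_with_mask([1.0, 2.0, 3.0], [0, 1, 0], lambda h, x: 0.5 * h + x)
```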

class xnmt.transducers.recurrent.BiLSTMSeqTransducer(layers=1, input_dim=Ref(path=exp_global.default_layer_dim), hidden_dim=Ref(path=exp_global.default_layer_dim), dropout=Ref(path=exp_global.dropout, default=0.0), weightnoise_std=Ref(path=exp_global.weight_noise, default=0.0), param_init=Ref(path=exp_global.param_init, default=bare(GlorotInitializer)), bias_init=Ref(path=exp_global.bias_init, default=bare(ZeroInitializer)), forward_layers=None, backward_layers=None)[source]¶

Bases: xnmt.transducers.base.SeqTransducer, xnmt.persistence.Serializable

This implements a bidirectional LSTM and requires about 8.5% less memory per timestep than DyNet’s CompactVanillaLSTMBuilder due to avoiding concat operations. It uses 2 xnmt.transducers.recurrent.UniLSTMSeqTransducer objects in each layer.

Parameters:

layers (int) – number of layers
input_dim (int) – input dimension
hidden_dim (int) – hidden dimension
dropout (float) – dropout probability
weightnoise_std (float) – weight noise standard deviation
param_init (ParamInitializer) – a xnmt.param_init.ParamInitializer or list of xnmt.param_init.ParamInitializer objects specifying how to initialize weight matrices. If a list is given, each entry denotes one layer.
bias_init (ParamInitializer) – a xnmt.param_init.ParamInitializer or list of xnmt.param_init.ParamInitializer objects specifying how to initialize bias vectors. If a list is given, each entry denotes one layer.
forward_layers (Optional[Sequence[UniLSTMSeqTransducer]]) – set automatically
backward_layers (Optional[Sequence[UniLSTMSeqTransducer]]) – set automatically

get_final_states()[source]¶

Returns: A list of FinalTransducerState objects corresponding to a fixed-dimension representation of the input, after having invoked transduce()

Return type:	`List`[`FinalTransducerState`]

transduce(es)[source]¶

Parameters should be expression_seqs.ExpressionSequence objects wherever appropriate

Parameters:	es – An expression sequence representing the input to the transduction
Return type:	`ExpressionSequence`
Returns:	result of transduction, an expression sequence

class xnmt.transducers.recurrent.CustomLSTMSeqTransducer(layers, input_dim, hidden_dim, param_init=Ref(path=exp_global.param_init, default=bare(GlorotInitializer)), bias_init=Ref(path=exp_global.bias_init, default=bare(ZeroInitializer)))[source]¶

Bases: xnmt.transducers.base.SeqTransducer, xnmt.persistence.Serializable

This implements an LSTM builder based on elementary DyNet operations. It is more memory-hungry than the compact LSTM, but can be extended more easily. It currently does not support dropout or multiple layers and is mostly meant as a starting point for LSTM extensions.

Parameters:

layers (int) – number of layers
input_dim (int) – input dimension; if None, use exp_global.default_layer_dim
hidden_dim (int) – hidden dimension; if None, use exp_global.default_layer_dim
param_init (ParamInitializer) – a xnmt.param_init.ParamInitializer or list of xnmt.param_init.ParamInitializer objects specifying how to initialize weight matrices. If a list is given, each entry denotes one layer. If None, use exp_global.param_init
bias_init (ParamInitializer) – a xnmt.param_init.ParamInitializer or list of xnmt.param_init.ParamInitializer objects specifying how to initialize bias vectors. If a list is given, each entry denotes one layer. If None, use exp_global.bias_init

transduce(xs)[source]¶

Parameters should be expression_seqs.ExpressionSequence objects wherever appropriate

Parameters:	xs – An expression sequence representing the input to the transduction
Return type:	`ExpressionSequence`
Returns:	result of transduction, an expression sequence

class xnmt.transducers.pyramidal.PyramidalLSTMSeqTransducer(layers=1, input_dim=Ref(path=exp_global.default_layer_dim), hidden_dim=Ref(path=exp_global.default_layer_dim), downsampling_method='concat', reduce_factor=2, dropout=Ref(path=exp_global.dropout, default=0.0), builder_layers=None)[source]¶

Bases: xnmt.transducers.base.SeqTransducer, xnmt.persistence.Serializable

Builder for pyramidal RNNs that delegates to UniLSTMSeqTransducer objects and wires them together. See https://arxiv.org/abs/1508.01211

Every layer (except the first) reduces sequence length by the specified factor.

Parameters:

layers (Integral) – number of layers
input_dim (Integral) – input dimension
hidden_dim (Integral) – hidden dimension
downsampling_method (str) – how to perform downsampling (concat|skip)
reduce_factor (Union[Integral, Sequence[Integral]]) – integer, or list of ints (different skip for each layer)
dropout (float) – dropout probability; if None, use exp_global.dropout
builder_layers (Optional[Any]) – set automatically

get_final_states()[source]¶

Returns: A list of FinalTransducerState objects corresponding to a fixed-dimension representation of the input, after having invoked transduce()

Return type:	`List`[`FinalTransducerState`]

transduce(es)[source]¶

Returns the list of output Expressions obtained by adding the given inputs to the current state, one by one, to both the forward and backward RNNs, and concatenating.

Parameters:	es (`ExpressionSequence`) – an ExpressionSequence
Return type:	`ExpressionSequence`
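
The two downsampling_method options and the effect of reduce_factor on sequence length can be sketched in plain Python. The functions are hypothetical, and the reading of the options is an assumption: "skip" is taken to keep every reduce_factor-th vector, and "concat" to concatenate reduce_factor adjacent vectors. Sequence lengths are assumed to be divisible by the factor.

```python
from typing import List


def downsample(seq: List[List[float]], factor: int, method: str) -> List[List[float]]:
    """Shorten a sequence by `factor` using the 'skip' or 'concat' strategy."""
    if method == "skip":
        return seq[::factor]
    if method == "concat":
        if len(seq) % factor != 0:
            raise ValueError("sequence length must be divisible by the factor")
        return [
            [v for vec in seq[i:i + factor] for v in vec]
            for i in range(0, len(seq), factor)
        ]
    raise ValueError(f"unknown downsampling method: {method}")


def layer_output_lengths(seq_len: int, layers: int, reduce_factor: int) -> List[int]:
    """Output length of each layer; every layer except the first reduces the length."""
    lengths = [seq_len]
    for _ in range(layers - 1):
        lengths.append(lengths[-1] // reduce_factor)
    return lengths


seq = [[1.0], [2.0], [3.0], [4.0]]
skipped = downsample(seq, 2, "skip")
concatenated = downsample(seq, 2, "concat")
lengths = layer_output_lengths(seq_len=8, layers=3, reduce_factor=2)
```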

class xnmt.transducers.residual.ResidualSeqTransducer(child, input_dim, layer_norm=False, dropout=Ref(path=exp_global.dropout, default=0.0))[source]¶

Bases: xnmt.transducers.base.SeqTransducer, xnmt.persistence.Serializable

A sequence transducer that wraps a xnmt.transducers.base.SeqTransducer in an additive residual connection, and optionally performs some variety of normalization.

Parameters:

child – the child transducer to wrap
layer_norm (bool) – whether to perform layer normalization
dropout – whether to apply residual dropout

transduce(seq)[source]¶

Parameters should be expression_seqs.ExpressionSequence objects wherever appropriate

Parameters:	seq (`ExpressionSequence`) – An expression sequence representing the input to the transduction
Return type:	`ExpressionSequence`
Returns:	result of transduction, an expression sequence

get_final_states()[source]¶

Returns: A list of FinalTransducerState objects corresponding to a fixed-dimension representation of the input, after having invoked transduce()

Return type:	`List`[`FinalTransducerState`]
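
An additive residual connection adds the wrapped transducer's output to its input at every timestep. The sketch below is hypothetical and works on lists of floats; layer normalization and dropout are left out, and the child is assumed to preserve both the sequence length and the vector dimension.

```python
from typing import Callable, List

Seq = List[List[float]]


def residual_transduce(child: Callable[[Seq], Seq], seq: Seq) -> Seq:
    """Return child(seq) + seq, computed element-wise per timestep."""
    out = child(seq)
    return [[a + b for a, b in zip(x, y)] for x, y in zip(seq, out)]


def double(seq: Seq) -> Seq:
    # Toy child transducer: doubles every component.
    return [[2.0 * v for v in vec] for vec in seq]


residual_out = residual_transduce(double, [[1.0, 2.0], [0.5, 0.0]])
```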

Attender¶

class xnmt.modelparts.attenders.Attender[source]¶

Bases: object

A template class for functions implementing attention.

init_sent(sent)[source]¶

Parameters:	sent – the encoder states, aka keys and values. Usually, but not necessarily, an expression_seqs.ExpressionSequence

Return type:	`None`

calc_attention(state)[source]¶

Compute attention weights.

Parameters:	state (`Expression`) – the current decoder state, aka query, for which to compute the weights.
Return type:	`Expression`
Returns:	DyNet expression containing normalized attention scores

calc_context(state, attention=None)[source]¶

Compute weighted sum.

Parameters:	state (`Expression`) – the current decoder state, aka query, for which to compute the weighted sum. attention (`Optional`[`Expression`]) – the attention vector to use. if not given it is calculated from the state.
Return type:	`Expression`
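
The interplay of init_sent(), calc_attention() and calc_context() can be sketched with a toy attender. ToyAttender and its pluggable score_fn are hypothetical; the sketch only illustrates that attention weights are normalized scores over the encoder states and that the context is their weighted sum, computed from the state when no attention vector is passed.

```python
import math
from typing import Callable, List, Optional


def softmax(scores: List[float]) -> List[float]:
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class ToyAttender:
    def __init__(self, score_fn: Callable[[List[float], List[float]], float]) -> None:
        self.score_fn = score_fn
        self.sent: List[List[float]] = []

    def init_sent(self, sent: List[List[float]]) -> None:
        self.sent = sent  # encoder states, used as keys and values

    def calc_attention(self, state: List[float]) -> List[float]:
        return softmax([self.score_fn(h, state) for h in self.sent])

    def calc_context(self, state: List[float],
                     attention: Optional[List[float]] = None) -> List[float]:
        if attention is None:
            attention = self.calc_attention(state)
        dim = len(self.sent[0])
        return [sum(w * h[i] for w, h in zip(attention, self.sent)) for i in range(dim)]


attender = ToyAttender(score_fn=lambda h, s: 0.0)  # constant scores: uniform weights
attender.init_sent([[1.0, 0.0], [3.0, 2.0]])
uniform_context = attender.calc_context(state=[0.0])
picked_context = attender.calc_context(state=[0.0], attention=[1.0, 0.0])
```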

class xnmt.modelparts.attenders.MlpAttender(input_dim=Ref(path=exp_global.default_layer_dim), state_dim=Ref(path=exp_global.default_layer_dim), hidden_dim=Ref(path=exp_global.default_layer_dim), param_init=Ref(path=exp_global.param_init, default=bare(GlorotInitializer)), bias_init=Ref(path=exp_global.bias_init, default=bare(ZeroInitializer)), truncate_dec_batches=Ref(path=exp_global.truncate_dec_batches, default=False))[source]¶

Bases: xnmt.modelparts.attenders.Attender, xnmt.persistence.Serializable

Implements the attention model of Bahdanau et al. (2014)

Parameters:

input_dim (Integral) – input dimension
state_dim (Integral) – dimension of state inputs
hidden_dim (Integral) – hidden MLP dimension
param_init (ParamInitializer) – how to initialize weight matrices
bias_init (ParamInitializer) – how to initialize bias vectors
truncate_dec_batches (bool) – whether the decoder drops batch elements as soon as these are masked at some time step.

init_sent(sent)[source]¶

Parameters:	sent – the encoder states, aka keys and values. Usually, but not necessarily, an expression_seqs.ExpressionSequence

Return type:	`None`

calc_attention(state)[source]¶

Compute attention weights.

Parameters:	state (`Expression`) – the current decoder state, aka query, for which to compute the weights.
Return type:	`Expression`
Returns:	DyNet expression containing normalized attention scores
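
The MLP (additive) scoring of Bahdanau et al. computes, for each encoder state h and decoder state s, a score v · tanh(W h + U s + b) and normalizes the scores with a softmax. The sketch below is a pure-Python illustration; the function and the parameter names W, U, b and v are chosen for this example and are not taken from MlpAttender.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix: Matrix, vec: List[float]) -> List[float]:
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def mlp_attention(enc_states: List[List[float]], state: List[float],
                  W: Matrix, U: Matrix, b: List[float], v: List[float]) -> List[float]:
    """Normalized additive attention weights over the encoder states."""
    projected_state = matvec(U, state)  # shared across all encoder positions
    scores = []
    for h in enc_states:
        hidden = [math.tanh(a + c + d)
                  for a, c, d in zip(matvec(W, h), projected_state, b)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    return softmax(scores)


identity = [[1.0, 0.0], [0.0, 1.0]]
mlp_weights = mlp_attention(
    enc_states=[[1.0, 0.0], [1.0, 0.0], [0.0, -1.0]],
    state=[0.5, 0.5],
    W=identity, U=identity, b=[0.0, 0.0], v=[1.0, 1.0],
)
```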

class xnmt.modelparts.attenders.DotAttender(scale=True, truncate_dec_batches=Ref(path=exp_global.truncate_dec_batches, default=False))[source]¶

Bases: xnmt.modelparts.attenders.Attender, xnmt.persistence.Serializable

Implements dot product attention of https://arxiv.org/abs/1508.04025. Also (optionally) performs scaling of https://arxiv.org/abs/1706.03762

Parameters:

scale (bool) – whether to perform scaling
truncate_dec_batches (bool) – currently unsupported

init_sent(sent)[source]¶

Parameters:	sent – the encoder states, aka keys and values. Usually but not necessarily an expression_seqs.ExpressionSequence

Return type:	`None`

calc_attention(state)[source]¶

Compute attention weights.

Parameters:	state (`Expression`) – the current decoder state, aka query, for which to compute the weights.
Return type:	`Expression`
Returns:	DyNet expression containing normalized attention scores
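
A minimal sketch of the effect of the `scale` flag, written in plain Python rather than DyNet. The function name `dot_attention` and the division by the square root of the state dimension are assumptions based on the scaled dot-product formulation in the second paper linked above; XNMT's own implementation may differ in detail.

```python
import math

def dot_attention(enc_states, dec_state, scale=True):
    """Softmax over dot products between each encoder state and the query."""
    dim = len(dec_state)
    scores = [sum(h_j * s_j for h_j, s_j in zip(h, dec_state)) for h in enc_states]
    if scale:
        # Scaling by sqrt(dim), as in scaled dot-product attention.
        scores = [s / math.sqrt(dim) for s in scores]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

keys = [[2.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
query = [2.0, 0.0, 0.0, 0.0]
scaled = dot_attention(keys, query, scale=True)
unscaled = dot_attention(keys, query, scale=False)
```

Scaling shrinks the raw scores, so the scaled distribution is less peaked than the unscaled one.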

class xnmt.modelparts.attenders.BilinearAttender(input_dim=Ref(path=exp_global.default_layer_dim), state_dim=Ref(path=exp_global.default_layer_dim), param_init=Ref(path=exp_global.param_init, default=GlorotInitializer@140549180321128), truncate_dec_batches=Ref(path=exp_global.truncate_dec_batches, default=False))[source]¶

Bases: xnmt.modelparts.attenders.Attender, xnmt.persistence.Serializable

Implements a bilinear attention, equivalent to the ‘general’ linear attention of https://arxiv.org/abs/1508.04025

Parameters:

input_dim (Integral) – input dimension; if None, use exp_global.default_layer_dim
state_dim (Integral) – dimension of state inputs; if None, use exp_global.default_layer_dim
param_init (ParamInitializer) – how to initialize weight matrices; if None, use exp_global.param_init
truncate_dec_batches (bool) – currently unsupported

init_sent(sent)[source]¶

Parameters:	sent – the encoder states, aka keys and values. Usually but not necessarily an expression_seqs.ExpressionSequence

Return type:	`None`

calc_attention(state)[source]¶

Compute attention weights.

Parameters:	state (`Expression`) – the current decoder state, aka query, for which to compute the weights.
Return type:	`Expression`
Returns:	DyNet expression containing normalized attention scores
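
For comparison with the other attenders, the bilinear ("general") score can be sketched as `h_i^T W s` followed by a softmax. This is a plain-Python illustration; `bilinear_attention` and the toy matrix are hypothetical and not part of XNMT.

```python
import math

def bilinear_attention(enc_states, dec_state, W):
    """Softmax over h_i^T W s for each encoder state h_i and query s."""
    Ws = [sum(w * s for w, s in zip(row, dec_state)) for row in W]
    scores = [sum(h_j * p_j for h_j, p_j in zip(h, Ws)) for h in enc_states]
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]

# W has shape (input_dim, state_dim) = (2, 3) in this toy example.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
attn = bilinear_attention([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0, 5.0], W)
```

Because `W` maps the query into the encoder space, `input_dim` and `state_dim` do not need to match.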

class xnmt.modelparts.attenders.LatticeBiasedMlpAttender(input_dim=Ref(path=exp_global.default_layer_dim), state_dim=Ref(path=exp_global.default_layer_dim), hidden_dim=Ref(path=exp_global.default_layer_dim), param_init=Ref(path=exp_global.param_init, default=GlorotInitializer@140549181218544), bias_init=Ref(path=exp_global.bias_init, default=ZeroInitializer@140549181218152), truncate_dec_batches=Ref(path=exp_global.truncate_dec_batches, default=False))[source]¶

Bases: xnmt.modelparts.attenders.MlpAttender, xnmt.persistence.Serializable

Modified MLP attention, where lattices are assumed as input and the attention is biased toward confident nodes.

Parameters:

input_dim (Integral) – input dimension
state_dim (Integral) – dimension of state inputs
hidden_dim (Integral) – hidden MLP dimension
param_init (ParamInitializer) – how to initialize weight matrices
bias_init (ParamInitializer) – how to initialize bias vectors
truncate_dec_batches (bool) – whether the decoder drops batch elements as soon as these are masked at some time step.

calc_attention(state)[source]¶

Compute attention weights.

Parameters:	state (`Expression`) – the current decoder state, aka query, for which to compute the weights.
Return type:	`Expression`
Returns:	DyNet expression containing normalized attention scores

Decoder¶

class xnmt.modelparts.decoders.Decoder[source]¶

Bases: object

A template class to convert a prefix of previously generated words and a context vector into a probability distribution over possible next words.

class xnmt.modelparts.decoders.DecoderState[source]¶

Bases: object

A state that holds whatever information is required for the decoder. Child classes must implement the as_vector() method, which is used by, e.g., the attention mechanism.

class xnmt.modelparts.decoders.AutoRegressiveDecoderState(rnn_state=None, context=None)[source]¶

Bases: xnmt.modelparts.decoders.DecoderState

A state holding all the information needed for AutoRegressiveDecoder

Parameters:

rnn_state – a DyNet RNN state
context – a DyNet expression

class xnmt.modelparts.decoders.AutoRegressiveDecoder(input_dim=Ref(path=exp_global.default_layer_dim), embedder=bare(SimpleWordEmbedder), input_feeding=True, bridge=bare(CopyBridge), rnn=bare(UniLSTMSeqTransducer), transform=bare(AuxNonLinear), scorer=bare(Softmax), truncate_dec_batches=Ref(path=exp_global.truncate_dec_batches, default=False))[source]¶

Bases: xnmt.modelparts.decoders.Decoder, xnmt.persistence.Serializable

Standard autoregressive decoder.

Parameters:

input_dim (Integral) – input dimension
embedder (Embedder) – embedder for target words
input_feeding (bool) – whether to activate input feeding
bridge (Bridge) – how to initialize decoder state
rnn (UniLSTMSeqTransducer) – recurrent decoder
transform (Transform) – a layer of transformation between rnn and output scorer
scorer (Scorer) – the method of scoring the output (usually softmax)
truncate_dec_batches (bool) – whether the decoder drops batch elements as soon as these are masked at some time step.

shared_params()[source]¶

Return the shared parameters of this Serializable class.

This can be overwritten to specify what parameters of this component and its subcomponents are shared. Parameter sharing is performed before any components are initialized, and can therefore only include basic data types that are already present in the YAML file (e.g. # dimensions, etc.) Sharing is performed if at least one parameter is specified and multiple shared parameters don’t conflict. In case of conflict a warning is printed, and no sharing is performed. The ordering of shared parameters is irrelevant. Note also that if a submodule is replaced by a reference, its shared parameters are ignored.

Returns:	objects referencing params of this component or a subcomponent, e.g.: return [set([".input_dim", ".sub_module.input_dim", ".submodules_list.0.input_dim"])]

initial_state(enc_final_states, ss)[source]¶

Get the initial state of the decoder given the encoder final states.

Parameters:

enc_final_states (Any) – The encoder final states. Usually but not necessarily an xnmt.expression_sequence.ExpressionSequence
ss (Any) – first input

Return type:	`AutoRegressiveDecoderState`
Returns:	initial decoder state

add_input(dec_state, trg_word)[source]¶

Add an input and return the updated state.

Parameters:

dec_state (AutoRegressiveDecoderState) – An object containing the current state.
trg_word (Any) – The word to input.

Return type:	`AutoRegressiveDecoderState`
Returns:	The updated decoder state.

Bridge¶

class xnmt.modelparts.bridges.Bridge[source]¶

Bases: object

Responsible for initializing the decoder LSTM, based on the final encoder state

decoder_init(enc_final_states)[source]¶

Parameters:	enc_final_states (`Sequence`[`FinalTransducerState`]) – list of final states for each encoder layer
Return type:	`List`[`Expression`]
Returns:	list of initial hidden and cell expressions for each layer. List indices 0..n-1 hold hidden states, n..2n-1 hold cell states.
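
The flat return layout can be unpacked into per-layer pairs as in the following sketch. `split_bridge_output` is a hypothetical helper, and plain strings stand in for DyNet expressions.

```python
def split_bridge_output(init_states):
    """Split a flat [h_0..h_{n-1}, c_0..c_{n-1}] list into per-layer (h, c) pairs."""
    if len(init_states) % 2 != 0:
        raise ValueError("expected an even number of expressions")
    n = len(init_states) // 2
    hidden, cell = init_states[:n], init_states[n:]
    return list(zip(hidden, cell))

# Strings stand in for DyNet expressions in this sketch.
layers = split_bridge_output(["h0", "h1", "c0", "c1"])
```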

class xnmt.modelparts.bridges.NoBridge(dec_layers=1, dec_dim=Ref(path=exp_global.default_layer_dim))[source]¶

Bases: xnmt.modelparts.bridges.Bridge, xnmt.persistence.Serializable

This bridge initializes the decoder with zero vectors, disregarding the encoder final states.

Parameters:

dec_layers (Integral) – number of decoder layers to initialize
dec_dim (Integral) – hidden dimension of decoder states

decoder_init(enc_final_states)[source]¶

Parameters:	enc_final_states (`Sequence`[`FinalTransducerState`]) – list of final states for each encoder layer
Return type:	`List`[`Expression`]
Returns:	list of initial hidden and cell expressions for each layer. List indices 0..n-1 hold hidden states, n..2n-1 hold cell states.

class xnmt.modelparts.bridges.CopyBridge(dec_layers=1, dec_dim=Ref(path=exp_global.default_layer_dim))[source]¶

Bases: xnmt.modelparts.bridges.Bridge, xnmt.persistence.Serializable

This bridge copies final states from the encoder to the decoder initial states. Requires that:

- encoder / decoder dimensions match for every layer
- num encoder layers >= num decoder layers (if unequal, we disregard final states at the encoder bottom)

Parameters:

dec_layers (Integral) – number of decoder layers to initialize
dec_dim (Integral) – hidden dimension of decoder states

decoder_init(enc_final_states)[source]¶

Parameters:	enc_final_states (`Sequence`[`FinalTransducerState`]) – list of final states for each encoder layer
Return type:	`List`[`Expression`]
Returns:	list of initial hidden and cell expressions for each layer. List indices 0..n-1 hold hidden states, n..2n-1 hold cell states.
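
A rough sketch of the copying rule when the encoder has more layers than the decoder. `FinalState` and `copy_bridge_init` are hypothetical stand-ins, strings replace DyNet expressions, and the dimension check mentioned in the class description is omitted for brevity.

```python
from collections import namedtuple

# Hypothetical stand-in for a per-layer final encoder state.
FinalState = namedtuple("FinalState", ["hidden", "cell"])

def copy_bridge_init(enc_final_states, dec_layers):
    """Copy the top `dec_layers` encoder states into the decoder init layout."""
    if dec_layers > len(enc_final_states):
        raise ValueError("need at least as many encoder layers as decoder layers")
    top = enc_final_states[-dec_layers:]  # drop states at the encoder bottom
    # Hidden states first (indices 0..n-1), then cell states (n..2n-1).
    return [s.hidden for s in top] + [s.cell for s in top]

enc = [
    FinalState("h_bottom", "c_bottom"),
    FinalState("h_mid", "c_mid"),
    FinalState("h_top", "c_top"),
]
init = copy_bridge_init(enc, dec_layers=2)
```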

class xnmt.modelparts.bridges.LinearBridge(dec_layers=1, enc_dim=Ref(path=exp_global.default_layer_dim), dec_dim=Ref(path=exp_global.default_layer_dim), param_init=Ref(path=exp_global.param_init, default=GlorotInitializer@140549230233248), bias_init=Ref(path=exp_global.bias_init, default=ZeroInitializer@140549230233752), projector=None)[source]¶

Bases: xnmt.modelparts.bridges.Bridge, xnmt.persistence.Serializable

This bridge does a linear transform of final states from the encoder to the decoder initial states. Requires that num encoder layers >= num decoder layers (if unequal, we disregard final states at the encoder bottom)

Parameters:

dec_layers (Integral) – number of decoder layers to initialize
enc_dim (Integral) – hidden dimension of encoder states
dec_dim (Integral) – hidden dimension of decoder states
param_init (ParamInitializer) – how to initialize weight matrices; if None, use exp_global.param_init
bias_init (ParamInitializer) – how to initialize bias vectors; if None, use exp_global.bias_init
projector (Optional[Linear]) – linear projection (created automatically)

decoder_init(enc_final_states)[source]¶

Parameters:	enc_final_states (`Sequence`[`FinalTransducerState`]) – list of final states for each encoder layer
Return type:	`List`[`Expression`]
Returns:	list of initial hidden and cell expressions for each layer. List indices 0..n-1 hold hidden states, n..2n-1 hold cell states.

Transform¶

class xnmt.modelparts.transforms.Transform[source]¶

Bases: object

A class of transforms that change one DyNet expression into another.

class xnmt.modelparts.transforms.Identity[source]¶

Bases: xnmt.modelparts.transforms.Transform, xnmt.persistence.Serializable

Identity transform. For use when you think it might be a better idea to not perform a specific transform in a place where you would normally do one.

class xnmt.modelparts.transforms.Linear(input_dim=Ref(path=exp_global.default_layer_dim), output_dim=Ref(path=exp_global.default_layer_dim), bias=True, param_init=Ref(path=exp_global.param_init, default=GlorotInitializer@140549298961656), bias_init=Ref(path=exp_global.bias_init, default=ZeroInitializer@140549298961208))[source]¶

Bases: xnmt.modelparts.transforms.Transform, xnmt.persistence.Serializable

Linear projection with optional bias.

Parameters:

input_dim (Integral) – input dimension
output_dim (Integral) – hidden dimension
bias (bool) – whether to add a bias
param_init (ParamInitializer) – how to initialize weight matrices
bias_init (ParamInitializer) – how to initialize bias vectors

class xnmt.modelparts.transforms.NonLinear(input_dim=Ref(path=exp_global.default_layer_dim), output_dim=Ref(path=exp_global.default_layer_dim), bias=True, activation='tanh', param_init=Ref(path=exp_global.param_init, default=GlorotInitializer@140549298962104), bias_init=Ref(path=exp_global.bias_init, default=ZeroInitializer@140549298962384))[source]¶

Bases: xnmt.modelparts.transforms.Transform, xnmt.persistence.Serializable

Linear projection with optional bias and non-linearity.

Parameters:

input_dim (Integral) – input dimension
output_dim (Integral) – hidden dimension
bias (bool) – whether to add a bias
activation (str) – One of tanh, relu, sigmoid, elu, selu, asinh or identity.
param_init (ParamInitializer) – how to initialize weight matrices
bias_init (ParamInitializer) – how to initialize bias vectors

class xnmt.modelparts.transforms.AuxNonLinear(input_dim=Ref(path=exp_global.default_layer_dim), output_dim=Ref(path=exp_global.default_layer_dim), aux_input_dim=Ref(path=exp_global.default_layer_dim), bias=True, activation='tanh', param_init=Ref(path=exp_global.param_init, default=GlorotInitializer@140549298516712), bias_init=Ref(path=exp_global.bias_init, default=ZeroInitializer@140549298517160))[source]¶

Bases: xnmt.modelparts.transforms.NonLinear, xnmt.persistence.Serializable

NonLinear with an additional auxiliary input.

Parameters:

input_dim (Integral) – input dimension
output_dim (Integral) – hidden dimension
aux_input_dim (Integral) – auxiliary input dimension. The actual input dimension is aux_input_dim + input_dim. This is useful for when you want to do something like input feeding.
bias (bool) – whether to add a bias
activation (str) – One of tanh, relu, sigmoid, elu, selu, asinh or identity.
param_init (ParamInitializer) – how to initialize weight matrices
bias_init (ParamInitializer) – how to initialize bias vectors
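
One way to picture the dimension bookkeeping for the auxiliary input is to concatenate both inputs before a single affine map. The sketch below is plain Python; `aux_nonlinear` is hypothetical, and XNMT may organize its weight matrices differently.

```python
import math

def aux_nonlinear(x, aux, W, b):
    """Apply tanh(W @ concat(aux, x) + b) using plain lists (hypothetical helper)."""
    full_input = aux + x  # effective input dimension: aux_input_dim + input_dim
    if any(len(row) != len(full_input) for row in W):
        raise ValueError("each row of W must have aux_input_dim + input_dim columns")
    return [math.tanh(sum(w * v for w, v in zip(row, full_input)) + b_i)
            for row, b_i in zip(W, b)]

# input_dim=2, aux_input_dim=1, output_dim=2
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
out = aux_nonlinear(x=[0.5, -0.5], aux=[2.0], W=W, b=[0.0, 0.0])
```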

class xnmt.modelparts.transforms.MLP(input_dim=Ref(path=exp_global.default_layer_dim), hidden_dim=Ref(path=exp_global.default_layer_dim), output_dim=Ref(path=exp_global.default_layer_dim), bias=True, activation='tanh', hidden_layers=1, param_init=Ref(path=exp_global.param_init, default=GlorotInitializer@140549298517720), bias_init=Ref(path=exp_global.bias_init, default=ZeroInitializer@140549298518168), layers=None)[source]¶

Bases: xnmt.modelparts.transforms.Transform, xnmt.persistence.Serializable

A multi-layer perceptron. Defined as one or more NonLinear transforms of equal hidden dimension and type, then a Linear transform to the output dimension.
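
The layer stack described above can be enumerated with a short sketch. `mlp_layer_dims` is a hypothetical helper that only lists shapes; it builds no parameters.

```python
def mlp_layer_dims(input_dim, hidden_dim, output_dim, hidden_layers=1):
    """List (kind, in_dim, out_dim) for each layer of the MLP described above."""
    if hidden_layers < 1:
        raise ValueError("an MLP needs at least one hidden layer")
    dims = [("NonLinear", input_dim, hidden_dim)]
    dims += [("NonLinear", hidden_dim, hidden_dim)] * (hidden_layers - 1)
    dims.append(("Linear", hidden_dim, output_dim))
    return dims

layout = mlp_layer_dims(input_dim=512, hidden_dim=256, output_dim=10, hidden_layers=2)
```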

class xnmt.modelparts.transforms.Cwise(op='rectify')[source]¶

Bases: xnmt.modelparts.transforms.Transform, xnmt.persistence.Serializable

A component-wise transformation that can be an arbitrary unary DyNet operation.

Parameters:	op (`str`) – arbitrary unary DyNet node

Scorer¶

class xnmt.modelparts.scorers.Scorer[source]¶

Bases: object

A template class of things that take in a vector and produce a score over discrete output items.

calc_scores(x)[source]¶

Calculate the score of each discrete decision, where the higher the score is the better the model thinks a decision is. These often correspond to unnormalized log probabilities.

Parameters:	x (`Expression`) – The vector used to make the prediction
Return type:	`Expression`

best_k(x, k, normalize_scores=False)[source]¶

Returns a list of the k items with the highest scores. The items may not be in sorted order.

Parameters:

x (Expression) – The vector used to make the prediction
k (Integral) – Number of items to return
normalize_scores (bool) – whether to normalize the scores
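
As an illustration of the top-k selection, the sketch below works on a plain list of scores. The standalone `best_k` function, its `(index, score)` return format, and the use of a log-softmax for normalization are assumptions, not the XNMT signature.

```python
import heapq
import math

def best_k(scores, k, normalize_scores=False):
    """Return (index, score) pairs for the k highest-scoring items."""
    if normalize_scores:
        # Convert raw scores to log probabilities via a stable log-softmax.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        scores = [s - log_z for s in scores]
    return heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])

top = best_k([0.1, 2.5, -1.0, 1.7], k=2)
```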

sample(x, n)[source]¶

Return samples from the scores that are treated as probability distributions.

calc_probs(x)[source]¶

Calculate the normalized probability of a decision.

Parameters:	x (`Expression`) – The vector used to make the prediction
Return type:	`Expression`

calc_log_probs(x)[source]¶

Calculate the log probability of a decision

log(calc_probs()) == calc_log_probs()

Both functions exist because it might help save memory.

Parameters:	x (`Expression`) – The vector used to make the prediction
Return type:	`Expression`
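
The relationship between the two methods can be checked numerically with standalone functions. These are plain-Python sketches that share the method names for readability; they operate on lists of scores rather than DyNet expressions.

```python
import math

def calc_probs(scores):
    """Softmax: normalized probabilities from raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def calc_log_probs(scores):
    """Log-softmax computed directly, without forming the probabilities first."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

scores = [2.0, 0.5, -1.0]
probs = calc_probs(scores)
log_probs = calc_log_probs(scores)
```

Computing the log probabilities directly avoids materializing the intermediate probability vector.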

calc_loss(x, y)[source]¶

Calculate the loss incurred by making a particular decision.

Parameters:

x (Expression) – The vector used to make the prediction
y (Union[int, List[int]]) – The correct label(s)

Return type:	`Expression`

class xnmt.modelparts.scorers.Softmax(input_dim=Ref(path=exp_global.default_layer_dim), vocab_size=None, vocab=None, trg_reader=Ref(path=model.trg_reader, default=None), label_smoothing=0.0, param_init=Ref(path=exp_global.param_init, default=GlorotInitializer@140549230289472), bias_init=Ref(path=exp_global.bias_init, default=ZeroInitializer@140549230289920), output_projector=None)[source]¶

Bases: xnmt.modelparts.scorers.Scorer, xnmt.persistence.Serializable

A class that does an affine transform from the input to the vocabulary size, and calculates a softmax.

Note that all functions in this class rely on calc_scores(), and thus this class can be sub-classed by any other class that has an alternative method for calculating un-normalized log probabilities by simply overloading the calc_scores() function.

Parameters:

input_dim (Integral) – Size of the input vector
vocab_size (Optional[Integral]) – Size of the vocab to predict
vocab (Optional[Vocab]) – A vocab object from which the vocab size can be derived automatically
trg_reader (Optional[InputReader]) – An input reader for the target, which can be used to derive the vocab size
label_smoothing (Real) – Amount of label smoothing to apply; 0.0 disables it (0.1 is a good value when smoothing is used)
param_init (ParamInitializer) – How to initialize the parameters
bias_init (ParamInitializer) – How to initialize the bias
output_projector (Optional[Linear]) – The projection to be used before the output
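
To give a feel for the `label_smoothing` parameter, the sketch below implements one common variant: mixing the one-hot target with a uniform distribution over the vocabulary. The formula XNMT actually uses is not stated here and may differ; `smoothed_nll` is a hypothetical name.

```python
import math

def smoothed_nll(scores, label, label_smoothing=0.0):
    """Negative log-likelihood with uniform label smoothing (one common variant)."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    log_probs = [s - log_z for s in scores]
    nll = -log_probs[label]
    if label_smoothing == 0.0:
        return nll
    # Mix the one-hot target with a uniform distribution over the vocabulary.
    uniform_nll = -sum(log_probs) / len(log_probs)
    return (1.0 - label_smoothing) * nll + label_smoothing * uniform_nll

plain = smoothed_nll([3.0, 0.0, 0.0], label=0)
smooth = smoothed_nll([3.0, 0.0, 0.0], label=0, label_smoothing=0.1)
```

With smoothing, a confident correct prediction still incurs some loss, which discourages overly peaked distributions.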

calc_scores(x)[source]¶

Calculate the score of each discrete decision, where the higher the score is the better the model thinks a decision is. These often correspond to unnormalized log probabilities.

Parameters:	x (`Expression`) – The vector used to make the prediction
Return type:	`Expression`

best_k(x, k, normalize_scores=False)[source]¶

Returns a list of the k items with the highest scores. The items may not be in sorted order.

Parameters:

x (Expression) – The vector used to make the prediction
k (Integral) – Number of items to return
normalize_scores (bool) – whether to normalize the scores

sample(x, n, temperature=1.0)[source]¶

Return samples from the scores that are treated as probability distributions.

can_loss_be_derived_from_scores()[source]¶

Indicates whether dy.pickneglogsoftmax can be used to compute the loss value quickly. If False, the calc_loss method should (1) compute the log softmax, (2) apply the necessary modifications, and (3) pick the loss.

calc_loss(x, y)[source]¶

Calculate the loss incurred by making a particular decision.

Parameters:

x (Expression) – The vector used to make the prediction
y (Union[Integral, List[Integral]]) – The correct label(s)

Return type:	`Expression`

calc_probs(x)[source]¶

Calculate the normalized probability of a decision.

Parameters:	x (`Expression`) – The vector used to make the prediction
Return type:	`Expression`

calc_log_probs(x)[source]¶

Calculate the log probability of a decision

log(calc_probs()) == calc_log_probs()

Both functions exist because it might help save memory.

Parameters:	x (`Expression`) – The vector used to make the prediction
Return type:	`Expression`

class xnmt.modelparts.scorers.LexiconSoftmax(input_dim=Ref(path=exp_global.default_layer_dim), vocab_size=None, vocab=None, trg_reader=Ref(path=model.trg_reader, default=None), attender=Ref(path=model.attender), label_smoothing=0.0, param_init=Ref(path=exp_global.param_init, default=GlorotInitializer@140549230290480), bias_init=Ref(path=exp_global.bias_init, default=ZeroInitializer@140549230290928), output_projector=None, lexicon_file=None, lexicon_alpha=0.001, lexicon_type='bias', coef_predictor=None, src_vocab=Ref(path=model.src_reader.vocab, default=None))[source]¶

Bases: xnmt.modelparts.scorers.Softmax, xnmt.persistence.Serializable

A subclass of the softmax class that can make use of an external lexicon probability as described in: http://anthology.aclweb.org/D/D16/D16-1162.pdf

Parameters:

input_dim (Integral) – Size of the input vector
vocab_size (Optional[Integral]) – Size of the vocab to predict
vocab (Optional[Vocab]) – A vocab object from which the vocab size can be derived automatically
trg_reader (Optional[InputReader]) – An input reader for the target, which can be used to derive the vocab size
label_smoothing (Real) – Amount of label smoothing to apply; 0.0 disables it (0.1 is a good value when smoothing is used)
param_init (ParamInitializer) – How to initialize the parameters
bias_init (ParamInitializer) – How to initialize the bias
output_projector (Optional[Linear]) – The projection to be used before the output
lexicon_file – A file containing “trg src p(trg|src)”
lexicon_alpha – smoothing constant for bias method
lexicon_type – Either bias or linear method
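
The role of `lexicon_alpha` in the bias method can be sketched as follows, based on the general idea in the linked paper of adding a log lexicon probability to the pre-softmax scores. `lexicon_biased_scores` is hypothetical, and `lex_probs` is assumed to already hold one aggregated lexicon probability per target word.

```python
import math

def lexicon_biased_scores(scores, lex_probs, lexicon_alpha=0.001):
    """Add log(p_lex + alpha) to each pre-softmax score (the 'bias' variant)."""
    return [s + math.log(p + lexicon_alpha) for s, p in zip(scores, lex_probs)]

# Uniform model scores; the lexicon strongly prefers target word 1.
biased = lexicon_biased_scores([0.0, 0.0, 0.0], [0.0, 0.9, 0.1])
```

The smoothing constant keeps the bias finite for words the lexicon assigns zero probability.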

calc_scores(x)[source]¶

Calculate the score of each discrete decision, where the higher the score is the better the model thinks a decision is. These often correspond to unnormalized log probabilities.

Parameters:	x (`Expression`) – The vector used to make the prediction
Return type:	`Expression`

calc_probs(x)[source]¶

Calculate the normalized probability of a decision.

Parameters:	x (`Expression`) – The vector used to make the prediction
Return type:	`Expression`

calc_log_probs(x)[source]¶

Calculate the log probability of a decision

log(calc_probs()) == calc_log_probs()

Both functions exist because it might help save memory.

Parameters:	x (`Expression`) – The vector used to make the prediction
Return type:	`Expression`

can_loss_be_derived_from_scores()[source]¶

Indicates whether dy.pickneglogsoftmax can be used to compute the loss value quickly. If False, the calc_loss method should (1) compute the log softmax, (2) apply the necessary modifications, and (3) pick the loss.

SequenceLabeler¶

class xnmt.models.sequence_labelers.SeqLabeler(src_reader, trg_reader, src_embedder=bare(SimpleWordEmbedder), encoder=bare(BiLSTMSeqTransducer), transform=bare(NonLinear), scorer=bare(Softmax), inference=bare(IndependentOutputInference), auto_cut_pad=False)[source]¶

Bases: xnmt.models.base.ConditionedModel, xnmt.models.base.GeneratorModel, xnmt.persistence.Serializable, xnmt.reports.Reportable

A simple sequence labeler based on an encoder and an output softmax layer.

Parameters:

src_reader (InputReader) – A reader for the source side.
trg_reader (InputReader) – A reader for the target side.
src_embedder (Embedder) – A word embedder for the input language
encoder (SeqTransducer) – An encoder to generate encoded inputs
transform (Transform) – A transform to be applied before making predictions
scorer (Scorer) – The class to actually make predictions
inference (Inference) – The inference method used for this model
auto_cut_pad (bool) – If True, cut or pad target sequences so they match the length of the encoded inputs. If False, an error is thrown if there is a length mismatch.

shared_params()[source]¶

Return the shared parameters of this Serializable class.

This can be overwritten to specify what parameters of this component and its subcomponents are shared. Parameter sharing is performed before any components are initialized, and can therefore only include basic data types that are already present in the YAML file (e.g. # dimensions, etc.) Sharing is performed if at least one parameter is specified and multiple shared parameters don’t conflict. In case of conflict a warning is printed, and no sharing is performed. The ordering of shared parameters is irrelevant. Note also that if a submodule is replaced by a reference, its shared parameters are ignored.

Return type:	`Sequence`[`Set`[`str`]]
Returns:	objects referencing params of this component or a subcomponent, e.g.: return [set([".input_dim", ".sub_module.input_dim", ".submodules_list.0.input_dim"])]

calc_nll(src, trg)[source]¶

Calculate loss based on input-output pairs.

Losses are accumulated only across unmasked timesteps in each batch element.

Parameters:

src (Union[Batch, Sentence]) – The source, a sentence or a batch of sentences.
trg (Union[Batch, Sentence]) – The target, a sentence or a batch of sentences.

Return type:	`Expression`
Returns:	A (possibly batched) expression representing the loss.
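
Accumulating losses only over unmasked timesteps can be sketched in plain Python. The `masked_nll` helper and the mask convention (1 marks a padded position) are assumptions for this example.

```python
def masked_nll(step_losses, mask):
    """Sum per-timestep losses for each batch element, skipping masked steps.

    step_losses[t][b] is the loss at timestep t for batch element b;
    mask[t][b] is 1 where the position is padding and 0 otherwise.
    """
    batch_size = len(step_losses[0])
    totals = [0.0] * batch_size
    for losses_t, mask_t in zip(step_losses, mask):
        for b in range(batch_size):
            if not mask_t[b]:
                totals[b] += losses_t[b]
    return totals

losses = [[1.0, 2.0], [0.5, 4.0], [0.25, 8.0]]
mask = [[0, 0], [0, 0], [0, 1]]  # second sentence is one step shorter
per_sentence = masked_nll(losses, mask)
```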

generate(src, normalize_scores=False)[source]¶

Generate outputs.

Parameters:

src (Batch) – batch of source-side inputs
*args, **kwargs – Further arguments to be specified by subclasses

Return type:	`Sequence`[`ReadableSentence`]
Returns:	output objects

set_trg_vocab(trg_vocab=None)[source]¶

Set target vocab for generating outputs. If not specified, word IDs are generated instead.

Parameters:	trg_vocab (`Optional`[`Vocab`]) – target vocab, or None to generate word IDs
Return type:	`None`

Classifier¶

class xnmt.models.classifiers.SequenceClassifier(src_reader, trg_reader, src_embedder=bare(SimpleWordEmbedder), encoder=bare(BiLSTMSeqTransducer), inference=bare(IndependentOutputInference), transform=bare(NonLinear), scorer=bare(Softmax))[source]¶

Bases: xnmt.models.base.ConditionedModel, xnmt.models.base.GeneratorModel, xnmt.persistence.Serializable

A sequence classifier.

Runs embeddings through an encoder, feeds the average over all encoder outputs to a transform and scoring layer.

Parameters:

src_reader (InputReader) – A reader for the source side.
trg_reader (InputReader) – A reader for the target side.
src_embedder (Embedder) – A word embedder for the input language
encoder (SeqTransducer) – An encoder to generate encoded inputs
inference – how to perform inference
transform (Transform) – A transform performed before the scoring function
scorer (Scorer) – A scoring function over the multiple choices
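
The pooling step mentioned in the class description, averaging over all encoder outputs, amounts to an element-wise mean. A plain-Python sketch with a hypothetical helper name:

```python
def average_encoder_outputs(enc_outputs):
    """Element-wise mean over a list of equally sized encoder output vectors."""
    n = len(enc_outputs)
    dim = len(enc_outputs[0])
    return [sum(vec[j] for vec in enc_outputs) / n for j in range(dim)]

pooled = average_encoder_outputs([[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]])
```

The pooled vector has the encoder's output dimension regardless of sequence length, so it can be fed to the transform and scorer directly.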

shared_params()[source]¶

Return the shared parameters of this Serializable class.

This can be overwritten to specify what parameters of this component and its subcomponents are shared. Parameter sharing is performed before any components are initialized, and can therefore only include basic data types that are already present in the YAML file (e.g. # dimensions, etc.) Sharing is performed if at least one parameter is specified and multiple shared parameters don’t conflict. In case of conflict a warning is printed, and no sharing is performed. The ordering of shared parameters is irrelevant. Note also that if a submodule is replaced by a reference, its shared parameters are ignored.

Returns:	objects referencing params of this component or a subcomponent, e.g.: return [set([".input_dim", ".sub_module.input_dim", ".submodules_list.0.input_dim"])]

calc_nll(src, trg)[source]¶

Calculate loss based on input-output pairs.

Losses are accumulated only across unmasked timesteps in each batch element.

Parameters:

src (Union[Batch, Sentence]) – The source, a sentence or a batch of sentences.
trg (Union[Batch, Sentence]) – The target, a sentence or a batch of sentences.

Return type:	`Expression`
Returns:	A (possibly batched) expression representing the loss.

generate(src, normalize_scores=False)[source]¶

Generate outputs.

Parameters:

src (Union[Batch, Sentence]) – batch of source-side inputs
*args, **kwargs – Further arguments to be specified by subclasses

Returns:	output objects

Loss¶

class xnmt.losses.FactoredLossExpr(init_loss=None)[source]¶

Bases: object

Loss consisting of (possibly batched) DyNet expressions, with one expression per loss factor.

Used to represent losses within a training step.

Parameters:	init_loss (`Optional`[`Dict`[`str`, `Expression`]]) – initial loss values

compute(comb_method='sum')[source]¶

Compute loss as DyNet expression by summing over factors and batch elements.

Parameters:	comb_method (`str`) – method for combining loss across batch elements (‘sum’ or ‘avg’).
Return type:	`Expression`
Returns:	Scalar DyNet expression.
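
The two combination methods can be illustrated with floats in place of DyNet expressions. `compute_loss` and the factor names below are hypothetical.

```python
def compute_loss(factors, comb_method="sum"):
    """Combine named per-batch-element losses into a single scalar.

    `factors` maps a factor name to a list with one loss per batch element.
    """
    batch_size = len(next(iter(factors.values())))
    # Sum over factors first, giving one value per batch element.
    per_element = [sum(vals[b] for vals in factors.values()) for b in range(batch_size)]
    if comb_method == "sum":
        return sum(per_element)
    if comb_method == "avg":
        return sum(per_element) / batch_size
    raise ValueError(f"unknown comb_method: {comb_method!r}")

factors = {"mle": [1.0, 2.0, 3.0], "fertility": [0.5, 0.5, 0.5]}
total = compute_loss(factors, "sum")
mean = compute_loss(factors, "avg")
```

With 'avg', the scalar no longer grows with the batch size, which changes the effective learning rate relative to 'sum'.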

value()[source]¶

Get list of per-batch-element loss values, summed over factors.

Return type:	`List`[`float`]
Returns:	List of same length as batch-size.

get_factored_loss_val(comb_method='sum')[source]¶

Create factored loss values by calling .value() for each DyNet loss expression and applying batch combination.

Parameters:	comb_method (`str`) – method for combining loss across batch elements (‘sum’ or ‘avg’).
Return type:	`FactoredLossVal`
Returns:	Factored loss values.

get_nobackprop_loss()[source]¶

Get dictionary of named non-backpropagating loss expressions

Return type:	`Dict`[`str`, `Expression`]
Returns:	Loss expressions

class xnmt.losses.FactoredLossVal(loss_dict=None)[source]¶

Bases: object

Loss consisting of (unbatched) float values, with one value per loss factor.

Used to represent losses accumulated across several training steps.

sum_factors()[source]¶

Return the sum of all loss factors.

Return type:	`float`
Returns:	A float value.

items()[source]¶

Get name/value tuples for loss factors.

Return type:	`List`[`Tuple`[`str`, `float`]]
Returns:	Name/value tuples.

clear()[source]¶

Clears all loss factors.

Return type:	`None`

LossCalculator¶

class xnmt.loss_calculators.LossCalculator[source]¶

Bases: object

A template class implementing the training strategy and corresponding loss calculation.

class xnmt.loss_calculators.MLELoss[source]¶

Bases: xnmt.persistence.Serializable, xnmt.loss_calculators.LossCalculator

Max likelihood loss calculator.

class xnmt.loss_calculators.GlobalFertilityLoss[source]¶

Bases: xnmt.persistence.Serializable, xnmt.loss_calculators.LossCalculator

A fertility loss according to Cohn+, 2016. Incorporating Structural Alignment Biases into an Attentional Neural Translation Model

https://arxiv.org/pdf/1601.01085.pdf

class xnmt.loss_calculators.CompositeLoss(pt_losses, loss_weight=None)[source]¶

Bases: xnmt.persistence.Serializable, xnmt.loss_calculators.LossCalculator

Sums losses from multiple LossCalculators.

class xnmt.loss_calculators.ReinforceLoss(baseline=None, evaluation_metric=bare(FastBLEUEvaluator), search_strategy=bare(SamplingSearch), inv_eval=True, decoder_hidden_dim=Ref(path=exp_global.default_layer_dim))[source]¶

Bases: xnmt.persistence.Serializable, xnmt.loss_calculators.LossCalculator

Reinforce Loss according to Ranzato+, 2015. SEQUENCE LEVEL TRAINING WITH RECURRENT NEURAL NETWORKS.

(This is not the MIXER algorithm)

https://arxiv.org/pdf/1511.06732.pdf

class xnmt.loss_calculators.MinRiskLoss(evaluation_metric=bare(FastBLEUEvaluator), alpha=0.005, inv_eval=True, unique_sample=True, search_strategy=bare(SamplingSearch))[source]¶

Bases: xnmt.persistence.Serializable, xnmt.loss_calculators.LossCalculator

class xnmt.loss_calculators.FeedbackLoss(child_loss=bare(MLELoss), repeat=1)[source]¶

Bases: xnmt.persistence.Serializable, xnmt.loss_calculators.LossCalculator

A loss that first calculates a standard loss function, then feeds it back to the model using the model.additional_loss function.

Parameters:

child_loss (LossCalculator) – The loss that will be fed back to the model
repeat (Integral) – Repeat the process multiple times and use the sum of the losses. This is useful when there is some non-determinism (such as sampling in the encoder, etc.)

Training¶

TrainingRegimen¶

class xnmt.train.regimens.TrainingRegimen[source]¶

Bases: object

A training regimen is a class that implements a training loop.

run_training(save_fct)[source]¶

Run training steps in a loop until stopping criterion is reached.

Parameters:	save_fct (`Callable`) – function to be invoked to save a model at dev checkpoints
Return type:	`None`

backward(loss, dynet_profiling)[source]¶

Perform backward pass to accumulate gradients.

Parameters:

loss (Expression) – Result of self.training_step(…)
dynet_profiling (Integral) – if > 0, print the computation graph

Return type:	`None`

update(trainer)[source]¶

Update DyNet weights using the given optimizer.

Parameters:	trainer (`XnmtOptimizer`) – DyNet trainer
Return type:	`None`

class xnmt.train.regimens.SimpleTrainingRegimen(model=Ref(path=model), src_file=None, trg_file=None, dev_every=0, dev_zero=False, batcher=bare(SrcBatcher{'batch_size': 32}), loss_calculator=bare(MLELoss), trainer=bare(SimpleSGDTrainer{'e0': 0.1}), run_for_epochs=None, lr_decay=1.0, lr_decay_times=3, patience=1, initial_patience=None, dev_tasks=None, dev_combinator=None, restart_trainer=False, reload_command=None, name='{EXP}', sample_train_sents=None, max_num_train_sents=None, max_src_len=None, max_trg_len=None, loss_comb_method=Ref(path=exp_global.loss_comb_method, default=sum), update_every=1, commandline_args=Ref(path=exp_global.commandline_args, default={}))[source]¶

Bases: xnmt.train.tasks.SimpleTrainingTask, xnmt.train.regimens.TrainingRegimen, xnmt.persistence.Serializable

Parameters:

model (ConditionedModel) – the model
src_file (Union[None, str, Sequence[str]]) – the source training file
trg_file (Optional[str]) – the target training file
dev_every (Integral) – dev checkpoints every n sentences (0 for only after epoch)
dev_zero (bool) – if True, add a checkpoint before training loop is entered (useful with pretrained models).
batcher (Batcher) – Type of batcher
loss_calculator (LossCalculator) – The method for calculating the loss.
trainer (XnmtOptimizer) – Trainer object, default is SGD with learning rate 0.1
run_for_epochs (Optional[Integral]) –
lr_decay (Real) –
lr_decay_times (Integral) – Early stopping after decaying learning rate a certain number of times
patience (Integral) – apply LR decay after dev scores haven’t improved over this many checkpoints
initial_patience (Optional[Integral]) – if given, allows adjusting patience for the first LR decay
dev_tasks (Optional[Sequence[EvalTask]]) – A list of tasks to use during the development stage.
dev_combinator (Optional[str]) – A formula to combine together development scores into a single score to choose whether to perform learning rate decay, etc. e.g. ‘x[0]-x[1]’ would say that the first dev task score minus the second dev task score is our measure of how well we’re doing. If not specified, only the score from the first dev task will be used.
restart_trainer (bool) – Restart trainer (useful for Adam) and revert weights to best dev checkpoint when applying LR decay (https://arxiv.org/pdf/1706.09733.pdf)
reload_command (Optional[str]) – Command to change the input data after each epoch. --epoch EPOCH_NUM will be appended to the command. To just reload the data after each epoch set the command to ‘true’.
name (str) – will be prepended to log outputs if given
sample_train_sents (Optional[Integral]) –
max_num_train_sents (Optional[Integral]) –
max_src_len (Optional[Integral]) –
max_trg_len (Optional[Integral]) –
loss_comb_method (str) – method for combining loss across batch elements (sum or avg).
update_every (Integral) – simulate large-batch training by accumulating gradients over several steps before updating parameters
commandline_args (dict) –
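
The effect of update_every can be pictured with a small framework-free sketch. This is not XNMT's implementation: the function name accumulate_and_update is hypothetical, gradients are plain floats supplied up front, and the update rule is assumed to be plain SGD.

```python
from typing import List

def accumulate_and_update(grads: List[float], weight: float, lr: float,
                          update_every: int) -> float:
    """Sum gradients over `update_every` steps, then apply one SGD update."""
    accumulated = 0.0
    for step, grad in enumerate(grads, start=1):
        accumulated += grad                 # gradients are summed across steps
        if step % update_every == 0:
            weight -= lr * accumulated      # one update covers several steps
            accumulated = 0.0
    # Gradients left over at the end (fewer than update_every) are not applied here.
    return weight

w_accum = accumulate_and_update([1.0, 2.0, 3.0, 4.0], weight=0.0, lr=0.5, update_every=2)
w_step = accumulate_and_update([1.0, 2.0, 3.0, 4.0], weight=0.0, lr=0.5, update_every=1)
```

With fixed gradients both calls end at the same weight; in real training the gradients depend on the current weights, so accumulating behaves like a larger batch rather than like several small updates.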

run_training(save_fct)[source]¶

Main training loop (overwrites TrainingRegimen.run_training())

Return type:	`None`

update(trainer)[source]¶

Update DyNet weights using the given optimizer.

Parameters:	trainer (`XnmtOptimizer`) – DyNet trainer
Return type:	`None`

class xnmt.train.regimens.AutobatchTrainingRegimen(model=Ref(path=model), src_file=None, trg_file=None, dev_every=0, dev_zero=False, batcher=bare(SrcBatcher{'batch_size': 32}), loss_calculator=bare(MLELoss), trainer=bare(SimpleSGDTrainer{'e0': 0.1}), run_for_epochs=None, lr_decay=1.0, lr_decay_times=3, patience=1, initial_patience=None, dev_tasks=None, dev_combinator=None, restart_trainer=False, reload_command=None, name='{EXP}', sample_train_sents=None, max_num_train_sents=None, max_src_len=None, max_trg_len=None, loss_comb_method=Ref(path=exp_global.loss_comb_method, default=sum), update_every=1, commandline_args=Ref(path=exp_global.commandline_args, default={}))[source]¶

Bases: xnmt.train.regimens.SimpleTrainingRegimen

This regimen overrides SimpleTrainingRegimen by accumulating (summing) losses into a FactoredLossExpr before running forward/backward in the computation graph. It is designed to work with DyNet autobatching and for cases where parts of the architecture make batching difficult (such as structured encoders like TreeLSTMs or Graph Networks). The actual batch size is set through the “update_every” parameter, while the underlying Batcher is expected to have “batch_size” equal to 1.

Parameters:

model (ConditionedModel) – the model
src_file (Union[None, str, Sequence[str]]) – the source training file
trg_file (Optional[str]) – the target training file
dev_every (Integral) – dev checkpoints every n sentences (0 for only after epoch)
dev_zero (bool) – if True, add a checkpoint before training loop is entered (useful with pretrained models).
batcher (Batcher) – Type of batcher
loss_calculator (LossCalculator) – The method for calculating the loss.
trainer (XnmtOptimizer) – Trainer object, default is SGD with learning rate 0.1
run_for_epochs (Optional[Integral]) –
lr_decay (Real) –
lr_decay_times (Integral) – Early stopping after decaying learning rate a certain number of times
patience (Integral) – apply LR decay after dev scores haven’t improved over this many checkpoints
initial_patience (Optional[Integral]) – if given, allows adjusting patience for the first LR decay
dev_tasks (Optional[Sequence[EvalTask]]) – A list of tasks to use during the development stage.
dev_combinator (Optional[str]) – A formula to combine together development scores into a single score to choose whether to perform learning rate decay, etc. e.g. ‘x[0]-x[1]’ would say that the first dev task score minus the second dev task score is our measure of how well we’re doing. If not specified, only the score from the first dev task will be used.
restart_trainer (bool) – Restart trainer (useful for Adam) and revert weights to best dev checkpoint when applying LR decay (https://arxiv.org/pdf/1706.09733.pdf)
reload_command (Optional[str]) – Command to change the input data after each epoch. --epoch EPOCH_NUM will be appended to the command. To just reload the data after each epoch set the command to ‘true’.
name (str) – will be prepended to log outputs if given
sample_train_sents (Optional[Integral]) –
max_num_train_sents (Optional[Integral]) –
max_src_len (Optional[Integral]) –
max_trg_len (Optional[Integral]) –
loss_comb_method (str) – method for combining loss across batch elements (sum or avg).
update_every (Integral) – how many instances to accumulate before updating parameters. This effectively sets the batch size under DyNet autobatching.
commandline_args (dict) –

run_training(save_fct)[source]¶

Main training loop (overwrites TrainingRegimen.run_training())

Return type:	`None`

class xnmt.train.regimens.MultiTaskTrainingRegimen(tasks, trainer=bare(SimpleSGDTrainer{'e0': 0.1}), dev_zero=False, update_every=1, commandline_args=Ref(path=exp_global.commandline_args, default=None))[source]¶

Bases: xnmt.train.regimens.TrainingRegimen

Base class for multi-task training classes. Mainly initializes tasks, performs sanity-checks, and manages set_train events.

Parameters:

tasks (Sequence[TrainingTask]) – list of training tasks. The first item takes on the role of the main task, meaning it will control early stopping, learning rate schedule, and model checkpoints.
trainer (XnmtOptimizer) – Trainer object, default is SGD with learning rate 0.1
dev_zero (bool) – if True, add a checkpoint before training loop is entered (useful with pretrained models).
update_every (Integral) – simulate large-batch training by accumulating gradients over several steps before updating parameters
commandline_args (dict) –

trigger_train_event(value)[source]¶

Trigger set_train event, but only if that would lead to a change of the value of set_train.

Parameters:	value (`bool`) – True or False
Return type:	`None`

update(trainer)[source]¶

Update DyNet weights using the given optimizer.

Parameters:	trainer (`XnmtOptimizer`) – DyNet trainer
Return type:	`None`

class xnmt.train.regimens.SameBatchMultiTaskTrainingRegimen(tasks, trainer=bare(SimpleSGDTrainer{'e0': 0.1}), dev_zero=False, per_task_backward=True, loss_comb_method=Ref(path=exp_global.loss_comb_method, default=sum), update_every=1, n_task_steps=None, commandline_args=Ref(path=exp_global.commandline_args, default=None))[source]¶

Bases: xnmt.train.regimens.MultiTaskTrainingRegimen, xnmt.persistence.Serializable

Multi-task training where gradients are accumulated and weight updates are thus performed jointly for each task. The relative weight between tasks can be configured by setting the number of steps to accumulate over for each task. Note that the batch size for each task also has an influence on task weighting. The stopping criterion of the first task is used (other tasks’ stopping criteria are ignored).

Parameters:

tasks (Sequence[TrainingTask]) – Training tasks
trainer (XnmtOptimizer) – The trainer is shared across tasks
dev_zero (bool) – If True, add a checkpoint before training loop is entered (useful with pretrained models).
per_task_backward (bool) – If True, call backward() for each task separately and renew computation graph between tasks. Yields the same results, but True uses less memory while False may be faster when using autobatching.
loss_comb_method (str) – Method for combining loss across batch elements (‘sum’ or ‘avg’).
update_every (Integral) – Simulate large-batch training by accumulating gradients over several steps before updating parameters. This is implemented as an outer loop, i.e. we first accumulate gradients from steps for each task, and then loop according to this parameter so that we collect multiple steps for each task and always according to the same ratio.
n_task_steps (Optional[Sequence[Integral]]) – The number of steps to accumulate for each task, useful for weighting tasks.
commandline_args (dict) –
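
The interaction between n_task_steps and update_every described above can be sketched as a schedule of task indices that are accumulated before a single parameter update. The helper steps_per_update is hypothetical and only models the loop structure, under the assumption that a missing n_task_steps means one step per task.

```python
from typing import List, Optional, Sequence

def steps_per_update(num_tasks: int, update_every: int = 1,
                     n_task_steps: Optional[Sequence[int]] = None) -> List[int]:
    """Return the task index of every step accumulated before one update."""
    per_task = list(n_task_steps) if n_task_steps is not None else [1] * num_tasks
    order: List[int] = []
    for _ in range(update_every):                # outer loop: update_every
        for task_id, n_steps in enumerate(per_task):
            order.extend([task_id] * n_steps)    # inner loop: steps per task
    return order

schedule = steps_per_update(num_tasks=2, update_every=2, n_task_steps=[2, 1])
```

Because the outer loop repeats the whole per-task pattern, the ratio between tasks stays the same regardless of update_every.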

run_training(save_fct)[source]¶

Run training steps in a loop until stopping criterion is reached.

Parameters:	save_fct (`Callable`) – function to be invoked to save a model at dev checkpoints
Return type:	`None`

class xnmt.train.regimens.AlternatingBatchMultiTaskTrainingRegimen(tasks, task_weights=None, trainer=bare(SimpleSGDTrainer{'e0': 0.1}), dev_zero=False, loss_comb_method=Ref(path=exp_global.loss_comb_method, default=sum), update_every_within=1, update_every_across=1, commandline_args=Ref(path=exp_global.commandline_args, default=None))[source]¶

Bases: xnmt.train.regimens.MultiTaskTrainingRegimen, xnmt.persistence.Serializable

Multi-task training where training steps are performed one after another.

The relative weights between tasks are specified explicitly, and for each step one task is drawn at random accordingly. The stopping criterion of the first task is used (other tasks’ stopping criteria are ignored).

Parameters:

tasks (Sequence[TrainingTask]) – training tasks
trainer (XnmtOptimizer) – the trainer is shared across tasks
dev_zero (bool) – if True, add a checkpoint before training loop is entered (useful with pretrained models).
loss_comb_method (str) – method for combining loss across batch elements (‘sum’ or ‘avg’).
update_every_within (Integral) – Simulate large-batch training by accumulating gradients over several steps before updating parameters. The behavior here is to draw multiple times from the same task until update is invoked.
update_every_across (Integral) – Simulate large-batch training by accumulating gradients over several steps before updating parameters. The behavior here is to draw tasks randomly several times before doing parameter updates.
commandline_args –
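
Drawing one task per step according to task_weights can be illustrated with the standard library. This is a sketch of the sampling idea only; draw_tasks and the fixed seed are illustrative assumptions, not part of XNMT.

```python
import random
from collections import Counter
from typing import List, Sequence

def draw_tasks(task_weights: Sequence[float], num_steps: int, seed: int = 0) -> List[int]:
    """Draw a task index for each training step, proportionally to its weight."""
    rng = random.Random(seed)                      # seeded for reproducibility
    task_ids = list(range(len(task_weights)))
    return rng.choices(task_ids, weights=task_weights, k=num_steps)

draws = draw_tasks([0.75, 0.25], num_steps=1000)
counts = Counter(draws)                            # how often each task was drawn
```

Over many steps, the share of steps spent on each task approaches its relative weight.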

run_training(save_fct)[source]¶

Run training steps in a loop until stopping criterion is reached.

Parameters:	save_fct (`Callable`) – function to be invoked to save a model at dev checkpoints
Return type:	`None`

class xnmt.train.regimens.SerialMultiTaskTrainingRegimen(tasks, trainer=bare(SimpleSGDTrainer{'e0': 0.1}), dev_zero=False, loss_comb_method=Ref(path=exp_global.loss_comb_method, default=sum), update_every=1, commandline_args=Ref(path=exp_global.commandline_args, default=None))[source]¶

Bases: xnmt.train.regimens.MultiTaskTrainingRegimen, xnmt.persistence.Serializable

Trains only the first task until its stopping criterion is met, then does the same for the second task, etc.

Useful to realize a pretraining-finetuning strategy.

Parameters:

tasks (Sequence[TrainingTask]) – training tasks. The currently active task is treated as main task.
trainer (XnmtOptimizer) – the trainer is shared across tasks
dev_zero (bool) – if True, add a checkpoint before training loop is entered (useful with pretrained models).
loss_comb_method (str) – method for combining loss across batch elements (‘sum’ or ‘avg’).
update_every (Integral) – simulate large-batch training by accumulating gradients over several steps before updating parameters
commandline_args (dict) –
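
A serial schedule amounts to an outer loop over tasks and an inner loop that runs until the active task signals completion. The sketch below uses a toy stand-in class (ToyTask, hypothetical) that only mimics the should_stop_training() and training_step() methods of a training task.

```python
from typing import List

class ToyTask:
    """Hypothetical stand-in for a training task that stops after max_steps."""
    def __init__(self, name: str, max_steps: int) -> None:
        self.name = name
        self.max_steps = max_steps
        self.steps_done = 0

    def should_stop_training(self) -> bool:
        return self.steps_done >= self.max_steps

    def training_step(self) -> str:
        self.steps_done += 1
        return self.name

def run_serially(tasks: List[ToyTask]) -> List[str]:
    """Train each task to completion before moving on to the next one."""
    log: List[str] = []
    for task in tasks:                      # the active task acts as the main task
        while not task.should_stop_training():
            log.append(task.training_step())
    return log

serial_log = run_serially([ToyTask("pretrain", 2), ToyTask("finetune", 3)])
```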

run_training(save_fct)[source]¶

Run training steps in a loop until stopping criterion is reached.

Parameters:	save_fct (`Callable`) – function to be invoked to save a model at dev checkpoints
Return type:	`None`

TrainingTask¶

class xnmt.train.tasks.TrainingTask(model)[source]¶

Bases: object

Base class for a training task. Training tasks can perform training steps and keep track of the training state, but may not implement the actual training loop.

Parameters:	model (`TrainableModel`) – The model to train

should_stop_training()[source]¶

Returns:	True iff training is finished, i.e. training_step(…) should not be called again

training_step(**kwargs)[source]¶

Perform forward pass for the next training step and handle training logic (switching epoch, reshuffling, etc.)

Parameters:	**kwargs – depends on subclass implementations
Return type:	`FactoredLossExpr`
Returns:	Loss

next_minibatch()[source]¶

Infinitely loop over training minibatches.

Return type:	`Iterator`[+T_co]
Returns:	Generator yielding (src_batch,trg_batch) tuples
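
An endlessly looping minibatch generator of this kind could look like the sketch below. It is a simplified illustration with string placeholders instead of real batches; reshuffling and epoch bookkeeping are omitted, and the caller is responsible for deciding when to stop.

```python
import itertools
from typing import Iterator, Sequence, Tuple

def next_minibatch(src_batches: Sequence[str],
                   trg_batches: Sequence[str]) -> Iterator[Tuple[str, str]]:
    """Yield (src_batch, trg_batch) pairs forever, restarting after each sweep."""
    if not src_batches or not trg_batches:
        return                                  # nothing to yield; avoid spinning forever
    while True:
        for pair in zip(src_batches, trg_batches):
            yield pair

# Take only the first five pairs from the infinite generator.
first_five = list(itertools.islice(next_minibatch(["s1", "s2"], ["t1", "t2"]), 5))
```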

checkpoint(control_learning_schedule=False)[source]¶

Perform a dev checkpoint.

Parameters:	control_learning_schedule (`bool`) – If `False`, only evaluate dev data. If `True`, also perform model saving, LR decay etc. if needed.
Return type:	`bool`
Returns:	`True` iff the model needs saving

cur_num_minibatches()[source]¶

Current number of minibatches (may change between epochs, e.g. for randomizing batchers or if reload_command is given)

Return type:	`int`

cur_num_sentences()[source]¶

Current number of parallel sentences (may change between epochs, e.g. if reload_command is given)

Return type:	`int`

class xnmt.train.tasks.SimpleTrainingTask(model, src_file=None, trg_file=None, dev_every=0, batcher=bare(SrcBatcher{'batch_size': 32}), loss_calculator=bare(MLELoss), run_for_epochs=None, lr_decay=1.0, lr_decay_times=3, patience=1, initial_patience=None, dev_tasks=None, dev_combinator=None, restart_trainer=False, reload_command=None, name=None, sample_train_sents=None, max_num_train_sents=None, max_src_len=None, max_trg_len=None)[source]¶

Bases: xnmt.train.tasks.TrainingTask, xnmt.persistence.Serializable

Parameters:

model (ConditionedModel) – a trainable supervised model
src_file (Union[str, Sequence[str], None]) – The file for the source data.
trg_file (Optional[str]) – The file for the target data.
dev_every (Integral) – dev checkpoints every n sentences (0 for only after epoch)
batcher (Batcher) – Type of batcher
loss_calculator (LossCalculator) –
run_for_epochs (Optional[Integral]) – number of epochs (None for unlimited epochs)
lr_decay (Real) – decay learning rate by multiplying by this factor
lr_decay_times (Integral) – Early stopping after decaying learning rate a certain number of times
patience (Integral) – apply LR decay after dev scores haven’t improved over this many checkpoints
initial_patience (Optional[Integral]) – if given, allows adjusting patience for the first LR decay
dev_tasks (Optional[Sequence[EvalTask]]) – A list of tasks to run on the development set
dev_combinator – A formula to combine together development scores into a single score to choose whether to perform learning rate decay, etc. e.g. ‘x[0]-x[1]’ would say that the first dev task score minus the second dev task score is our measure of how well we’re doing. If not specified, only the score from the first dev task will be used.
restart_trainer (bool) – Restart trainer (useful for Adam) and revert weights to best dev checkpoint when applying LR decay (https://arxiv.org/pdf/1706.09733.pdf)
reload_command (Optional[str]) – Command to change the input data after each epoch. --epoch EPOCH_NUM will be appended to the command. To just reload the data after each epoch set the command to ‘true’.
sample_train_sents (Optional[Integral]) – If given, load a random subset of training sentences before each epoch. Useful when training data does not fit in memory.
max_num_train_sents (Optional[Integral]) – Train only on the first n sentences
max_src_len (Optional[Integral]) – Discard training sentences with source-side longer than this
max_trg_len (Optional[Integral]) – Discard training sentences with target-side longer than this
name (Optional[str]) – will be prepended to log outputs if given
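
As an illustration of the dev_combinator formula, the sketch below evaluates a string such as ‘x[0]-x[1]’ against a list of dev scores. It assumes the formula is a Python expression over a list named x; how XNMT itself evaluates the formula is not shown here, and combine_dev_scores is a hypothetical helper.

```python
from typing import Optional, Sequence

def combine_dev_scores(scores: Sequence[float],
                       dev_combinator: Optional[str] = None) -> float:
    """Reduce several dev scores to one number used for scheduling decisions."""
    if dev_combinator is None:
        return scores[0]                    # default: first dev task only
    # Assumption: the formula is a Python expression over the list `x`.
    # Builtins are disabled so that only indexing and arithmetic are available.
    return eval(dev_combinator, {"__builtins__": {}}, {"x": list(scores)})

combined = combine_dev_scores([30.0, 12.5], "x[0]-x[1]")
first_only = combine_dev_scores([30.0, 12.5])
```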

should_stop_training()[source]¶

Signal stopping if self.early_stopping_reached is marked or we exhausted the number of requested epochs.

Return type:	`bool`
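
How patience, initial_patience, lr_decay, and lr_decay_times might interact to produce the early-stopping signal is sketched below. The exact bookkeeping is an assumption (higher dev scores are taken to be better, and training stops when a decay would be due after lr_decay_times decays have already happened); simulate_schedule is a hypothetical helper, not XNMT code.

```python
from typing import List, Optional, Tuple

def simulate_schedule(dev_scores: List[float], lr: float, lr_decay: float,
                      lr_decay_times: int, patience: int,
                      initial_patience: Optional[int] = None) -> Tuple[float, bool]:
    """Return (final learning rate, early_stopping_reached)."""
    best = float("-inf")
    bad_checkpoints = 0          # checkpoints since the last improvement
    decays_done = 0
    for score in dev_scores:
        if score > best:
            best, bad_checkpoints = score, 0
            continue
        bad_checkpoints += 1
        # initial_patience, if given, applies only before the first decay
        use_initial = initial_patience is not None and decays_done == 0
        limit = initial_patience if use_initial else patience
        if bad_checkpoints >= limit:
            if decays_done >= lr_decay_times:
                return lr, True  # decayed often enough already: stop training
            lr *= lr_decay
            decays_done += 1
            bad_checkpoints = 0
    return lr, False

final_lr, stopped = simulate_schedule([10, 9, 8, 7, 6], lr=1.0, lr_decay=0.5,
                                      lr_decay_times=3, patience=1)
```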

cur_num_minibatches()[source]¶

Current number of minibatches (may change between epochs, e.g. for randomizing batchers or if reload_command is given)

Return type:	`Integral`

cur_num_sentences()[source]¶

Current number of parallel sentences (may change between epochs, e.g. if reload_command is given)

Return type:	`Integral`

next_minibatch()[source]¶

Infinitely loops over training minibatches and advances internal epoch state after every complete sweep over the corpus.

Return type:	`Iterator`[+T_co]
Returns:	Generator yielding (src_batch,trg_batch) tuples

training_step(src, trg)[source]¶

Perform forward pass for the next training step and handle training logic (switching epoch, reshuffling, etc.)

Parameters:

src (Batch) – src minibatch
trg (Batch) – trg minibatch
Returns:	Loss

checkpoint(control_learning_schedule=True)[source]¶

Performs a dev checkpoint

Parameters:	control_learning_schedule (`bool`) – If False, only evaluate dev data. If True, also perform model saving, LR decay etc. if needed.
Returns:	True if the model needs saving, False otherwise

class xnmt.train.tasks.TrainingState[source]¶

Bases: object

This holds the state of the training loop.

Parameters¶

ParamManager¶

class xnmt.param_collections.ParamManager[source]¶

Bases: object

A static class that manages the currently loaded DyNet parameters of all components.

Its responsibilities are registering all components that use DyNet parameters and loading pretrained parameters. Components can register parameters by calling ParamManager.my_params(self) from within their __init__() method. This allocates a subcollection with a unique identifier for this component. When loading previously saved parameters, one or several paths are specified to look for the corresponding saved DyNet collection named after this identifier.

static init_param_col()[source]¶

Initializes or resets the parameter collection.

This must be invoked each time before a new model is loaded (e.g. on startup and between consecutive experiments).

Return type:	`None`

static add_load_path(data_file)[source]¶

Add new data directory path to load from.

When calling populate(), pretrained parameters from all directories added in this way are searched for the requested component identifiers.

Parameters:	data_file (`str`) – a data directory (usually named `*.data`) containing DyNet parameter collections.
Return type:	`None`

static populate()[source]¶

Populate the parameter collections.

Searches the given data paths and loads parameter collections if they exist; otherwise leaves parameters in their randomly initialized state.

Return type:	`None`

static my_params(subcol_owner)[source]¶

Creates a dedicated parameter subcollection for a serializable object.

This should only be called from the __init__ method of a Serializable.

Parameters:	subcol_owner (Serializable) – The object which is requesting to be assigned a subcollection.
Return type:	`ParameterCollection`
Returns:	The assigned subcollection.

static global_collection()[source]¶

Access the top-level parameter collection, including all parameters.

Return type:	`ParameterCollection`
Returns:	top-level DyNet parameter collection

exception xnmt.param_collections.RevertingUnsavedModelException[source]¶

Bases: Exception

Optimizer¶

class xnmt.optimizers.XnmtOptimizer(optimizer, skip_noisy=False)[source]¶

Bases: object

A base class for trainers. Trainers are mostly simple wrappers of DyNet trainers but can add extra functionality.

Parameters:

optimizer (Trainer) – the underlying DyNet optimizer (trainer)
skip_noisy (bool) – keep track of a moving average and a moving standard deviation of the log of the gradient norm values, and abort a step if the norm of the gradient exceeds four standard deviations of the moving average. Reference: https://arxiv.org/pdf/1804.09849.pdf
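
The skip_noisy rule can be sketched as a small filter over gradient norms. Several details below are assumptions made for illustration: the statistics are computed over a fixed-size window, a short warm-up is required before any step can be skipped, and skipped steps do not update the statistics. NoisyStepFilter is a hypothetical name.

```python
import math
import statistics
from collections import deque

class NoisyStepFilter:
    """Flag steps whose log gradient norm is an outlier relative to recent steps."""
    def __init__(self, window: int = 50, num_std: float = 4.0, min_history: int = 5) -> None:
        self.history = deque(maxlen=window)    # recent log gradient norms
        self.num_std = num_std
        self.min_history = min_history

    def should_skip(self, grad_norm: float) -> bool:
        log_norm = math.log(grad_norm)
        skip = False
        if len(self.history) >= self.min_history:
            mean = statistics.mean(self.history)
            std = statistics.pstdev(self.history)
            skip = log_norm > mean + self.num_std * std
        if not skip:
            self.history.append(log_norm)      # only accepted steps update the statistics
        return skip

step_filter = NoisyStepFilter()
decisions = [step_filter.should_skip(n) for n in [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 50.0, 1.02]]
```

In this toy run only the step with gradient norm 50.0 is flagged; the step after it is accepted again.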

update()[source]¶

Update the parameters.

Return type:	`None`

status()[source]¶

Outputs information about the trainer to stderr.

(number of updates since last call, number of clipped gradients, learning rate, etc…)

Return type:	`None`

set_clip_threshold(thr)[source]¶

Set clipping threshold

To deactivate clipping, set the threshold to be <=0

Parameters:	thr (`Real`) – Clipping threshold
Return type:	`None`
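
The role of the clipping threshold can be illustrated with a norm-based clipping sketch. It assumes that clipping rescales the whole gradient when its L2 norm exceeds the threshold, and that a threshold <= 0 disables clipping as stated above; clip_gradient is a hypothetical helper operating on plain lists.

```python
import math
from typing import List

def clip_gradient(grad: List[float], thr: float) -> List[float]:
    """Rescale `grad` so that its L2 norm does not exceed `thr`."""
    if thr <= 0:
        return list(grad)                   # clipping deactivated
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= thr:
        return list(grad)                   # already within the threshold
    scale = thr / norm
    return [g * scale for g in grad]

clipped = clip_gradient([3.0, 4.0], thr=1.0)     # norm 5.0 is scaled down to 1.0
unclipped = clip_gradient([3.0, 4.0], thr=0.0)   # threshold <= 0: unchanged
```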

get_clip_threshold()[source]¶

Get clipping threshold

Return type:	`Real`
Returns:	Gradient clipping threshold

restart()[source]¶

Restarts the optimizer

Clears all momentum values and assimilate (if applicable)

Return type:	`None`

class xnmt.optimizers.SimpleSGDTrainer(e0=0.1, skip_noisy=False)[source]¶

Bases: xnmt.optimizers.XnmtOptimizer, xnmt.persistence.Serializable

Stochastic gradient descent trainer

This trainer performs stochastic gradient descent, the go-to optimization procedure for neural networks.

Parameters:

e0 (Real) – Initial learning rate
skip_noisy (bool) – keep track of a moving average and a moving standard deviation of the log of the gradient norm values, and abort a step if the norm of the gradient exceeds four standard deviations of the moving average. Reference: https://arxiv.org/pdf/1804.09849.pdf

class xnmt.optimizers.MomentumSGDTrainer(e0=0.01, mom=0.9, skip_noisy=False)[source]¶

Bases: xnmt.optimizers.XnmtOptimizer, xnmt.persistence.Serializable

Stochastic gradient descent with momentum

This is a modified version of the SGD algorithm with momentum to stabilize the gradient trajectory.

Parameters:

e0 (Real) – Initial learning rate
mom (Real) – Momentum
skip_noisy (bool) – keep track of a moving average and a moving standard deviation of the log of the gradient norm values, and abort a step if the norm of the gradient exceeds four standard deviations of the moving average. Reference: https://arxiv.org/pdf/1804.09849.pdf
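
For intuition, the textbook momentum update on a single scalar weight is sketched below. This is the generic formulation (velocity = mom * velocity - e0 * gradient), given as an assumption; it is not taken from the DyNet source, and momentum_sgd is a hypothetical helper.

```python
from typing import List

def momentum_sgd(grads: List[float], w: float = 0.0,
                 e0: float = 0.01, mom: float = 0.9) -> float:
    """Apply SGD with momentum to one weight for a list of gradients."""
    velocity = 0.0
    for g in grads:
        velocity = mom * velocity - e0 * g   # decayed past velocity plus new step
        w += velocity
    return w

w_mom = momentum_sgd([1.0, 1.0], e0=0.1, mom=0.5)
```

The second step moves further than the first (0.15 versus 0.1) because part of the previous step is carried over.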

class xnmt.optimizers.AdagradTrainer(e0=0.1, eps=1e-20, skip_noisy=False)[source]¶

Bases: xnmt.optimizers.XnmtOptimizer, xnmt.persistence.Serializable

Adagrad optimizer

The adagrad algorithm assigns a different learning rate to each parameter.

Parameters:

e0 (Real) – Initial learning rate
eps (Real) – Epsilon parameter to prevent numerical instability
skip_noisy (bool) – keep track of a moving average and a moving standard deviation of the log of the gradient norm values, and abort a step if the norm of the gradient exceeds four standard deviations of the moving average. Reference: https://arxiv.org/pdf/1804.09849.pdf
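
The per-parameter learning rate of Adagrad comes from dividing by the root of the accumulated squared gradients. The scalar sketch below follows the textbook update; placing eps inside the square root is an assumption, and adagrad is a hypothetical helper rather than the DyNet implementation.

```python
import math
from typing import List

def adagrad(grads: List[float], w: float = 0.0,
            e0: float = 0.1, eps: float = 1e-20) -> float:
    """Apply the Adagrad update to one weight for a list of gradients."""
    sum_sq = 0.0
    for g in grads:
        sum_sq += g * g                              # accumulated squared gradients
        w -= e0 * g / math.sqrt(sum_sq + eps)        # effective rate shrinks over time
    return w

w_adagrad = adagrad([2.0, 2.0])
```

With two equal gradients, the second step is smaller than the first by a factor of sqrt(2).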

class xnmt.optimizers.AdadeltaTrainer(eps=1e-06, rho=0.95, skip_noisy=False)[source]¶

Bases: xnmt.optimizers.XnmtOptimizer, xnmt.persistence.Serializable

AdaDelta optimizer

The AdaDelta optimizer is a variant of Adagrad aiming to prevent vanishing learning rates.

Parameters:

eps (Real) – Epsilon parameter to prevent numerical instability
rho (Real) – Update parameter for the moving average of updates in the numerator
skip_noisy (bool) – keep track of a moving average and a moving standard deviation of the log of the gradient norm values, and abort a step if the norm of the gradient exceeds four standard deviations of the moving average. Reference: https://arxiv.org/pdf/1804.09849.pdf

class xnmt.optimizers.AdamTrainer(alpha=0.001, beta_1=0.9, beta_2=0.999, eps=1e-08, skip_noisy=False)[source]¶

Bases: xnmt.optimizers.XnmtOptimizer, xnmt.persistence.Serializable

Adam optimizer

The Adam optimizer is similar to RMSProp but uses unbiased estimates of the first and second moments of the gradient.

Parameters:

alpha (Real) – Initial learning rate
beta_1 (Real) – Moving average parameter for the mean
beta_2 (Real) – Moving average parameter for the variance
eps (Real) – Epsilon parameter to prevent numerical instability
skip_noisy (bool) – keep track of a moving average and a moving standard deviation of the log of the gradient norm values, and abort a step if the norm of the gradient exceeds four standard deviations of the moving average. Reference: https://arxiv.org/pdf/1804.09849.pdf
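
The roles of alpha, beta_1, beta_2, and eps can be seen in the standard Adam update, sketched here for a single scalar weight. This follows the commonly published formulation with bias correction and is an illustration only, not the DyNet code; adam is a hypothetical helper.

```python
import math
from typing import List

def adam(grads: List[float], w: float = 0.0, alpha: float = 0.001,
         beta_1: float = 0.9, beta_2: float = 0.999, eps: float = 1e-8) -> float:
    """Apply the Adam update to one weight for a list of gradients."""
    m = 0.0   # moving average of the gradient (mean)
    v = 0.0   # moving average of the squared gradient (uncentered variance)
    for t, g in enumerate(grads, start=1):
        m = beta_1 * m + (1 - beta_1) * g
        v = beta_2 * v + (1 - beta_2) * g * g
        m_hat = m / (1 - beta_1 ** t)        # bias-corrected estimates
        v_hat = v / (1 - beta_2 ** t)
        w -= alpha * m_hat / (math.sqrt(v_hat) + eps)
    return w

w_adam = adam([5.0])
```

After bias correction, the very first step has magnitude close to alpha regardless of the gradient's scale.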

class xnmt.optimizers.NoamTrainer(alpha=1.0, dim=512, warmup_steps=4000, beta_1=0.9, beta_2=0.98, eps=1e-09, skip_noisy=False)[source]¶

Bases: xnmt.optimizers.XnmtOptimizer, xnmt.persistence.Serializable

Proposed in the paper “Attention is all you need” (https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf) [Page 7, Eq. 3]. The learning rate of the Adam optimizer is increased linearly over the first warmup steps and then decayed gradually.

Parameters:

alpha (Real) –
dim (Integral) –
warmup_steps (Optional[Integral]) –
beta_1 (Real) –
beta_2 (Real) –
eps (Real) –
skip_noisy (bool) – keep track of a moving average and a moving standard deviation of the log of the gradient norm values, and abort a step if the norm of the gradient exceeds four standard deviations of the moving average. Reference: https://arxiv.org/pdf/1804.09849.pdf
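
The schedule from Eq. 3 of the cited paper can be written down directly. In the sketch below, treating alpha as a plain multiplier on that schedule and dim as the model dimension are assumptions about how the parameters map onto the formula; noam_learning_rate is a hypothetical helper and expects step >= 1.

```python
def noam_learning_rate(step: int, alpha: float = 1.0, dim: int = 512,
                       warmup_steps: int = 4000) -> float:
    """Learning rate at a given update step (step >= 1)."""
    # Linear warm-up for the first warmup_steps updates, then decay
    # proportional to the inverse square root of the step number.
    return alpha * dim ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

lr_early = noam_learning_rate(2000)
lr_peak = noam_learning_rate(4000)     # the two branches meet at warmup_steps
lr_late = noam_learning_rate(8000)
```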

update()[source]¶

Update the parameters.

Return type:	`None`

class xnmt.optimizers.DummyTrainer[source]¶

Bases: xnmt.optimizers.XnmtOptimizer, xnmt.persistence.Serializable

A dummy trainer that does not perform any parameter updates.

update()[source]¶

Update the parameters.

Return type:	`None`

status()[source]¶

Outputs information about the trainer to stderr.

(number of updates since last call, number of clipped gradients, learning rate, etc…)

Return type:	`None`

set_clip_threshold(thr)[source]¶

Set clipping threshold

To deactivate clipping, set the threshold to be <=0

Parameters:	thr – Clipping threshold
Return type:	`None`

get_clip_threshold()[source]¶

Get clipping threshold

Return type:	`None`
Returns:	Gradient clipping threshold

restart()[source]¶

Restarts the optimizer

Clears all momentum values and assimilate (if applicable)

Return type:	`None`

ParamInitializer¶

class xnmt.param_initializers.ParamInitializer[source]¶

Bases: object

A parameter initializer that delegates to the DyNet initializers and possibly performs some extra configuration.

initializer(dim, is_lookup=False, num_shared=1)[source]¶

Parameters:

dim – dimension of parameter tensor
is_lookup – True if parameters are a lookup matrix
num_shared – Indicates if one parameter object holds multiple matrices
Returns:	a dynet initializer object

class xnmt.param_initializers.NormalInitializer(mean=0, var=1)[source]¶

Bases: xnmt.param_initializers.ParamInitializer, xnmt.persistence.Serializable

Wraps DyNet’s NormalInitializer: http://dynet.readthedocs.io/en/latest/python_ref.html#dynet.NormalInitializer

Initialize the parameters with a gaussian distribution.

Parameters:

mean (Real) – Mean of the distribution
var (Real) – Variance of the distribution

initializer(dim, is_lookup=False, num_shared=1)[source]¶

Parameters:

dim (Tuple[Integral]) – dimension of parameter tensor
is_lookup (bool) – True if parameters are a lookup matrix
num_shared (Integral) – Indicates if one parameter object holds multiple matrices
Return type:	`NormalInitializer`
Returns:	a dynet initializer object

class xnmt.param_initializers.UniformInitializer(scale)[source]¶

Bases: xnmt.param_initializers.ParamInitializer, xnmt.persistence.Serializable

Wraps DyNet’s UniformInitializer: http://dynet.readthedocs.io/en/latest/python_ref.html#dynet.UniformInitializer

Initialize the parameters with a uniform distribution.

Parameters:	scale (`Real`) – Parameters are sampled from $\mathcal U([-\texttt{scale},\texttt{scale}])$

initializer(dim, is_lookup=False, num_shared=1)[source]¶

Parameters:

dim (Tuple[Integral]) – dimension of parameter tensor
is_lookup (bool) – True if parameters are a lookup matrix
num_shared (Integral) – Indicates if one parameter object holds multiple matrices
Return type:	`UniformInitializer`
Returns:	a dynet initializer object

class xnmt.param_initializers.ConstInitializer(c)[source]¶

Bases: xnmt.param_initializers.ParamInitializer, xnmt.persistence.Serializable

Wraps DyNet’s ConstInitializer: http://dynet.readthedocs.io/en/latest/python_ref.html#dynet.ConstInitializer

Initialize the parameters with a constant value.

Parameters:	c (`Real`) – Value to initialize the parameters

initializer(dim, is_lookup=False, num_shared=1)[source]¶

Parameters:

dim (Tuple[Integral]) – dimension of parameter tensor
is_lookup (bool) – True if parameters are a lookup matrix
num_shared (Integral) – Indicates if one parameter object holds multiple matrices
Return type:	`ConstInitializer`
Returns:	a dynet initializer object

class xnmt.param_initializers.GlorotInitializer(gain=1.0)[source]¶

Bases: xnmt.param_initializers.ParamInitializer, xnmt.persistence.Serializable

Wraps DyNet’s GlorotInitializer: http://dynet.readthedocs.io/en/latest/python_ref.html#dynet.GlorotInitializer

Initializes the weights according to Glorot & Bengio (2010).

If the dimensions of the parameter matrix are $m,n$, the weights are sampled from $\mathcal U([-g\sqrt{\frac{6}{m+n}},g\sqrt{\frac{6}{m+n}}])$

The gain $g$ depends on the activation function:

$\text{tanh}$ : 1.0

$\text{ReLU}$ : 0.5

$\text{sigmoid}$ : 4.0

Any smooth function $f$ : $\frac{1}{f'(0)}$

In addition to the DyNet class, this also supports the case where one parameter object stores several matrices (as is popular for computing LSTM gates, for instance).

Note: This is also known as Xavier initialization

Parameters:	gain (`Real`) – Gain (Depends on the activation function)

initializer(dim, is_lookup=False, num_shared=1)[source]¶

Parameters:

dim (Tuple[Integral]) – dimensions of parameter tensor
is_lookup (bool) – Whether the parameter is a lookup parameter
num_shared (Integral) – If > 1, treat the first dimension as spanning multiple matrices, each of which is initialized individually
Return type:	`UniformInitializer`
Returns:	a dynet initializer object
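
The sampling bound of the Glorot initializer, including the num_shared case, can be computed as in the sketch below. It uses only the standard library rather than DyNet, and it assumes that with num_shared > 1 the first dimension stacks num_shared equally sized matrices; glorot_bound and glorot_sample are hypothetical helpers.

```python
import math
import random
from typing import List, Tuple

def glorot_bound(dim: Tuple[int, int], gain: float = 1.0, num_shared: int = 1) -> float:
    """Bound b such that weights are drawn from U([-b, b])."""
    rows, cols = dim
    # Assumption: each stacked matrix has rows / num_shared rows of its own.
    rows_per_matrix = rows / num_shared
    return gain * math.sqrt(6.0 / (rows_per_matrix + cols))

def glorot_sample(dim: Tuple[int, int], gain: float = 1.0,
                  num_shared: int = 1, seed: int = 0) -> List[List[float]]:
    """Sample a rows x cols weight matrix within the Glorot bound."""
    rng = random.Random(seed)
    bound = glorot_bound(dim, gain, num_shared)
    rows, cols = dim
    return [[rng.uniform(-bound, bound) for _ in range(cols)] for _ in range(rows)]

bound_single = glorot_bound((3, 3))                  # one 3x3 matrix
bound_shared = glorot_bound((12, 3), num_shared=4)   # four stacked 3x3 matrices
weights = glorot_sample((12, 3), num_shared=4)
```

Treating the stacked parameter as four 3x3 matrices gives the same bound as a single 3x3 matrix, whereas ignoring num_shared would shrink it.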

class xnmt.param_initializers.FromFileInitializer(fname)[source]¶

Bases: xnmt.param_initializers.ParamInitializer, xnmt.persistence.Serializable

Wraps DyNet’s FromFileInitializer: http://dynet.readthedocs.io/en/latest/python_ref.html#dynet.FromFileInitializer

Initialize parameter from file.

Parameters:	fname (`str`) – File name

initializer(dim, is_lookup=False, num_shared=1)[source]¶

Parameters:

dim (Tuple[Integral]) – dimension of parameter tensor
is_lookup (bool) – True if parameters are a lookup matrix
num_shared (Integral) – Indicates if one parameter object holds multiple matrices
Return type:	`FromFileInitializer`
Returns:	a dynet initializer object

class xnmt.param_initializers.NumpyInitializer(array)[source]¶

Bases: xnmt.param_initializers.ParamInitializer, xnmt.persistence.Serializable

Wraps DyNet’s NumpyInitializer: http://dynet.readthedocs.io/en/latest/python_ref.html#dynet.NumpyInitializer

Initialize from numpy array

Alternatively, use ParameterCollection.parameters_from_numpy()

Parameters:	array (`ndarray`) – Numpy array

initializer(dim, is_lookup=False, num_shared=1)[source]¶

Parameters:

dim (Tuple[Integral]) – dimension of parameter tensor
is_lookup (bool) – True if parameters are a lookup matrix
num_shared (Integral) – Indicates if one parameter object holds multiple matrices
Return type:	`NumpyInitializer`
Returns:	a dynet initializer object

class xnmt.param_initializers.ZeroInitializer[source]¶

Bases: xnmt.param_initializers.ParamInitializer, xnmt.persistence.Serializable

Initializes parameter matrix to zero (most appropriate for bias parameters).

initializer(dim, is_lookup=False, num_shared=1)[source]¶

Parameters:

dim (Tuple[Integral]) – dimension of parameter tensor
is_lookup (bool) – True if parameters are a lookup matrix
num_shared (Integral) – Indicates if one parameter object holds multiple matrices
Return type:	`ConstInitializer`
Returns:	a dynet initializer object

class xnmt.param_initializers.LeCunUniformInitializer(scale=1.0)[source]¶

Bases: xnmt.param_initializers.ParamInitializer, xnmt.persistence.Serializable

Reference: LeCun 98, Efficient Backprop http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf

Parameters:	scale (`Real`) – scale

initializer(dim, is_lookup=False, num_shared=1)[source]¶

Parameters:

dim (Tuple[Integral]) – dimension of parameter tensor
is_lookup (bool) – True if parameters are a lookup matrix
num_shared (Integral) – Indicates if one parameter object holds multiple matrices

Return type:	`UniformInitializer`
Returns:	a dynet initializer object
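
As a rough illustration of what a LeCun-style uniform initializer computes, the sketch below samples a weight matrix in plain Python. It assumes the textbook formulation, a uniform distribution on [-b, b] with b = scale * sqrt(3 / fan_in), and takes fan_in to be the number of columns. The helper name lecun_uniform is hypothetical, and XNMT's own handling of is_lookup and num_shared is not reproduced here.

```python
import math
import random

def lecun_uniform(dim, scale=1.0, seed=0):
    """Sample a (rows, cols) matrix from U(-b, b) with b = scale * sqrt(3 / fan_in).

    Assumption: fan_in is the number of columns (inputs feeding each output unit).
    """
    rows, cols = dim
    bound = scale * math.sqrt(3.0 / cols)
    rng = random.Random(seed)
    return [[rng.uniform(-bound, bound) for _ in range(cols)] for _ in range(rows)]

weights = lecun_uniform((4, 12))
bound = math.sqrt(3.0 / 12)  # bound for fan_in = 12 and scale = 1.0
```

With this bound, each weight has variance b^2 / 3 = scale^2 / fan_in, which is the property the scheme is usually chosen for.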

Inference¶

AutoRegressiveInference¶

class xnmt.inferences.Inference(src_file=None, trg_file=None, ref_file=None, max_src_len=None, max_num_sents=None, mode='onebest', batcher=bare(InOrderBatcher{'batch_size': 1}), reporter=None)[source]¶

Bases: object

A template class for classes that perform inference.

Parameters:

src_file (Optional[str]) – path of input src file to be translated
trg_file (Optional[str]) – path of file where trg translations will be written
ref_file (Optional[str]) – path of file with reference translations, e.g. for forced decoding
max_src_len (Optional[int]) – Remove sentences from data to decode that are longer than this on the source side
max_num_sents (Optional[int]) – Stop decoding after the first n sentences.
mode (str) –
type of decoding to perform.
- onebest: generate one best.
- score: output scores, useful for rescoring
- forced: perform forced decoding.
- forceddebug: perform forced decoding, calculate training loss, and make sure the scores are identical for debugging purposes.
batcher (InOrderBatcher) – inference batcher, needed e.g. in connection with pad_src_token_to_multiple
reporter (Union[None, Reporter, Sequence[Reporter]]) – a reporter to create reports for each decoded sentence

perform_inference(generator, src_file=None, trg_file=None, ref_file=None)[source]¶

Perform inference.

Parameters:

generator (GeneratorModel) – the model to be used
src_file (Optional[str]) – path of input src file to be translated
trg_file (Optional[str]) – path of file where trg translations will be written

Return type:	`None`

class xnmt.inferences.IndependentOutputInference(src_file=None, trg_file=None, ref_file=None, max_src_len=None, max_num_sents=None, post_process=None, mode='onebest', batcher=bare(InOrderBatcher{'batch_size': 1}), reporter=None)[source]¶

Bases: xnmt.inferences.Inference, xnmt.persistence.Serializable

Inference when outputs are produced independently, including for classifiers that produce only a single output.

Assumes that generator.generate() takes arguments src, idx

Parameters:

src_file (Optional[str]) – path of input src file to be translated
trg_file (Optional[str]) – path of file where trg translations will be written
ref_file (Optional[str]) – path of file with reference translations, e.g. for forced decoding
max_src_len (Optional[int]) – Remove sentences from data to decode that are longer than this on the source side
max_num_sents (Optional[int]) – Stop decoding after the first n sentences.
post_process (Union[None, str, OutputProcessor, Sequence[OutputProcessor]]) – post-processing of translation outputs (available string shortcuts: none, join-char, join-bpe, join-piece)
mode (str) –
type of decoding to perform.
- onebest: generate one best.
- score: output scores, useful for rescoring
batcher (InOrderBatcher) – inference batcher, needed e.g. in connection with pad_src_token_to_multiple
reporter (Union[None, Reporter, Sequence[Reporter]]) – a reporter to create reports for each decoded sentence

class xnmt.inferences.AutoRegressiveInference(src_file=None, trg_file=None, ref_file=None, max_src_len=None, max_num_sents=None, post_process=[], search_strategy=bare(BeamSearch), mode='onebest', batcher=bare(InOrderBatcher{'batch_size': 1}), reporter=None)[source]¶

Bases: xnmt.inferences.Inference, xnmt.persistence.Serializable

Performs inference for auto-regressive models that expand based on their own previous outputs.

Assumes that generator.generate() takes arguments src, idx, search_strategy, forced_trg_ids

Parameters:

src_file (Optional[str]) – path of input src file to be translated
trg_file (Optional[str]) – path of file where trg translations will be written
ref_file (Optional[str]) – path of file with reference translations, e.g. for forced decoding
max_src_len (Optional[int]) – Remove sentences from data to decode that are longer than this on the source side
max_num_sents (Optional[int]) – Stop decoding after the first n sentences.
post_process (Union[str, OutputProcessor, Sequence[OutputProcessor]]) – post-processing of translation outputs (available string shortcuts: none, join-char, join-bpe, join-piece)
search_strategy (SearchStrategy) – a search strategy used during decoding.
mode (str) –
type of decoding to perform.
- onebest: generate one best.
- score: output scores, useful for rescoring
batcher (InOrderBatcher) – inference batcher, needed e.g. in connection with pad_src_token_to_multiple
reporter (Union[None, Reporter, Sequence[Reporter]]) – a reporter to create reports for each decoded sentence

class xnmt.inferences.CascadeInference(steps)[source]¶

Bases: xnmt.inferences.Inference, xnmt.persistence.Serializable

Inference class that performs inference as a series of independent inference steps.

Steps are performed using a list of inference sub-objects and a list of models. Intermediate outputs are written out to disk and then read by the next step.

The generator passed to perform_inference must be a xnmt.models.CascadeGenerator.

Parameters:	steps (`Sequence`[`Inference`]) – list of inference objects

perform_inference(generator, src_file=None, trg_file=None, ref_file=None)[source]¶

Perform inference.

Parameters:

generator (CascadeGenerator) – the model to be used
src_file (Optional[str]) – path of input src file to be translated
trg_file (Optional[str]) – path of file where trg translations will be written

Return type:	`None`
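
The cascade idea described above, where each step reads the file written by the previous one, can be sketched in a few lines of plain Python. This is not XNMT code: run_cascade and make_line_step are hypothetical helpers, and each "step" is a simple function of (input path, output path) standing in for an inference object paired with its model.

```python
import os
import tempfile

def run_cascade(steps, src_file, trg_file):
    """Run each step on the previous step's output file; the last step writes trg_file."""
    tmp_dir = tempfile.mkdtemp()
    current_src = src_file
    for i, step in enumerate(steps):
        is_last = i == len(steps) - 1
        out_path = trg_file if is_last else os.path.join(tmp_dir, f"step{i}.out")
        step(current_src, out_path)
        current_src = out_path  # intermediate output becomes the next input

def make_line_step(fn):
    """Build a hypothetical step that applies fn to every line of its input file."""
    def step(src_path, trg_path):
        with open(src_path, encoding="utf-8") as fin, open(trg_path, "w", encoding="utf-8") as fout:
            for line in fin:
                fout.write(fn(line.rstrip("\n")) + "\n")
    return step

work_dir = tempfile.mkdtemp()
src = os.path.join(work_dir, "src.txt")
trg = os.path.join(work_dir, "trg.txt")
with open(src, "w", encoding="utf-8") as f:
    f.write("hello world\n")

# Two toy steps: uppercase, then reverse each line.
run_cascade([make_line_step(str.upper), make_line_step(lambda s: s[::-1])], src, trg)
with open(trg, encoding="utf-8") as f:
    result = f.read()
```

Writing intermediates to disk keeps the steps independent of each other, at the cost of extra I/O between them.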

SearchStrategy¶

class xnmt.search_strategies.SearchOutput(word_ids, attentions, score, state, mask)¶

Bases: tuple

Output of the search.

word_ids – list of generated word ids
attentions – list of attention vectors corresponding to word_ids
score – a single value of log(p(E|F))
logsoftmaxes – the log-softmax vector corresponding to the score, such that score = logsoftmax[word_id]
state – a non-backpropagatable state that is used to produce the logsoftmax layer; usually used to generate the ‘baseline’ in the REINFORCE loss
mask – whether the particular word id should be ignored or not (1 for not, 0 for yes)

attentions¶: Alias for field number 1

mask¶: Alias for field number 4

score¶: Alias for field number 2

state¶: Alias for field number 3

word_ids¶: Alias for field number 0
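
The "Bases: tuple" entry and the field aliases suggest a named tuple. The stand-in below mirrors the field order listed above using collections.namedtuple; it is an illustration with made-up values, not XNMT's actual definition.

```python
from collections import namedtuple

# Field order follows the aliases: word_ids=0, attentions=1, score=2, state=3, mask=4.
SearchOutput = namedtuple("SearchOutput", ["word_ids", "attentions", "score", "state", "mask"])

out = SearchOutput(word_ids=[4, 7, 2], attentions=[], score=-1.5, state=None, mask=[1, 1, 1])

# Fields are reachable both by name and by position.
first_field = out[0]
named_field = out.word_ids
```

Access by name is generally the safer choice, since positional indices silently break if the field order ever changes.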

class xnmt.search_strategies.SearchStrategy[source]¶

Bases: object

A template class to generate translation from the output probability model. (Non-batched operation)

generate_output(translator, initial_state, src_length=None)[source]¶

Parameters:

translator (xnmt.models.translators.AutoRegressiveTranslator) – a translator
initial_state (AutoRegressiveDecoderState) – initial decoder state
src_length (Optional[Integral]) – length of src sequence, required for some types of length normalization

Return type:	`List`[`SearchOutput`]
Returns:	List of (word_ids, attentions, score, logsoftmaxes)

class xnmt.search_strategies.GreedySearch(max_len=100)[source]¶

Bases: xnmt.persistence.Serializable, xnmt.search_strategies.SearchStrategy

Performs greedy search (aka beam search with beam size 1)

Parameters:	max_len (`Integral`) – maximum number of tokens to generate.

generate_output(translator, initial_state, src_length=None)[source]¶

Parameters:

translator (xnmt.models.translators.AutoRegressiveTranslator) – a translator
initial_state (AutoRegressiveDecoderState) – initial decoder state
src_length (Optional[Integral]) – length of src sequence, required for some types of length normalization

Return type:	`List`[`SearchOutput`]
Returns:	List of (word_ids, attentions, score, logsoftmaxes)
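
Greedy search can be illustrated independently of XNMT with a toy next-token distribution. In the sketch below, toy_next_log_probs, the EOS symbol, and the probability table are all invented for illustration; a real translator would compute the distribution from its decoder state.

```python
import math

EOS = "</s>"  # hypothetical end-of-sequence symbol

def toy_next_log_probs(prefix):
    """Hypothetical stand-in for a decoder step: log-probabilities of the next token."""
    table = {
        (): {"a": 0.6, "b": 0.4},
        ("a",): {"b": 0.7, EOS: 0.3},
        ("a", "b"): {EOS: 0.9, "a": 0.1},
    }
    probs = table.get(tuple(prefix), {EOS: 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}

def greedy_search(next_log_probs, max_len=100):
    """Pick the single most probable token at every step until EOS or max_len."""
    tokens, score = [], 0.0
    for _ in range(max_len):
        log_probs = next_log_probs(tokens)
        best = max(log_probs, key=log_probs.get)
        tokens.append(best)
        score += log_probs[best]
        if best == EOS:
            break
    return tokens, score

tokens, score = greedy_search(toy_next_log_probs)
```

The returned score is the sum of the chosen tokens' log-probabilities, i.e. the log-probability of the whole output under the toy model.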

class xnmt.search_strategies.BeamSearch(beam_size=1, max_len=100, len_norm=bare(NoNormalization), one_best=True, scores_proc=None)[source]¶

Bases: xnmt.persistence.Serializable, xnmt.search_strategies.SearchStrategy

Performs beam search.

Parameters:

beam_size (Integral) – number of beams
max_len (Integral) – maximum number of tokens to generate.
len_norm (LengthNormalization) – type of length normalization to apply
one_best (bool) – Whether to output the best hyp only or all completed hyps.
scores_proc (Optional[Callable[[ndarray], None]]) – apply an optional operation on all scores prior to choosing the top k. E.g. use with xnmt.length_norm.EosBooster.

class Hypothesis(score, output, parent, word)¶

Bases: tuple

output¶: Alias for field number 1

parent¶: Alias for field number 2

score¶: Alias for field number 0

word¶: Alias for field number 3

generate_output(translator, initial_state, src_length=None)[source]¶

Parameters:

translator (xnmt.models.translators.AutoRegressiveTranslator) – a translator
initial_state (AutoRegressiveDecoderState) – initial decoder state
src_length (Optional[Integral]) – length of src sequence, required for some types of length normalization

Return type:	`List`[`SearchOutput`]
Returns:	List of (word_ids, attentions, score, logsoftmaxes)
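
A minimal beam search over a toy model shows how beam_size, max_len and one_best interact. This is a simplified sketch, not XNMT's implementation: it applies no length normalization, represents hypotheses as plain (score, tokens) pairs, and uses an invented probability table chosen so that a beam of 2 finds a better output than a beam of 1.

```python
import math

EOS = "</s>"  # hypothetical end-of-sequence symbol

def toy_next_log_probs(prefix):
    """Hypothetical stand-in for a decoder step."""
    table = {
        (): {"a": 0.6, "b": 0.4},
        ("a",): {"c": 0.55, EOS: 0.45},
        ("b",): {EOS: 0.9, "c": 0.1},
    }
    probs = table.get(tuple(prefix), {EOS: 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}

def beam_search(next_log_probs, beam_size=1, max_len=100, one_best=True):
    """Keep the beam_size best expansions per step; collect those that end in EOS."""
    active = [(0.0, [])]  # (log score, tokens)
    completed = []
    for _ in range(max_len):
        candidates = []
        for score, tokens in active:
            for tok, log_prob in next_log_probs(tokens).items():
                candidates.append((score + log_prob, tokens + [tok]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        active = []
        for cand in candidates[:beam_size]:
            (completed if cand[1][-1] == EOS else active).append(cand)
        if not active:
            break
    # Fall back to unfinished hypotheses if nothing reached EOS within max_len.
    results = sorted(completed or active, key=lambda c: c[0], reverse=True)
    return results[:1] if one_best else results

wide = beam_search(toy_next_log_probs, beam_size=2)[0]
narrow = beam_search(toy_next_log_probs, beam_size=1)[0]
all_hyps = beam_search(toy_next_log_probs, beam_size=2, one_best=False)
```

In this toy setup the narrow beam commits to "a" early and ends with probability 0.33, while the wider beam keeps "b" alive and finds the 0.36 output.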

class xnmt.search_strategies.SamplingSearch(max_len=100, sample_size=5)[source]¶

Bases: xnmt.persistence.Serializable, xnmt.search_strategies.SearchStrategy

Performs search based on the softmax probability distribution. Similar to greedy search.

Parameters:

max_len (Integral) –
sample_size (Integral) –

generate_output(translator, initial_state, src_length=None)[source]¶

Parameters:

translator (xnmt.models.translators.AutoRegressiveTranslator) – a translator
initial_state (AutoRegressiveDecoderState) – initial decoder state
src_length (Optional[Integral]) – length of src sequence, required for some types of length normalization

Return type:	`List`[`SearchOutput`]
Returns:	List of (word_ids, attentions, score, logsoftmaxes)
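
The parameter descriptions for this class are empty in the source, so the following is only one plausible reading: sample_size independent sequences are drawn token by token from the model's distribution, each stopping at the end-of-sequence symbol or after max_len tokens. The toy distribution, the EOS symbol, and the seed argument are assumptions made for this sketch.

```python
import math
import random

EOS = "</s>"  # hypothetical end-of-sequence symbol

def toy_next_probs(prefix):
    """Hypothetical decoder step returning a probability distribution over tokens."""
    if len(prefix) >= 3:
        return {EOS: 1.0}
    return {"a": 0.5, "b": 0.3, EOS: 0.2}

def sampling_search(next_probs, max_len=100, sample_size=5, seed=0):
    """Draw sample_size sequences by sampling each token from the model distribution."""
    rng = random.Random(seed)
    samples = []
    for _ in range(sample_size):
        tokens, score = [], 0.0
        for _ in range(max_len):
            probs = next_probs(tokens)
            tok = rng.choices(list(probs), weights=list(probs.values()), k=1)[0]
            tokens.append(tok)
            score += math.log(probs[tok])
            if tok == EOS:
                break
        samples.append((tokens, score))
    return samples

samples = sampling_search(toy_next_probs, sample_size=5)
```

Unlike greedy search, repeated runs with different seeds can return different outputs, which is what makes sampling useful for objectives that need diverse candidates.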

class xnmt.search_strategies.MctsSearch(visits=200, max_len=100)[source]¶

Bases: xnmt.persistence.Serializable, xnmt.search_strategies.SearchStrategy

Performs search with Monte Carlo Tree Search

generate_output(translator, dec_state, src_length=None)[source]¶

Parameters:

translator (xnmt.models.translators.AutoRegressiveTranslator) – a translator
initial_state – initial decoder state
src_length (Optional[Integral]) – length of src sequence, required for some types of length normalization

Return type:	`List`[`SearchOutput`]
Returns:	List of (word_ids, attentions, score, logsoftmaxes)

LengthNormalization¶

class xnmt.length_norm.LengthNormalization[source]¶

Bases: object

A template class to adjust scores for length normalization during search.

normalize_completed(completed_hyps, src_length=None)[source]¶

Apply normalization step to completed hypotheses after search and return the normalized scores.

Parameters:

completed_hyps (Sequence[Hypothesis]) – list of completed Hypothesis objects, will be normalized in-place
src_length (Optional[int]) – length of source sequence (None if not given)

Return type:	`Sequence`[`float`]
Returns:	normalized scores

normalize_partial_topk(score_so_far, score_to_add, new_len)[source]¶

Apply normalization step after expanding a partial hypothesis and selecting the top k scores.

Parameters:

score_so_far – log score of the partial hypothesis
score_to_add – log score of the top-k item that is to be added
new_len – new length of partial hypothesis with current word already appended

Returns:	new score after applying score_to_add to score_so_far

class xnmt.length_norm.NoNormalization[source]¶

Bases: xnmt.length_norm.LengthNormalization, xnmt.persistence.Serializable

Applies no length normalization.

normalize_completed(completed_hyps, src_length=None)[source]¶

Apply normalization step to completed hypotheses after search and return the normalized scores.

Parameters:

completed_hyps (Sequence[Hypothesis]) – list of completed Hypothesis objects, will be normalized in-place
src_length (Optional[int]) – length of source sequence (None if not given)

Return type:	`Sequence`[`float`]
Returns:	normalized scores

class xnmt.length_norm.AdditiveNormalization(penalty=-0.1, apply_during_search=False)[source]¶

Bases: xnmt.length_norm.LengthNormalization, xnmt.persistence.Serializable

Adds a fixed word penalty every time a word is added.

normalize_completed(completed_hyps, src_length=None)[source]¶

Apply normalization step to completed hypotheses after search and return the normalized scores.

Parameters:

completed_hyps (Sequence[Hypothesis]) – list of completed Hypothesis objects, will be normalized in-place
src_length (Optional[int]) – length of source sequence (None if not given)

Return type:	`Sequence`[`float`]
Returns:	normalized scores

normalize_partial_topk(score_so_far, score_to_add, new_len)[source]¶

Apply normalization step after expanding a partial hypothesis and selecting the top k scores.

Parameters:

score_so_far – log score of the partial hypothesis
score_to_add – log score of the top-k item that is to be added
new_len – new length of partial hypothesis with current word already appended

Returns:	new score after applying score_to_add to score_so_far

class xnmt.length_norm.PolynomialNormalization(m=1, apply_during_search=False)[source]¶

Bases: xnmt.length_norm.LengthNormalization, xnmt.persistence.Serializable

Divides the score by the length (raised to some power).

normalize_completed(completed_hyps, src_length=None)[source]¶

Apply normalization step to completed hypotheses after search and return the normalized scores.

Parameters:

completed_hyps (Sequence[Hypothesis]) – list of completed Hypothesis objects, will be normalized in-place
src_length (Optional[int]) – length of source sequence (None if not given)

Return type:	`Sequence`[`float`]
Returns:	normalized scores

normalize_partial_topk(score_so_far, score_to_add, new_len)[source]¶

Apply normalization step after expanding a partial hypothesis and selecting the top k scores.

Parameters:

score_so_far – log score of the partial hypothesis
score_to_add – log score of the top-k item that is to be added
new_len – new length of partial hypothesis with current word already appended

Returns:	new score after applying score_to_add to score_so_far
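
To make the effect of the additive and polynomial schemes concrete, the sketch below applies both to made-up (log score, length) pairs. The formulas are assumptions inferred from the one-line descriptions and the default arguments (penalty=-0.1, m=1): additive adds penalty times length, polynomial divides by length to the power m. XNMT's exact computation may differ.

```python
def additive_normalize(score, length, penalty=-0.1):
    """Assumed form: add a fixed per-word penalty, i.e. score + penalty * length."""
    return score + penalty * length

def polynomial_normalize(score, length, m=1):
    """Assumed form: divide the log score by length ** m."""
    return score / (length ** m)

# (log score, length) pairs, invented for illustration.
hyps = [(-4.0, 4), (-6.0, 8)]

raw_best = max(hyps, key=lambda h: h[0])
poly_best = max(hyps, key=lambda h: polynomial_normalize(h[0], h[1]))
additive_short = additive_normalize(-4.0, 4)
```

Raw log scores favor the shorter hypothesis because every extra token adds a negative term. Dividing by length compares per-token scores instead, so here the longer hypothesis wins.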

class xnmt.length_norm.MultinomialNormalization(sent_stats)[source]¶

Bases: xnmt.length_norm.LengthNormalization, xnmt.persistence.Serializable

Follows the algorithm of: Tree-to-Sequence Attentional Neural Machine Translation https://arxiv.org/pdf/1603.06075.pdf

normalize_completed(completed_hyps, src_length=None)[source]¶

Parameters:

completed_hyps (Sequence[Hypothesis]) –
src_length (Optional[int]) – length of the source sentence

Return type:	`Sequence`[`float`]

class xnmt.length_norm.GaussianNormalization(sent_stats)[source]¶

Bases: xnmt.length_norm.LengthNormalization, xnmt.persistence.Serializable

Gaussian regularization encourages inference to select sentences whose lengths are similar to those of the sentences in the training set. Reference: https://arxiv.org/pdf/1509.04942.pdf

normalize_completed(completed_hyps, src_length=None)[source]¶

Apply normalization step to completed hypotheses after search and return the normalized scores.

Parameters:

completed_hyps (Sequence[Hypothesis]) – list of completed Hypothesis objects, will be normalized in-place
src_length (Optional[int]) – length of source sequence (None if not given)

Return type:	`Sequence`[`float`]
Returns:	normalized scores

class xnmt.length_norm.EosBooster(boost_val)[source]¶

Bases: xnmt.persistence.Serializable

Callable that boosts the end-of-sequence token; can be used with xnmt.search_strategies.BeamSearch.

Parameters:	boost_val (`Real`) – value to add to the eos token’s log probability. Positive values make sentences shorter, negative values make sentences longer.
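
Since scores_proc in BeamSearch is typed as a callable that takes the scores and returns None, a booster of this kind presumably modifies the scores in place. The class below is a hypothetical stand-in that works on a plain list; the eos_index argument and the example values are assumptions, not part of XNMT's API.

```python
class ToyEosBooster:
    """Hypothetical stand-in: adds boost_val to the EOS entry of a score vector, in place."""

    def __init__(self, boost_val, eos_index=0):
        self.boost_val = boost_val
        self.eos_index = eos_index  # assumption: position of EOS in the vocabulary

    def __call__(self, scores):
        scores[self.eos_index] += self.boost_val

scores = [-2.0, -0.5, -1.0]  # made-up log probabilities; index 0 is EOS here
ToyEosBooster(boost_val=1.8)(scores)
best = max(range(len(scores)), key=scores.__getitem__)
```

After boosting, the end-of-sequence entry becomes the top choice, which is why positive values tend to end hypotheses earlier.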

Evaluation¶

EvalTasks¶

class xnmt.eval.tasks.EvalTask[source]¶

Bases: object

An EvalTask is a task that does evaluation and returns one or more EvalScore objects.

class xnmt.eval.tasks.LossEvalTask(src_file, ref_file=None, model=Ref(path=model), batcher=Ref(path=train.batcher, default=SrcBatcher), loss_calculator=bare(MLELoss), max_src_len=None, max_trg_len=None, max_num_sents=None, loss_comb_method=Ref(path=exp_global.loss_comb_method, default=sum), desc=None)[source]¶

Bases: xnmt.eval.tasks.EvalTask, xnmt.persistence.Serializable

A task that does evaluation of the loss function.

Parameters:

src_file (Union[str, Sequence[str]]) – source file name
ref_file (Optional[str]) – reference file name
model (GeneratorModel) – generator model to use for inference
batcher (Batcher) – batcher to use
loss_calculator (LossCalculator) – loss calculator
max_src_len (Optional[int]) – omit sentences with source length greater than specified number
max_trg_len (Optional[int]) – omit sentences with target length greater than specified number
max_num_sents (Optional[int]) – compute loss only for the first n sentences in the given corpus
loss_comb_method (str) – method for combining loss across batch elements (‘sum’ or ‘avg’).
desc (Optional[Any]) – description to pass on to computed score objects

eval()[source]¶

Perform evaluation task.

Return type:	`EvalScore`
Returns:	Evaluated score

class xnmt.eval.tasks.AccuracyEvalTask(src_file, ref_file, hyp_file, model=Ref(path=model), eval_metrics='bleu', inference=None, perform_inference=True, desc=None)[source]¶

Bases: xnmt.eval.tasks.EvalTask, xnmt.persistence.Serializable

A task that does evaluation of some measure of accuracy.

Parameters:

src_file (Union[str, Sequence[str]]) – path(s) to read source file(s) from
ref_file (Union[str, Sequence[str]]) – path(s) to read reference file(s) from
hyp_file (str) – path to write hypothesis file to
model (GeneratorModel) – generator model to generate hypothesis with
eval_metrics (Union[str, Evaluator, Sequence[Evaluator]]) – list of evaluation metrics (list of Evaluator objects or string of comma-separated shortcuts)
inference (Optional[Inference]) – inference object
perform_inference (bool) – Whether to generate the output or not. One eval task can use an already existing hyp_file that was generated by the previous eval tasks.
desc (Optional[Any]) – human-readable description passed on to resulting score objects

class xnmt.eval.tasks.DecodingEvalTask(src_file, hyp_file, model=Ref(path=model), inference=None)[source]¶

Bases: xnmt.eval.tasks.EvalTask, xnmt.persistence.Serializable

A task that performs decoding without comparing against a reference.

Parameters:

src_file (Union[str, Sequence[str]]) – path(s) to read source file(s) from
hyp_file (str) – path to write hypothesis file to
model (GeneratorModel) – generator model to generate hypothesis with
inference (Optional[Inference]) – inference object

Eval Metrics¶

This module contains classes to compute evaluation metrics and to hold the resulting scores.

EvalScore subclasses represent a computed score, including useful statistics, and can be printed with an informative string representation.

Evaluator subclasses are used to compute these scores. Currently the following are implemented:

LossScore (created directly by the model)
BLEUEvaluator and FastBLEUEvaluator create BLEUScore objects
GLEUEvaluator creates GLEUScore objects
WEREvaluator creates WERScore objects
CEREvaluator creates CERScore objects
ExternalEvaluator creates ExternalScore objects
SequenceAccuracyEvaluator creates SequenceAccuracyScore objects

class xnmt.eval.metrics.EvalScore(desc=None)[source]¶

Bases: object

A template class for scores as resulting from using an Evaluator.

Parameters:	desc (`Optional`[`Any`]) – human-readable description to include in log outputs

higher_is_better()[source]¶

Return True if higher values are favorable, False otherwise.

Return type:	`bool`
Returns:	Whether higher values are favorable.

value()[source]¶

Get the numeric value of the evaluated metric.

Return type:	`float`
Returns:	Numeric evaluation score.

metric_name()[source]¶

Get the metric name.

Return type:	`str`
Returns:	Metric name as string.

score_str()[source]¶

A string representation of the evaluated score, potentially including additional statistics.

Return type:	`str`
Returns:	String representation of score.

better_than(another_score)[source]¶

Compare score against another score and return True iff this score is better.

Parameters:	another_score (`EvalScore`) – score to compare against.
Return type:	`bool`
Returns:	Whether this score is better than `another_score`.
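
The relationship between value(), higher_is_better() and better_than() can be sketched with a small stand-in class. ToyScore is hypothetical and only mirrors the method names documented above. The strict comparison and the error on mismatched orientation are assumptions about reasonable behavior, not a description of XNMT's code.

```python
class ToyScore:
    """Hypothetical stand-in mirroring the documented EvalScore method names."""

    def __init__(self, value, higher_is_better):
        self._value = value
        self._higher_is_better = higher_is_better

    def value(self):
        return self._value

    def higher_is_better(self):
        return self._higher_is_better

    def better_than(self, another_score):
        # Comparing metrics with opposite orientation is not meaningful.
        if self.higher_is_better() != another_score.higher_is_better():
            raise ValueError("cannot compare scores with different orientation")
        if self.higher_is_better():
            return self.value() > another_score.value()
        return self.value() < another_score.value()

bleu_improved = ToyScore(0.31, True).better_than(ToyScore(0.28, True))
wer_improved = ToyScore(0.12, False).better_than(ToyScore(0.15, False))
```

Keeping the orientation inside the score object lets calling code, for example checkpoint selection, compare scores without knowing which metric it is handling.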

class xnmt.eval.metrics.SentenceLevelEvalScore(desc=None)[source]¶

Bases: xnmt.eval.metrics.EvalScore

A template class for scores that work at the sentence level and can be aggregated to the corpus level.

static aggregate(scores, desc=None)[source]¶

Aggregate a sequence of sentence-level scores into a corpus-level score.

Parameters:

scores (Sequence[SentenceLevelEvalScore]) – list of sentence-level scores.
desc (Optional[Any]) – human-readable description.

Return type:	`SentenceLevelEvalScore`
Returns:	Score object that is the aggregate of all sentence-level scores.

class xnmt.eval.metrics.LossScore(loss, loss_stats=None, num_ref_words=None, desc=None)[source]¶

Bases: xnmt.eval.metrics.EvalScore, xnmt.persistence.Serializable

Score indicating the value of the loss function of a neural network.

Parameters:

loss (Real) – the (primary) loss value
loss_stats (Optional[Dict[str, Real]]) – info on additional loss values
num_ref_words (Optional[Integral]) – number of reference tokens
desc (Optional[Any]) – human-readable description to include in log outputs

value()[source]¶

Get the numeric value of the evaluated metric.

Returns:	Numeric evaluation score.

metric_name()[source]¶

Get the metric name.

Returns:	Metric name as string.

higher_is_better()[source]¶

Return True if higher values are favorable, False otherwise.

Returns:	Whether higher values are favorable.

score_str()[source]¶

A string representation of the evaluated score, potentially including additional statistics.

Returns:	String representation of score.

class xnmt.eval.metrics.BLEUScore(bleu, frac_score_list=None, brevity_penalty_score=None, hyp_len=None, ref_len=None, ngram=4, desc=None)[source]¶

Bases: xnmt.eval.metrics.EvalScore, xnmt.persistence.Serializable

Class to keep a BLEU score.

Parameters:

bleu (Real) – actual BLEU score between 0 and 1
frac_score_list (Optional[Sequence[Real]]) – list of fractional scores for each n-gram order
brevity_penalty_score (Optional[Real]) – brevity penalty that was multiplied to the precision score.
hyp_len (Optional[Integral]) – length of hypothesis
ref_len (Optional[Integral]) – length of reference
ngram (Integral) – match n-grams up to this order (usually 4)
desc (Optional[Any]) – human-readable description to include in log outputs

value()[source]¶

Get the numeric value of the evaluated metric.

Returns:	Numeric evaluation score.

metric_name()[source]¶

Get the metric name.

Returns:	Metric name as string.

higher_is_better()[source]¶

Return True if higher values are favorable, False otherwise.

Returns:	Whether higher values are favorable.

score_str()[source]¶

A string representation of the evaluated score, potentially including additional statistics.

Returns:	String representation of score.

class xnmt.eval.metrics.GLEUScore(corpus_n_match, corpus_total, hyp_len, ref_len, desc=None)[source]¶

Bases: xnmt.eval.metrics.SentenceLevelEvalScore, xnmt.persistence.Serializable

Class to keep a GLEU (Google BLEU) score.

Parameters:

gleu – actual GLEU score between 0 and 1
hyp_len (Integral) – length of hypothesis
ref_len (Integral) – length of reference
desc (Optional[Any]) – human-readable description to include in log outputs

value()[source]¶

Get the numeric value of the evaluated metric.

Returns:	Numeric evaluation score.

metric_name()[source]¶

Get the metric name.

Returns:	Metric name as string.

higher_is_better()[source]¶

Return True if higher values are favorable, False otherwise.

Returns:	Whether higher values are favorable.

score_str()[source]¶

A string representation of the evaluated score, potentially including additional statistics.

Returns:	String representation of score.

static aggregate(scores, desc=None)[source]¶

Aggregate a sequence of sentence-level scores into a corpus-level score.

Parameters:

scores (Sequence[SentenceLevelEvalScore]) – list of sentence-level scores.
desc (Optional[Any]) – human-readable description.

Returns:	Score object that is the aggregate of all sentence-level scores.

class xnmt.eval.metrics.LevenshteinScore(correct, substitutions, insertions, deletions, desc=None)[source]¶

Bases: xnmt.eval.metrics.SentenceLevelEvalScore

A template class for Levenshtein-based scores.

Parameters:

correct (Integral) – number of correct matches
substitutions (Integral) – number of substitution errors
insertions (Integral) – number of insertion errors
deletions (Integral) – number of deletion errors
desc (Optional[Any]) – human-readable description to include in log outputs

value()[source]¶

Get the numeric value of the evaluated metric.

Returns:	Numeric evaluation score.

higher_is_better()[source]¶

Return True if higher values are favorable, False otherwise.

Returns:	Whether higher values are favorable.

score_str()[source]¶

A string representation of the evaluated score, potentially including additional statistics.

Returns:	String representation of score.

static aggregate(scores, desc=None)[source]¶

Aggregate a sequence of sentence-level scores into a corpus-level score.

Parameters:

scores (Sequence[LevenshteinScore]) – list of sentence-level scores.
desc (Optional[Any]) – human-readable description.

Return type:	`LevenshteinScore`
Returns:	Score object that is the aggregate of all sentence-level scores.
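
For Levenshtein-based scores, aggregating sentence-level results into a corpus-level score is naturally done by summing the four counts. The sketch below assumes the standard error-rate definition, (substitutions + insertions + deletions) divided by the reference length, with the reference length equal to correct + substitutions + deletions. The Counts tuple and both helper functions are hypothetical, and XNMT's value() may be defined differently.

```python
from collections import namedtuple

Counts = namedtuple("Counts", ["correct", "substitutions", "insertions", "deletions"])

def error_rate(c):
    """(S + I + D) / reference length, where reference length = C + S + D."""
    ref_len = c.correct + c.substitutions + c.deletions
    return (c.substitutions + c.insertions + c.deletions) / ref_len

def aggregate(scores):
    """Sum the counts field by field to obtain corpus-level counts."""
    return Counts(*(sum(field) for field in zip(*scores)))

sent_scores = [Counts(8, 1, 0, 1), Counts(4, 0, 1, 0)]  # made-up sentence-level counts
corpus = aggregate(sent_scores)
corpus_rate = error_rate(corpus)
mean_of_rates = sum(error_rate(s) for s in sent_scores) / len(sent_scores)
```

Summing counts weights each sentence by its length, so the corpus-level rate (3/14 here) generally differs from the plain average of sentence-level rates (0.225 here).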

class xnmt.eval.metrics.WERScore(correct, substitutions, insertions, deletions, desc=None)[source]¶

Bases: xnmt.eval.metrics.LevenshteinScore, xnmt.persistence.Serializable

Class to keep a word error rate.

metric_name()[source]¶

Get the metric name.

Returns:	Metric name as string.

class xnmt.eval.metrics.CERScore(correct, substitutions, insertions, deletions, desc=None)[source]¶

Bases: xnmt.eval.metrics.LevenshteinScore, xnmt.persistence.Serializable

Class to keep a character error rate.

metric_name()[source]¶

Get the metric name.

Returns:	Metric name as string.

class xnmt.eval.metrics.RecallScore(recall, hyp_len, ref_len, nbest=5, desc=None)[source]¶

Bases: xnmt.eval.metrics.SentenceLevelEvalScore, xnmt.persistence.Serializable

Class to keep a recall score.

Parameters:

recall (Real) – recall score value between 0 and 1
hyp_len (Integral) – length of hypothesis
ref_len (Integral) – length of reference
nbest (Integral) – recall computed within n-best of specified n
desc (Optional[Any]) – human-readable description to include in log outputs

higher_is_better()[source]¶

Return True if higher values are favorable, False otherwise.

Returns:	Whether higher values are favorable.

score_str()[source]¶

A string representation of the evaluated score, potentially including additional statistics.

Returns:	String representation of score.

value()[source]¶

Get the numeric value of the evaluated metric.

Returns:	Numeric evaluation score.

metric_name()[source]¶

Get the metric name.

Returns:	Metric name as string.

static aggregate(scores, desc=None)[source]¶

Aggregate a sequence of sentence-level scores into a corpus-level score.

Parameters:

scores (Sequence[RecallScore]) – list of sentence-level scores.
desc (Optional[Any]) – human-readable description.

Return type:	`RecallScore`
Returns:	Score object that is the aggregate of all sentence-level scores.

class xnmt.eval.metrics.ExternalScore(value, higher_is_better=True, desc=None)[source]¶

Bases: xnmt.eval.metrics.EvalScore, xnmt.persistence.Serializable

Class to keep a score computed with an external tool.

Parameters:

value (Real) – score value
higher_is_better (bool) – whether higher scores or lower scores are favorable
desc (Optional[Any]) – human-readable description to include in log outputs

value()[source]¶

Get the numeric value of the evaluated metric.

Returns:	Numeric evaluation score.

metric_name()[source]¶

Get the metric name.

Returns:	Metric name as string.

higher_is_better()[source]¶

Return True if higher values are favorable, False otherwise.

Returns:	Whether higher values are favorable.

score_str()[source]¶

A string representation of the evaluated score, potentially including additional statistics.

Returns:	String representation of score.

class xnmt.eval.metrics.SequenceAccuracyScore(num_correct, num_total, desc=None)[source]¶

Bases: xnmt.eval.metrics.SentenceLevelEvalScore, xnmt.persistence.Serializable

Class to keep a sequence accuracy score.

Parameters:

num_correct (Integral) – number of correct outputs
num_total (Integral) – number of total outputs
desc (Optional[Any]) – human-readable description to include in log outputs

higher_is_better()[source]¶

Return True if higher values are favorable, False otherwise.

Returns:	Whether higher values are favorable.

value()[source]¶

Get the numeric value of the evaluated metric.

Returns:	Numeric evaluation score.

metric_name()[source]¶

Get the metric name.

Returns:	Metric name as string.

score_str()[source]¶

A string representation of the evaluated score, potentially including additional statistics.

Returns:	String representation of score.

static aggregate(scores, desc=None)[source]¶

Aggregate a sequence of sentence-level scores into a corpus-level score.

Parameters:

scores (Sequence[SentenceLevelEvalScore]) – list of sentence-level scores.
desc (Optional[Any]) – human-readable description.

Returns:	Score object that is the aggregate of all sentence-level scores.

class xnmt.eval.metrics.FMeasure(true_pos, false_neg, false_pos, desc=None)[source]¶

Bases: xnmt.eval.metrics.SentenceLevelEvalScore, xnmt.persistence.Serializable

higher_is_better()[source]¶

Return True if higher values are favorable, False otherwise.

Returns:	Whether higher values are favorable.

value()[source]¶

Get the numeric value of the evaluated metric.

Returns:	Numeric evaluation score.

metric_name()[source]¶

Get the metric name.

Returns:	Metric name as string.

score_str()[source]¶

A string representation of the evaluated score, potentially including additional statistics.

Returns:	String representation of score.

static aggregate(scores, desc=None)[source]¶

Aggregate a sequence of sentence-level scores into a corpus-level score.

Parameters:

scores (Sequence[SentenceLevelEvalScore]) – list of sentence-level scores.
desc (Optional[Any]) – human-readable description.

Returns:	Score object that is the aggregate of all sentence-level scores.
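
The FMeasure class carries no description in the source, so the following is an assumption based on its argument names: a balanced F1 score computed from true positives, false negatives and false positives, aggregated by summing the counts. Both helper functions and the example counts are hypothetical.

```python
def f_measure(true_pos, false_neg, false_pos):
    """Balanced F1 = 2PR / (P + R); returns 0.0 when precision and recall are both zero."""
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def aggregate(counts):
    """Sum (true_pos, false_neg, false_pos) triples into corpus-level counts."""
    return tuple(sum(c) for c in zip(*counts))

sentence_counts = [(3, 1, 0), (1, 1, 2)]  # made-up (true_pos, false_neg, false_pos) triples
corpus_counts = aggregate(sentence_counts)
corpus_f1 = f_measure(*corpus_counts)
```

Aggregating counts before computing the score yields a micro-averaged F1, which can differ from the average of per-sentence F1 values.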

class xnmt.eval.metrics.Evaluator[source]¶

Bases: object

A template class to evaluate the quality of output.

evaluate(ref, hyp, desc=None)[source]¶

Calculate the quality of output given a reference.

Parameters:

ref (Sequence[+T_co]) – list of reference sentences (a sentence is a list of tokens)
hyp (Sequence[+T_co]) – list of hypothesis sentences (a sentence is a list of tokens)
desc (Optional[Any]) – optional description that is passed on to score objects

Return type:	`EvalScore`

evaluate_multi_ref(ref, hyp, desc=None)[source]¶

Calculate the quality of output given multiple references.

Parameters:

ref (Sequence[Sequence[+T_co]]) – list of tuples of reference sentences (a sentence is a list of tokens)
hyp (Sequence[+T_co]) – list of hypothesis sentences (a sentence is a list of tokens)
desc (Optional[Any]) – optional description that is passed on to score objects

Return type:	`EvalScore`

class xnmt.eval.metrics.SentenceLevelEvaluator(write_sentence_scores=None)[source]¶

Bases: xnmt.eval.metrics.Evaluator

A template class for sentence-level evaluators.

Parameters:	write_sentence_scores (`Optional`[`str`]) – path of file to write sentence-level scores to (in YAML format)

evaluate(ref, hyp, desc=None)[source]¶

Calculate the quality of output given a reference.

Parameters:	ref (`Sequence`[+T_co]) – list of reference sents ( a sentence is a list of tokens ) hyp (`Sequence`[+T_co]) – list of hypothesis sents ( a sentence is a list of tokens ) desc (`Optional`[`Any`]) – optional description that is passed on to score objects

Return type:	`SentenceLevelEvalScore`

evaluate_multi_ref(ref, hyp, desc=None)[source]¶

Calculate the quality of output given multiple references.

Parameters:	ref (`Sequence`[`Sequence`[+T_co]]) – list of tuples of reference sentences ( a sentence is a list of tokens ) hyp (`Sequence`[+T_co]) – list of hypothesis sentences ( a sentence is a list of tokens ) desc (`Optional`[`Any`]) – optional description that is passed on to score objects
Return type:	`EvalScore`

class xnmt.eval.metrics.FastBLEUEvaluator(ngram=4, smooth=1)[source]¶

Bases: xnmt.eval.metrics.SentenceLevelEvaluator, xnmt.persistence.Serializable

Class for computing BLEU scores using a fast Cython implementation.

Does not support multiple references. BLEU scores are computed according to K Papineni et al “BLEU: a method for automatic evaluation of machine translation”

Parameters:	ngram (`Integral`) – consider ngrams up to this order (usually 4) smooth (`Real`) –

class xnmt.eval.metrics.BLEUEvaluator(ngram=4)[source]¶

Bases: xnmt.eval.metrics.Evaluator, xnmt.persistence.Serializable

Compute BLEU scores against one or several references.

BLEU scores are computed according to K Papineni et al “BLEU: a method for automatic evaluation of machine translation”

Parameters:	ngram (`Integral`) – consider ngrams up to this order (usually 4)

evaluate(ref, hyp, desc=None)[source]¶

Parameters:	ref (`Sequence`[`Sequence`[`str`]]) – list of reference sentences (single-reference case: a sentence is a list of strings) hyp (`Sequence`[`Sequence`[`str`]]) – list of hypothesis sentences ( a sentence is a list of tokens ) desc (`Optional`[`Any`]) – description to pass on to returned score
Return type:	`BLEUScore`
Returns:	Score, including intermediate results such as ngram ratio, sentence length, brevity penalty

evaluate_multi_ref(ref, hyp, desc=None)[source]¶

Parameters:	ref (`Sequence`[`Sequence`[`Sequence`[`str`]]]) – list of tuples of reference sentences ( a sentence is a list of tokens ) hyp (`Sequence`[`Sequence`[`str`]]) – list of hypothesis sentences ( a sentence is a list of tokens ) desc (`Optional`[`Any`]) – optional description that is passed on to score objects
Return type:	`BLEUScore`
Returns:	Score, including intermediate results such as ngram ratio, sentence length, brevity penalty
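
The intermediate results mentioned above (n-gram ratios, sentence lengths, brevity penalty) can be made concrete with a small single-reference corpus BLEU sketch. This is a simplified illustration in the spirit of Papineni et al., without smoothing or multi-reference support; the function names are hypothetical and it is not the XNMT implementation.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    # Count all n-grams of order n in a token sequence.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(refs: List[List[str]], hyps: List[List[str]], ngram: int = 4) -> float:
    matches = [0] * ngram
    totals = [0] * ngram
    ref_len = hyp_len = 0
    for ref, hyp in zip(refs, hyps):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, ngram + 1):
            hyp_counts = ngram_counts(hyp, n)
            ref_counts = ngram_counts(ref, n)
            # Clipped matches: each hypothesis n-gram counts at most as
            # often as it occurs in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / ngram
    # Brevity penalty punishes hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)
```

Because counts are accumulated over the whole corpus before the ratios are taken, the result generally differs from an average of per-sentence scores.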

class xnmt.eval.metrics.GLEUEvaluator(min_length=1, max_length=4, write_sentence_scores=None)[source]¶

Bases: xnmt.eval.metrics.SentenceLevelEvaluator, xnmt.persistence.Serializable

Class for computing GLEU (Google BLEU) Scores.

GLEU scores are described in https://arxiv.org/pdf/1609.08144v2.pdf as follows:

“The BLEU score has some undesirable properties when used for single sentences, as it was designed to be a corpus measure. We therefore use a slightly different score for our RL experiments which we call the ‘GLEU score’. For the GLEU score, we record all sub-sequences of 1, 2, 3 or 4 tokens in output and target sequence (n-grams). We then compute a recall, which is the ratio of the number of matching n-grams to the number of total n-grams in the target (ground truth) sequence, and a precision, which is the ratio of the number of matching n-grams to the number of total n-grams in the generated output sequence. Then GLEU score is simply the minimum of recall and precision. This GLEU score’s range is always between 0 (no matches) and 1 (all match) and it is symmetrical when switching output and target. According to our experiments, GLEU score correlates quite well with the BLEU metric on a corpus level but does not have its drawbacks for our per sentence reward objective.”

Parameters:	min_length (`Integral`) – minimum n-gram order to consider max_length (`Integral`) – maximum n-gram order to consider write_sentence_scores (`Optional`[`str`]) – path of file to write sentence-level scores to (in YAML format)

evaluate_one_sent(ref, hyp)[source]¶

Parameters:	ref (`Sequence`[`str`]) – reference sentence ( a sent is a list of tokens ) hyp (`Sequence`[`str`]) – hypothesis sentence ( a sent is a list of tokens )
Returns:	GLEU score object
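
The GLEU definition quoted above translates fairly directly into code. The following is a minimal sketch of that definition (minimum of n-gram recall and precision over orders min_length to max_length); the function name gleu and the handling of empty inputs are assumptions, and the function returns a plain float rather than a score object.

```python
from collections import Counter
from typing import Sequence


def gleu(ref: Sequence[str], hyp: Sequence[str],
         min_length: int = 1, max_length: int = 4) -> float:
    match = ref_total = hyp_total = 0
    for n in range(min_length, max_length + 1):
        ref_counts = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        hyp_counts = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        # Matching n-grams, counted with multiplicity.
        match += sum((ref_counts & hyp_counts).values())
        ref_total += sum(ref_counts.values())
        hyp_total += sum(hyp_counts.values())
    if ref_total == 0 or hyp_total == 0:
        return 0.0  # assumption: an empty sentence scores 0
    recall = match / ref_total
    precision = match / hyp_total
    return min(recall, precision)
```

Swapping ref and hyp swaps recall and precision, so the minimum is unchanged, which is the symmetry the quoted passage refers to.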

class xnmt.eval.metrics.WEREvaluator(case_sensitive=False, write_sentence_scores=None)[source]¶

Bases: xnmt.eval.metrics.SentenceLevelEvaluator, xnmt.persistence.Serializable

A class to evaluate the quality of output in terms of word error rate.

Parameters:	case_sensitive (`bool`) – whether scoring should be case-sensitive write_sentence_scores (`Optional`[`str`]) – path of file to write sentence-level scores to (in YAML format)

class xnmt.eval.metrics.CEREvaluator(case_sensitive=False, write_sentence_scores=None)[source]¶

Bases: xnmt.eval.metrics.SentenceLevelEvaluator, xnmt.persistence.Serializable

A class to evaluate the quality of output in terms of character error rate.

Parameters:	case_sensitive (`bool`) – whether scoring should be case-sensitive write_sentence_scores (`Optional`[`str`]) – path of file to write sentence-level scores to (in YAML format)

evaluate_one_sent(ref, hyp)[source]¶

Calculate the quality of output sentence given a reference.

Parameters:	ref (`Sequence`[`str`]) – list of reference words hyp (`Sequence`[`str`]) – list of decoded words
Returns:	character error rate, computed as (ins+del+sub) / (ref_len)
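
The ratio (ins+del+sub) / (ref_len) is the Levenshtein edit distance divided by the reference length. The sketch below shows one way to compute it at the character level. Joining words with single spaces, lowercasing when case_sensitive is False, and the treatment of an empty reference are all assumptions made for the example.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    # Classic dynamic-programming Levenshtein distance, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def char_error_rate(ref_words: Sequence[str], hyp_words: Sequence[str],
                    case_sensitive: bool = False) -> float:
    ref = " ".join(ref_words)
    hyp = " ".join(hyp_words)
    if not case_sensitive:
        ref, hyp = ref.lower(), hyp.lower()
    if not ref:
        # Assumption: an empty reference gives 0 if the hypothesis is also empty, else 1.
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)
```

Passing the word lists themselves to edit_distance, instead of joined strings, gives the analogous word error rate.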

class xnmt.eval.metrics.ExternalEvaluator(path=None, higher_better=True)[source]¶

Bases: xnmt.eval.metrics.Evaluator, xnmt.persistence.Serializable

A class to evaluate the quality of the output according to an external evaluation script.

Does not support multiple references. The external script should only print a number representing the calculated score.

Parameters:	path (`Optional`[`str`]) – path to external command line tool. higher_better (`bool`) – whether to interpret higher scores as favorable.

evaluate(ref, hyp, desc=None)[source]¶

Calculate the quality of output according to an external script.

Parameters:	ref – (ignored) hyp – (ignored) desc – description to pass on to returned score
Returns:	external eval script score
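
Since the external script is expected to print only a number, the interaction can be pictured as running a command and parsing its standard output. The sketch below uses the standard subprocess module; the run_external_metric helper and the stand-in command are hypothetical and do not reflect how XNMT invokes the tool.

```python
import subprocess
import sys
from typing import List


def run_external_metric(command: List[str]) -> float:
    # Run the command, fail loudly on a non-zero exit code, and parse
    # whatever it printed as a single floating-point score.
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return float(result.stdout.strip())


# Stand-in for an evaluation script: a Python one-liner that prints a score.
score = run_external_metric([sys.executable, "-c", "print(27.5)"])
```

Any extra output from the script (progress messages, headers) would make the float conversion fail, which is why the script should print the number alone.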

class xnmt.eval.metrics.RecallEvaluator(nbest=5, write_sentence_scores=None)[source]¶

Bases: xnmt.eval.metrics.SentenceLevelEvaluator, xnmt.persistence.Serializable

Compute recall by counting true positives.

Parameters:	nbest (`Integral`) – compute recall within n-best of specified n write_sentence_scores (`Optional`[`str`]) – path of file to write sentence-level scores to (in YAML format)

evaluate(ref, hyp, desc=None)[source]¶

Calculate the quality of output given a reference.

Parameters:	ref – list of reference sents ( a sentence is a list of tokens ) hyp – list of hypothesis sents ( a sentence is a list of tokens ) desc – optional description that is passed on to score objects

class xnmt.eval.metrics.SequenceAccuracyEvaluator(case_sensitive=False, write_sentence_scores=None)[source]¶

Bases: xnmt.eval.metrics.SentenceLevelEvaluator, xnmt.persistence.Serializable

A class to evaluate the quality of output in terms of sequence accuracy.

Parameters:	case_sensitive – whether differences in capitalization are to be considered write_sentence_scores (`Optional`[`str`]) – path of file to write sentence-level scores to (in YAML format)

evaluate_one_sent(ref, hyp)[source]¶

Calculate the accuracy of output given a reference.

Parameters:	ref (`Sequence`[`str`]) – list of list of reference words hyp (`Sequence`[`str`]) – list of list of decoded words

Return: formatted string
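
Sequence accuracy treats a hypothesis as correct only if it matches the reference exactly. A minimal sketch of that idea follows; the function names are hypothetical, and lowercasing as the way to ignore capitalization is an assumption.

```python
from typing import Sequence


def sequence_correct(ref: Sequence[str], hyp: Sequence[str],
                     case_sensitive: bool = False) -> bool:
    if not case_sensitive:
        ref = [w.lower() for w in ref]
        hyp = [w.lower() for w in hyp]
    # The whole sequence must match token for token.
    return list(ref) == list(hyp)


def sequence_accuracy(refs: Sequence[Sequence[str]], hyps: Sequence[Sequence[str]],
                      case_sensitive: bool = False) -> float:
    correct = sum(sequence_correct(r, h, case_sensitive) for r, h in zip(refs, hyps))
    return correct / len(refs)
```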

class xnmt.eval.metrics.FMeasureEvaluator(pos_token='1', write_sentence_scores=None)[source]¶

Bases: xnmt.eval.metrics.SentenceLevelEvaluator, xnmt.persistence.Serializable

A class to evaluate the quality of output in terms of classification F-score.

Parameters:	pos_token (`str`) – token for the ‘positive’ class write_sentence_scores (`Optional`[`str`]) – path of file to write sentence-level scores to (in YAML format)

evaluate_one_sent(ref, hyp)[source]¶

Calculate the accuracy of output given a reference.

Parameters:	ref (`Sequence`[`str`]) – list of list of reference words hyp (`Sequence`[`str`]) – list of list of decoded words

Return: formatted string

class xnmt.eval.metrics.SegmentationFMeasureEvaluator(write_sentence_scores=None)[source]¶

Bases: xnmt.eval.metrics.SentenceLevelEvaluator, xnmt.persistence.Serializable

Data¶

Sentence¶

class xnmt.sent.Sentence(idx=None, score=None)[source]¶

Bases: object

A template class to represent a single data example of any type, used for both model input and output.

Parameters:	idx (`Optional`[`int`]) – running sentence number (0-based; unique among sentences loaded from the same file, but not across files) score (`Optional`[`Real`]) – a score given to this sentence by a model

sent_len()[source]¶

Return length of input, including padded tokens.

Returns: length

Return type:	`int`

len_unpadded()[source]¶

Return length of input prior to applying any padding.

Returns: unpadded length

Return type:	`int`

create_padded_sent(pad_len)[source]¶

Return a new, padded version of the sentence (or self if pad_len is zero).

Parameters:	pad_len (`Integral`) – number of tokens to append
Return type:	`Sentence`
Returns:	padded sentence

create_truncated_sent(trunc_len)[source]¶

Create a new, right-truncated version of the sentence (or self if trunc_len is zero).

Parameters:	trunc_len (`Integral`) – number of tokens to truncate
Return type:	`Sentence`
Returns:	truncated sentence

get_unpadded_sent()[source]¶

Return the unpadded sentence.

If self is unpadded, return self; otherwise return a reference to the original unpadded sentence if possible, or else create a new sentence.

Return type:	`Sentence`
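
The contract of the padding and truncation methods can be illustrated with a toy class. This is a hypothetical stand-in, not the XNMT Sentence class: it stores a list of integer ids, pads with an assumed pad id of 1, and keeps a reference to the original sentence so that get_unpadded_sent() can return it.

```python
from typing import List, Optional


class ToySentence:
    PAD = 1  # assumed padding id for this example

    def __init__(self, words: List[int], unpadded_sent: Optional["ToySentence"] = None):
        self.words = list(words)
        self.unpadded_sent = unpadded_sent

    def sent_len(self) -> int:
        return len(self.words)

    def len_unpadded(self) -> int:
        return self.get_unpadded_sent().sent_len()

    def create_padded_sent(self, pad_len: int) -> "ToySentence":
        if pad_len == 0:
            return self  # nothing to do, reuse the same object
        padded = self.words + [self.PAD] * pad_len
        return ToySentence(padded, unpadded_sent=self.get_unpadded_sent())

    def create_truncated_sent(self, trunc_len: int) -> "ToySentence":
        if trunc_len == 0:
            return self
        # Right-truncation: drop tokens from the end.
        return ToySentence(self.words[:-trunc_len])

    def get_unpadded_sent(self) -> "ToySentence":
        return self if self.unpadded_sent is None else self.unpadded_sent
```

Returning self for a zero-length pad or truncation avoids copying sentences that already have the target length within a batch.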

class xnmt.sent.ReadableSentence(idx, score=None, output_procs=[])[source]¶

Bases: xnmt.sent.Sentence

A base class for sentences based on readable strings.

Parameters:	idx (`Integral`) – running sentence number (0-based; unique among sentences loaded from the same file, but not across files) score (`Optional`[`Real`]) – a score given to this sentence by a model output_procs (`Union`[`OutputProcessor`, `Sequence`[`OutputProcessor`]]) – output processors to be applied when calling sent_str()

str_tokens(**kwargs)[source]¶

Return list of readable string tokens.

Parameters:	**kwargs – should accept arbitrary keyword args

Returns: list of tokens.

Return type:	`List`[`str`]

sent_str(custom_output_procs=None, **kwargs)[source]¶

Return a single string containing the readable version of the sentence.

Parameters:	custom_output_procs – if not None, overwrite the sentence’s default output processors **kwargs – should accept arbitrary keyword args

Returns: readable string

Return type:	`str`

class xnmt.sent.ScalarSentence(value, idx=None, vocab=None, score=None)[source]¶

Bases: xnmt.sent.ReadableSentence

A sentence represented by a single integer value, optionally interpreted via a vocab.

This is useful for classification-style problems.

Parameters:	value (`Integral`) – scalar value idx (`Optional`[`Integral`]) – running sentence number (0-based; unique among sentences loaded from the same file, but not across files) vocab (`Optional`[`Vocab`]) – optional vocab to give different scalar values a string representation. score (`Optional`[`Real`]) – a score given to this sentence by a model

sent_len()[source]¶

Return length of input, including padded tokens.

Returns: length

Return type:	`int`

len_unpadded()[source]¶

Return length of input prior to applying any padding.

Returns: unpadded length

Return type:	`int`

create_padded_sent(pad_len)[source]¶

Return a new, padded version of the sentence (or self if pad_len is zero).

Parameters:	pad_len (`Integral`) – number of tokens to append
Return type:	`ScalarSentence`
Returns:	padded sentence

create_truncated_sent(trunc_len)[source]¶

Create a new, right-truncated version of the sentence (or self if trunc_len is zero).

Parameters:	trunc_len (`Integral`) – number of tokens to truncate
Return type:	`ScalarSentence`
Returns:	truncated sentence

get_unpadded_sent()[source]¶

Return the unpadded sentence.

If self is unpadded, return self; otherwise return a reference to the original unpadded sentence if possible, or else create a new sentence.

str_tokens(**kwargs)[source]¶

Return list of readable string tokens.

Parameters:	**kwargs – should accept arbitrary keyword args

Returns: list of tokens.

Return type:	`List`[`str`]

class xnmt.sent.CompoundSentence(sents)[source]¶

Bases: xnmt.sent.Sentence

A compound sentence contains several sentence objects that present different ‘views’ on the same data examples.

Parameters:	sents (`Sequence`[`Sentence`]) – a list of sentences

sent_len()[source]¶

Return length of input, including padded tokens.

Returns: length

Return type:	`int`

len_unpadded()[source]¶

Return length of input prior to applying any padding.

Returns: unpadded length

Return type:	`int`

create_padded_sent(pad_len)[source]¶

Return a new, padded version of the sentence (or self if pad_len is zero).

Parameters:	pad_len – number of tokens to append
Returns:	padded sentence

create_truncated_sent(trunc_len)[source]¶

Create a new, right-truncated version of the sentence (or self if trunc_len is zero).

Parameters:	trunc_len – number of tokens to truncate
Returns:	truncated sentence

get_unpadded_sent()[source]¶

Return the unpadded sentence.

If self is unpadded, return self; otherwise return a reference to the original unpadded sentence if possible, or else create a new sentence.

class xnmt.sent.SimpleSentence(words, idx=None, vocab=None, score=None, output_procs=[], pad_token=1, unpadded_sent=None)[source]¶

Bases: xnmt.sent.ReadableSentence

A simple sentence, represented as a list of tokens.

Parameters:

words (Sequence[Integral]) – list of integer word ids
idx (Optional[Integral]) – running sentence number (0-based; unique among sentences loaded from the same file, but not across files)
vocab (Optional[Vocab]) – optional vocab mapping word ids to strings
score (Optional[Real]) – a score given to this sentence by a model
output_procs (Union[OutputProcessor, Sequence[OutputProcessor]]) – output processors to be applied when calling sent_str()
pad_token (Integral) – special token used for padding
unpadded_sent (Optional[SimpleSentence]) – reference to original, unpadded sentence if available

sent_len()[source]¶

Return length of input, including padded tokens.

Returns: length

create_padded_sent(pad_len)[source]¶

Return a new, padded version of the sentence (or self if pad_len is zero).

Parameters:	pad_len (`Integral`) – number of tokens to append
Return type:	`SimpleSentence`
Returns:	padded sentence

create_truncated_sent(trunc_len)[source]¶

Create a new, right-truncated version of the sentence (or self if trunc_len is zero).

Parameters:	trunc_len (`Integral`) – number of tokens to truncate
Return type:	`SimpleSentence`
Returns:	truncated sentence

get_unpadded_sent()[source]¶

Return the unpadded sentence.

If self is unpadded, return self; otherwise return a reference to the original unpadded sentence if possible, or else create a new sentence.

str_tokens(exclude_ss_es=True, exclude_unk=False, exclude_padded=True, **kwargs)[source]¶

Return list of readable string tokens.

Parameters:	**kwargs – should accept arbitrary keyword args

Returns: list of tokens.

Return type:	`List`[`str`]

class xnmt.sent.SegmentedSentence(segment=[], **kwargs)[source]¶

Bases: xnmt.sent.SimpleSentence

class xnmt.sent.ArraySentence(nparr, idx=None, padded_len=0, score=None, unpadded_sent=None)[source]¶

Bases: xnmt.sent.Sentence

A sentence based on a numpy array containing a continuous-space vector for each token.

Parameters:	idx (`Optional`[`Integral`]) – running sentence number (0-based; unique among sentences loaded from the same file, but not across files) nparr (`ndarray`) – numpy array of dimension num_tokens x token_size padded_len (`Integral`) – how many padded tokens are contained in the given nparr score (`Optional`[`Real`]) – a score given to this sentence by a model

sent_len()[source]¶

Return length of input, including padded tokens.

Returns: length

len_unpadded()[source]¶

Return length of input prior to applying any padding.

Returns: unpadded length

create_padded_sent(pad_len)[source]¶

Return a new, padded version of the sentence (or self if pad_len is zero).

Parameters:	pad_len (`Integral`) – number of tokens to append
Return type:	`ArraySentence`
Returns:	padded sentence

create_truncated_sent(trunc_len)[source]¶

Create a new, right-truncated version of the sentence (or self if trunc_len is zero).

Parameters:	trunc_len (`Integral`) – number of tokens to truncate
Return type:	`ArraySentence`
Returns:	truncated sentence

get_unpadded_sent()[source]¶

Return the unpadded sentence.

If self is unpadded, return self; otherwise return a reference to the original unpadded sentence if possible, or else create a new sentence.

class xnmt.sent.NbestSentence(base_sent, nbest_id, print_score=False)[source]¶

Bases: xnmt.sent.SimpleSentence

Output in the context of an nbest list.

Parameters:	base_sent (`SimpleSentence`) – The base sent object nbest_id (`Integral`) – The sentence id in the nbest list print_score (`bool`) – If True, print nbest_id, score, content separated by `|||`. If False, drop the score.

sent_str(custom_output_procs=None, **kwargs)[source]¶

Return a single string containing the readable version of the sentence.

Parameters:	custom_output_procs – if not None, overwrite the sentence’s default output processors **kwargs – should accept arbitrary keyword args

Returns: readable string

Return type:	`str`
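
The n-best output format described by print_score can be sketched as follows. The helper name nbest_line and the spaces around the ||| separator are assumptions; only the field order (nbest_id, score, content) is taken from the parameter description.

```python
def nbest_line(nbest_id: int, content: str, score: float,
               print_score: bool = False) -> str:
    fields = [str(nbest_id)]
    if print_score:
        fields.append(str(score))  # the score is dropped when print_score is False
    fields.append(content)
    return " ||| ".join(fields)
```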

class xnmt.sent.GraphSentence(idx, graph, vocab, num_padded=0, unpadded_sent=None)[source]¶

Bases: xnmt.sent.ReadableSentence

A graph structure.

This is a wrapper for a graph datastructure.

Parameters:

idx (Optional[Integral]) – running sentence number (0-based; unique among sentences loaded from the same file, but not across files)
graph (HyperGraph) – hypergraph containing graphs
vocab (Vocab) – vocabulary for word IDs
num_padded (Integral) – denoting that this many words are padded (without adding any physical nodes)
unpadded_sent (Optional[GraphSentence]) – reference to original, unpadded sentence if available

sent_len()[source]¶

Return number of nodes in the graph, including padded words.

Return type:	`int`
Returns:	Number of nodes in graph.

len_unpadded()[source]¶

Return number of nodes in the graph, without counting padded words.

Return type:	`int`
Returns:	Number of nodes in graph.

create_padded_sent(pad_len)[source]¶

Return padded graph.

Parameters:	pad_len (`Integral`) – Number of tokens to pad.
Return type:	`GraphSentence`
Returns:	New padded graph, or self if pad_len==0.

create_truncated_sent(trunc_len)[source]¶

Return self, as truncation is not supported.

Parameters:	trunc_len (`Integral`) – Number of tokens to truncate, must be 0.
Return type:	`GraphSentence`
Returns:	self.

get_unpadded_sent()[source]¶

Return the unpadded sentence.

If self is unpadded, return self; otherwise return a reference to the original unpadded sentence if possible, or else create a new sentence.

Return type:	`GraphSentence`

reversed()[source]¶

Create a graph with reversed direction.

The new graph will have graph nodes in reversed order and switched successors/predecessors. It will have the same number of padded nodes (again at the end of the nodes!).

Return type:	`GraphSentence`
Returns:	Reversed graph.

str_tokens(**kwargs)[source]¶

Return list of readable string tokens.

Parameters:	**kwargs – ignored

Returns: list of tokens of linearized graph.

Return type:	`List`[`str`]

sent_str(custom_output_procs=None, **kwargs)[source]¶

Return a single string containing the readable version of the sentence.

Parameters:	custom_output_procs – ignored **kwargs – ignored

Returns: readable string

Return type:	`str`

class xnmt.sent.LatticeNode(node_id, value, fwd_log_prob=0, marginal_log_prob=0, bwd_log_prob=0)[source]¶

Bases: xnmt.graph.HyperNode

A lattice node.

Parameters:

node_id (int) – Unique identifier for node
value (Integral) – Word id assigned to this node.
fwd_log_prob (Optional[Real]) – Lattice log probability normalized in forward-direction (successors sum to 1)
marginal_log_prob (Optional[Real]) – Lattice log probability globally normalized
bwd_log_prob (Optional[Real]) – Lattice log probability normalized in backward-direction (predecessors sum to 1)
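
To illustrate what "successors sum to 1" means for fwd_log_prob, the sketch below normalizes the log scores of one node's outgoing successors so that their probabilities sum to one. The helper name and the idea of starting from unnormalized log scores are assumptions for the example; lattices read by XNMT may already carry normalized values.

```python
import math
from typing import List


def normalize_log_probs(log_scores: List[float]) -> List[float]:
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(log_scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in log_scores))
    return [s - log_z for s in log_scores]


# Two successors with unnormalized weights 1 and 3.
fwd = normalize_log_probs([math.log(1.0), math.log(3.0)])
```

Backward normalization is the same computation applied to the predecessors of a node instead of its successors.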

class xnmt.sent.SyntaxTreeNode(node_id, value, head, node_type=<Type.NONE: 0>)[source]¶

Bases: xnmt.graph.HyperNode

class Type[source]¶

Bases: enum.Enum

An enumeration.

class xnmt.sent.RNNGSequenceSentence(idx, graph, surface_vocab, nt_vocab, all_surfaces=False, num_padded=0, unpadded_sent=None)[source]¶

Bases: xnmt.sent.ReadableSentence

sent_len()[source]¶

Return length of input, including padded tokens.

Returns: length

Return type:	`int`

len_unpadded()[source]¶

Return length of input prior to applying any padding.

Returns: unpadded length

Return type:	`int`

create_padded_sent(pad_len)[source]¶

Return a new, padded version of the sentence (or self if pad_len is zero).

Parameters:	pad_len (`Integral`) – number of tokens to append
Return type:	`ScalarSentence`
Returns:	padded sentence

create_truncated_sent(trunc_len)[source]¶

Create a new, right-truncated version of the sentence (or self if trunc_len is zero).

Parameters:	trunc_len (`Integral`) – number of tokens to truncate
Return type:	`ScalarSentence`
Returns:	truncated sentence

get_unpadded_sent()[source]¶

Return the unpadded sentence.

If self is unpadded, return self; otherwise return a reference to the original unpadded sentence if possible, or else create a new sentence.

str_tokens(**kwargs)[source]¶

Return list of readable string tokens.

Parameters:	**kwargs – should accept arbitrary keyword args

Returns: list of tokens.

Return type:	`List`[`str`]

sent_str()[source]¶

Return a single string containing the readable version of the sentence.

Parameters:	custom_output_procs – if not None, overwrite the sentence’s default output processors **kwargs – should accept arbitrary keyword args

Returns: readable string

InputReader¶

class xnmt.input_readers.InputReader[source]¶

Bases: object

A base class to read in a file and turn it into an input.

read_sents(filename, filter_ids=None)[source]¶

Read sentences and return an iterator.

Parameters:	filename (`str`) – data file filter_ids (`Optional`[`Sequence`[`Integral`]]) – only read sentences with these ids (0-indexed)

Returns: iterator over sentences from filename

Return type:	`Iterator`[`Sentence`]

count_sents(filename)[source]¶

Count the number of sentences in a data file.

Parameters:	filename (`str`) – data file

Returns: number of sentences in the data file

Return type:	`int`

needs_reload()[source]¶

Override this method if data needs to be reloaded for each epoch.

Return type:	`bool`

class xnmt.input_readers.BaseTextReader[source]¶

Bases: xnmt.input_readers.InputReader

read_sent(line, idx)[source]¶

Convert a raw text line into an input object.

Parameters:	line (`str`) – a single input string idx (`Integral`) – sentence number

Returns: a SentenceInput object for the input sentence

Return type:	`Sentence`

iterate_filtered(filename, filter_ids=None)[source]¶

Parameters:	filename (`str`) – data file (text file) filter_ids (`Optional`[`Sequence`[`Integral`]]) –

Returns: iterator over lines as strings (useful for subclasses to implement read_sents)

Return type:	`Iterator`[+T_co]
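
A filtered line iterator of this kind might look like the sketch below. It is a standalone function written for illustration, assuming UTF-8 text files and 0-based line ids; it is not the XNMT method itself.

```python
from typing import Iterator, Optional, Sequence


def iterate_filtered(filename: str,
                     filter_ids: Optional[Sequence[int]] = None) -> Iterator[str]:
    # A set gives constant-time membership checks for large id lists.
    wanted = None if filter_ids is None else set(filter_ids)
    with open(filename, encoding="utf-8") as f:
        for idx, line in enumerate(f):
            if wanted is None or idx in wanted:
                yield line
```

Because this is a generator, lines are read lazily and the file is closed once iteration finishes.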

class xnmt.input_readers.PlainTextReader(vocab=None, read_sent_len=False, output_proc=[])[source]¶

Bases: xnmt.input_readers.BaseTextReader, xnmt.persistence.Serializable

Handles the typical case of reading plain text files, with one sent per line.

Parameters:

vocab (Optional[Vocab]) – Vocabulary to convert string tokens to integer ids. If not given, plain text will be assumed to contain space-separated integer ids.
read_sent_len (bool) – if set, read the length of each sentence instead of the sentence itself. EOS is not counted.
output_proc (Sequence[OutputProcessor]) – output processors to revert the created sentences back to a readable string

read_sent(line, idx)[source]¶

Convert a raw text line into an input object.

Parameters:	line (`str`) – a single input string idx (`Integral`) – sentence number

Returns: a SentenceInput object for the input sentence

Return type:	`Sentence`
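
The three behaviors described in the PlainTextReader parameters (vocab lookup, raw integer ids, and length-only reading) can be sketched with a plain dictionary as the vocabulary. The function name, the dict-based vocab and the unk_id parameter are hypothetical, and this sketch does not append any end-of-sentence token.

```python
from typing import Dict, List, Optional, Union


def read_plain_line(line: str, vocab: Optional[Dict[str, int]] = None,
                    read_sent_len: bool = False, unk_id: int = 0) -> Union[int, List[int]]:
    tokens = line.strip().split()
    if read_sent_len:
        # Only the sentence length is of interest.
        return len(tokens)
    if vocab is None:
        # Without a vocab, tokens are assumed to be integer ids already.
        return [int(t) for t in tokens]
    # Unknown words map to the assumed unknown-word id.
    return [vocab.get(t, unk_id) for t in tokens]
```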

class xnmt.input_readers.CompoundReader(readers, vocab=None)[source]¶

Bases: xnmt.input_readers.InputReader, xnmt.persistence.Serializable

A compound reader reads inputs using several input readers at the same time.

The resulting inputs will be of type sent.CompoundSentence, which holds the results from the different readers as a tuple. Inputs can be read from different locations (if input file name is a sequence of filenames) or all from the same location (if it is a string). The latter can be used to read the same inputs using several different input readers which might capture different aspects of the input data.

Parameters:	readers (`Sequence`[`InputReader`]) – list of input readers to use vocab (`Optional`[`Vocab`]) – not used by this reader, but some parent components may require access to the vocab.

read_sents(filename, filter_ids=None)[source]¶

Read sentences and return an iterator.

Parameters:	filename (`Union`[`str`, `Sequence`[`str`]]) – data file filter_ids (`Optional`[`Sequence`[`Integral`]]) – only read sentences with these ids (0-indexed)

Returns: iterator over sentences from filename

Return type:	`Iterator`[`Sentence`]

count_sents(filename)[source]¶

Count the number of sentences in a data file.

Parameters:	filename (`str`) – data file

Returns: number of sentences in the data file

Return type:	`int`

needs_reload()[source]¶

Override this method if data needs to be reloaded for each epoch.

Return type:	`bool`
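
The filename handling described for CompoundReader (one shared filename, or one filename per reader) can be sketched with readers modeled as plain callables that take a filename and return an iterator. The function name read_compound and the callable-based readers are assumptions for the example; the real class works with InputReader objects and produces CompoundSentence instances rather than tuples.

```python
from typing import Callable, Iterator, Sequence, Tuple, Union


def read_compound(readers: Sequence[Callable[[str], Iterator]],
                  filename: Union[str, Sequence[str]]) -> Iterator[Tuple]:
    # A single string is shared by all readers; a sequence assigns one file per reader.
    if isinstance(filename, str):
        filenames = [filename] * len(readers)
    else:
        filenames = list(filename)
    if len(filenames) != len(readers):
        raise ValueError("need exactly one filename per reader")
    iterators = [reader(fn) for reader, fn in zip(readers, filenames)]
    # Each yielded tuple groups the different views of the same example.
    yield from zip(*iterators)
```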

class xnmt.input_readers.SentencePieceTextReader(model_file, sample_train=False, l=-1, alpha=0.1, vocab=None, output_proc=[<class 'xnmt.output.JoinPieceTextOutputProcessor'>])[source]¶

Bases: xnmt.input_readers.BaseTextReader, xnmt.persistence.Serializable

Read in text and segment it with sentencepiece. Optionally perform sampling for subword regularization, only at training time. https://arxiv.org/pdf/1804.10959.pdf

read_sent(line, idx)[source]¶

Convert a raw text line into an input object.

Parameters:	line (`str`) – a single input string idx (`Integral`) – sentence number

Returns: a SentenceInput object for the input sentence

Return type:	`SimpleSentence`

class xnmt.input_readers.RamlTextReader(tau=1.0, vocab=None, output_proc=[])[source]¶

Bases: xnmt.input_readers.BaseTextReader, xnmt.persistence.Serializable

Handles RAML sampling; can be used on the target side only, or on both the source and target sides. Randomly replaces words according to Hamming distance. https://arxiv.org/pdf/1808.07512.pdf https://arxiv.org/pdf/1609.00150.pdf

read_sent(line, idx)[source]¶

Convert a raw text line into an input object.

Parameters:	line (`str`) – a single input string idx (`Integral`) – sentence number

Returns: a SentenceInput object for the input sentence

Return type:	`SimpleSentence`

needs_reload()[source]¶

Override this method if data needs to be reloaded for each epoch.

Return type:	`bool`

class xnmt.input_readers.CharFromWordTextReader(vocab=None, read_sent_len=False, output_proc=[])[source]¶

Bases: xnmt.input_readers.PlainTextReader, xnmt.persistence.Serializable

Reads a word-based corpus and turns it into SegmentedSentence objects. The words of a SegmentedSentence are characters, but it also stores the segmentation information.

x = SegmentedSentence("i code today")
(TRUE) x.words == ["i", "c", "o", "d", "e", "t", "o", "d", "a", "y"]
(TRUE) x.segment == [0, 4, 9]

This means that the segment boundaries (ends of words) occur at positions 0, 4 and 9 of the character sequence.
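
The relationship between the character sequence and the segment positions can be reproduced with a few lines of plain Python. The helper name chars_and_segments is hypothetical and works on strings rather than vocabulary ids.

```python
from typing import List, Tuple


def chars_and_segments(line: str) -> Tuple[List[str], List[int]]:
    chars: List[str] = []
    segment: List[int] = []
    for word in line.split():
        chars.extend(word)
        # Index of the last character of this word in the flat sequence.
        segment.append(len(chars) - 1)
    return chars, segment
```

For the line "i code today" this yields the ten characters and the end positions [0, 4, 9] from the example above.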

read_sent(line, idx)[source]¶

Convert a raw text line into an input object.

Parameters:	line (`str`) – a single input string idx (`Integral`) – sentence number

Returns: a SentenceInput object for the input sentence

Return type:	`SegmentedSentence`

class xnmt.input_readers.H5Reader(transpose=False, feat_from=None, feat_to=None, feat_skip=None, timestep_skip=None, timestep_truncate=None)[source]¶

Bases: xnmt.input_readers.InputReader, xnmt.persistence.Serializable

Handles the case where sents are sequences of continuous-space vectors.

The input is a “.h5” file, which can be created for example using xnmt.preproc.MelFiltExtractor

The data items are assumed to be labeled with integers 0, 1, .. (converted to strings).

Each data item will be a 2D matrix representing a sequence of vectors. They can be in either order, depending on the value of the “transpose” variable:

* sents[sent_id][feat_ind,timestep] if transpose=False
* sents[sent_id][timestep,feat_ind] if transpose=True

Parameters:

transpose (bool) – whether inputs are transposed or not.
feat_from (Optional[Integral]) – use feature dimensions in a range, starting at this index (inclusive)
feat_to (Optional[Integral]) – use feature dimensions in a range, ending at this index (exclusive)
feat_skip (Optional[Integral]) – stride over features
timestep_skip (Optional[Integral]) – stride over timesteps
timestep_truncate (Optional[Integral]) – cut off timesteps if sequence is longer than specified value
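
The feature and timestep options behave like Python slice bounds and strides. The sketch below applies them to a nested list laid out as [timestep][feature]. The function name is hypothetical, and the order in which striding and truncation are applied is an assumption that may differ from the actual reader.

```python
from typing import List, Optional


def select_features(mat: List[List[float]],
                    feat_from: Optional[int] = None, feat_to: Optional[int] = None,
                    feat_skip: Optional[int] = None, timestep_skip: Optional[int] = None,
                    timestep_truncate: Optional[int] = None) -> List[List[float]]:
    # A None bound or stride leaves the corresponding axis untouched.
    rows = mat[::timestep_skip]        # stride over timesteps
    rows = rows[:timestep_truncate]    # assumed: truncate after striding
    # feat_from is inclusive, feat_to is exclusive, feat_skip is the stride.
    return [row[feat_from:feat_to:feat_skip] for row in rows]


# 5 timesteps x 4 features; entry value encodes (timestep, feature) as t*10 + f.
mat = [[t * 10 + f for f in range(4)] for t in range(5)]
subset = select_features(mat, feat_from=1, feat_to=3, timestep_skip=2, timestep_truncate=2)
```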

read_sents(filename, filter_ids=None)[source]¶

Read sentences and return an iterator.

Parameters:	filename (`str`) – data file filter_ids (`Optional`[`Sequence`[`Integral`]]) – only read sentences with these ids (0-indexed)

Returns: iterator over sentences from filename

Return type:	`Iterator`[`ArraySentence`]

count_sents(filename)[source]¶

Count the number of sentences in a data file.

Parameters:	filename (`str`) – data file

Returns: number of sentences in the data file

Return type:	`Integral`

class xnmt.input_readers.NpzReader(transpose=False, feat_from=None, feat_to=None, feat_skip=None, timestep_skip=None, timestep_truncate=None)[source]¶

Bases: xnmt.input_readers.InputReader, xnmt.persistence.Serializable

Handles the case where sents are sequences of continuous-space vectors.

The input is a “.npz” file, which consists of multiple “.npy” files, each corresponding to a single sequence of continuous features. This can be created in two ways:

* Use the builtin function numpy.savez_compressed()
* Create a bunch of .npy files, and run “zip” on them to zip them into an archive.

The file names should be named XXX_0, XXX_1, etc., where the final number after the underbar indicates the order of the sequence in the corpus. This is done automatically by numpy.savez_compressed(), in which case the names will be arr_0, arr_1, etc.

Each numpy file will be a 2D matrix representing a sequence of vectors. They can be in either order, depending on the value of the “transpose” variable:

* sents[sent_id][feat_ind,timestep] if transpose=False
* sents[sent_id][timestep,feat_ind] if transpose=True

Parameters:

transpose (bool) – whether inputs are transposed or not.
feat_from (Optional[Integral]) – use feature dimensions in a range, starting at this index (inclusive)
feat_to (Optional[Integral]) – use feature dimensions in a range, ending at this index (exclusive)
feat_skip (Optional[Integral]) – stride over features
timestep_skip (Optional[Integral]) – stride over timesteps
timestep_truncate (Optional[Integral]) – cut off timesteps if sequence is longer than specified value
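
An archive of the expected shape can be produced with numpy.savez_compressed(), which names positional arrays arr_0, arr_1, and so on. The sketch below writes two feature matrices to an in-memory buffer (a file path works the same way) and reads them back in corpus order. The array shapes are arbitrary, and sorting by the numeric suffix is an assumption about how a reader would restore the order.

```python
import io

import numpy as np

# Two sequences with 3 and 5 timesteps and 40 features each (arbitrary sizes).
seqs = [np.zeros((3, 40)), np.ones((5, 40))]

buf = io.BytesIO()
np.savez_compressed(buf, *seqs)  # stored as arr_0, arr_1
buf.seek(0)

with np.load(buf) as archive:
    # Sort by the number after the last underscore so arr_10 follows arr_9.
    names = sorted(archive.files, key=lambda name: int(name.rsplit("_", 1)[1]))
    shapes = [archive[name].shape for name in names]
```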

read_sents(filename, filter_ids=None)[source]¶

Read sentences and return an iterator.

Parameters:	filename (`str`) – data file filter_ids (`Optional`[`Sequence`[`Integral`]]) – only read sentences with these ids (0-indexed)

Returns: iterator over sentences from filename

Return type:	`None`

count_sents(filename)[source]¶

Count the number of sentences in a data file.

Parameters:	filename (`str`) – data file

Returns: number of sentences in the data file

Return type:	`Integral`

class xnmt.input_readers.IDReader[source]¶

Bases: xnmt.input_readers.BaseTextReader, xnmt.persistence.Serializable

Handles the case where we need to read in a single ID (like retrieval problems).

Files must be text files containing a single integer per line.

read_sent(line, idx)[source]¶

Convert a raw text line into an input object.

Parameters:	line (`str`) – a single input string idx (`Integral`) – sentence number

Returns: a SentenceInput object for the input sentence

Return type:	`ScalarSentence`

read_sents(filename, filter_ids=None)[source]¶

Read sentences and return an iterator.

Parameters:	filename (`str`) – data file filter_ids (`Optional`[`Sequence`[`Integral`]]) – only read sentences with these ids (0-indexed)

Returns: iterator over sentences from filename

Return type:	`list`
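
As an illustration of the file format and of 0-indexed `filter_ids`, the following sketch reads one integer per line. The helper `read_ids` is hypothetical and only mimics the described behaviour; it does not produce `ScalarSentence` objects.

```python
import io
from typing import Iterable, Iterator, Optional, Sequence

def read_ids(lines: Iterable[str],
             filter_ids: Optional[Sequence[int]] = None) -> Iterator[int]:
    """Hypothetical helper: yield one integer per line, optionally filtered."""
    wanted = None if filter_ids is None else set(filter_ids)
    for idx, line in enumerate(lines):  # idx is the 0-indexed sentence number
        if wanted is None or idx in wanted:
            yield int(line.strip())

data = io.StringIO("7\n3\n42\n")
ids = list(read_ids(data, filter_ids=[0, 2]))
```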

class xnmt.input_readers.CoNLLToRNNGActionsReader(surface_vocab, nt_vocab)[source]¶

Bases: xnmt.input_readers.BaseTextReader, xnmt.persistence.Serializable

Handles the reading of CoNLL File Format:

ID FORM LEMMA POS FEAT HEAD DEPREL

A single line represents a single edge of a dependency parse tree.

read_sents(filename, filter_ids=None)[source]¶

Read sentences and return an iterator.

Parameters:	filename (`str`) – data file filter_ids (`Optional`[`Sequence`[`Integral`]]) – only read sentences with these ids (0-indexed)

Returns: iterator over sentences from filename

class xnmt.input_readers.LatticeReader(vocab, text_input=False, flatten=False)[source]¶

Bases: xnmt.input_readers.BaseTextReader, xnmt.persistence.Serializable

Reads lattices from a text file.

The expected lattice file format is as follows:
* 1 line per lattice
* lines are serialized python lists / tuples
* 2 lists per lattice:
  - list of nodes, with every node a 4-tuple: (lexicon_entry, fwd_log_prob, marginal_log_prob, bwd_log_prob)
  - list of arcs, each arc a tuple: (node_id_start, node_id_end)
    - node_id references the nodes and is 0-indexed
    - node_id_start < node_id_end
* All paths must share a common start and end node, i.e. <s> and </s> need to be contained in the lattice

A simple example lattice:

[('<s>', 0.0, 0.0, 0.0), ('buenas', 0, 0.0, 0.0), ('tardes', 0, 0.0, 0.0), ('</s>', 0.0, 0.0, 0.0)],[(0, 1), (1, 2), (2, 3)]

Parameters:	vocab (`Vocab`) – Vocabulary to convert string tokens to integer ids. If not given, plain text will be assumed to contain space-separated integer ids. text_input (`bool`) – If `True`, assume a standard text file as input and convert it to a flat lattice. flatten – If `True`, convert to a flat lattice, with all probabilities set to 1.
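
Because each line is a serialized Python literal, the format can be checked with the standard library alone. The sketch below parses the example lattice and verifies the stated constraints; `parse_lattice_line` is a hypothetical helper, not part of the reader.

```python
import ast

line = ("[('<s>', 0.0, 0.0, 0.0), ('buenas', 0, 0.0, 0.0), "
        "('tardes', 0, 0.0, 0.0), ('</s>', 0.0, 0.0, 0.0)],"
        "[(0, 1), (1, 2), (2, 3)]")

def parse_lattice_line(text):
    """Hypothetical helper: parse one line and check the format constraints."""
    # "list, list" evaluates to a tuple holding the node list and the arc list.
    nodes, arcs = ast.literal_eval(text)
    for node in nodes:
        if len(node) != 4:
            raise ValueError(f"node must be a 4-tuple: {node!r}")
    for start, end in arcs:
        # Arcs reference 0-indexed nodes and must point forward.
        if not (0 <= start < end < len(nodes)):
            raise ValueError(f"invalid arc: {(start, end)!r}")
    return nodes, arcs

nodes, arcs = parse_lattice_line(line)
```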

read_sent(line, idx)[source]¶

Convert a raw text line into an input object.

Parameters:	line – a single input string idx – sentence number

Returns: a SentenceInput object for the input sentence

xnmt.input_readers.read_parallel_corpus(src_reader, trg_reader, src_file, trg_file, batcher=None, sample_sents=None, max_num_sents=None, max_src_len=None, max_trg_len=None)[source]¶

A utility function to read a parallel corpus.

Parameters:	src_reader (`InputReader`) – trg_reader (`InputReader`) – src_file (`str`) – trg_file (`str`) – batcher (`Optional`[`Batcher`]) – sample_sents (`Optional`[`Integral`]) – if not None, denotes the number of sents that should be randomly chosen from all available sents. max_num_sents (`Optional`[`Integral`]) – if not None, read only this many sents from the start of the corpus max_src_len (`Optional`[`Integral`]) – skip pair if src side is too long max_trg_len (`Optional`[`Integral`]) – skip pair if trg side is too long
Return type:	`tuple`
Returns:	A tuple of (src_data, trg_data, src_batches, trg_batches) where `_batches = _data` if `batcher=None`
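
The length and count limits can be sketched on plain token lists. This is an illustration only: `filter_pairs` is a hypothetical helper, and the assumption that `max_num_sents` is applied before the length checks is not confirmed by the text above.

```python
def filter_pairs(src, trg, max_num_sents=None, max_src_len=None, max_trg_len=None):
    """Hypothetical helper: keep sentence pairs that satisfy the limits."""
    if len(src) != len(trg):
        raise ValueError("src and trg must contain the same number of sentences")
    kept_src, kept_trg = [], []
    # Assumption: only the first max_num_sents pairs are considered at all.
    for s, t in list(zip(src, trg))[:max_num_sents]:
        if max_src_len is not None and len(s) > max_src_len:
            continue  # src side too long: skip the whole pair
        if max_trg_len is not None and len(t) > max_trg_len:
            continue  # trg side too long: skip the whole pair
        kept_src.append(s)
        kept_trg.append(t)
    return kept_src, kept_trg

src = [["a"], ["a", "b", "c"], ["a", "b"]]
trg = [["x"], ["y"], ["z", "z", "z"]]
src_data, trg_data = filter_pairs(src, trg, max_src_len=2, max_trg_len=2)
```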

Vocab¶

class xnmt.vocabs.Vocab(i2w=None, vocab_file=None, sentencepiece_vocab=False)[source]¶

Bases: xnmt.persistence.Serializable

An open vocabulary that converts between strings and integer ids.

The open vocabulary is realized via a special unknown-word token that is used whenever a word is not inside the list of known tokens. This class is immutable, i.e. its contents are not to change after the vocab has been initialized.

For initialization, i2w or vocab_file must be specified, but not both.

Parameters:	i2w (`Optional`[`Sequence`[`str`]]) – complete list of known words, including `<s>` and `</s>`. vocab_file (`Optional`[`str`]) – file containing one word per line, and not containing <s>, </s>, <unk> sentencepiece_vocab (`bool`) – Set to `True` if `vocab_file` is the output of the sentencepiece tokenizer. Defaults to `False`.
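
The idea of an open vocabulary with an unknown-word fallback can be sketched as follows. `TinyVocab` is a hypothetical stand-in, not the xnmt class; in particular, the position of `<unk>` in the word list is an assumption made for the example.

```python
from typing import Sequence

class TinyVocab:
    """Hypothetical stand-in for an open vocabulary; not the xnmt class."""
    UNK = "<unk>"

    def __init__(self, i2w: Sequence[str]) -> None:
        words = list(i2w)
        if self.UNK not in words:
            words.append(self.UNK)  # assumption: the unknown token goes last
        self.i2w = tuple(words)     # tuple: contents stay fixed after init
        self.w2i = {w: i for i, w in enumerate(self.i2w)}

    def convert(self, word: str) -> int:
        # Words outside the known list map to the unknown-word id.
        return self.w2i.get(word, self.w2i[self.UNK])

vocab = TinyVocab(["<s>", "</s>", "hello", "world"])
ids = [vocab.convert(w) for w in ["hello", "there", "world"]]
```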

static i2w_from_vocab_file(vocab_file, sentencepiece_vocab=False)[source]¶

Load the vocabulary from a file.

If sentencepiece_vocab is set to True, this will accept a sentencepiece vocabulary file.

Parameters:	vocab_file (`str`) – file containing one word per line, and not containing `<s>`, `</s>`, `<unk>` sentencepiece_vocab (bool) – Set to `True` if `vocab_file` is the output of the sentencepiece tokenizer. Defaults to `False`.
Return type:	`List`[`str`]

is_compatible(other)[source]¶

Check if this vocab produces the same conversions as another one.

Return type:	`bool`

Batcher¶

class xnmt.batchers.Batch[source]¶

Bases: abc.ABC

An abstract base class for minibatches of things.

class xnmt.batchers.ListBatch(batch_elements, mask=None)[source]¶

Bases: list, xnmt.batchers.Batch

A class containing a minibatch of things.

This class behaves like a Python list, but adds semantics that the contents form a (mini)batch of things. An optional mask can be specified to indicate padded parts of the inputs. Should be treated as an immutable object.

Parameters:	batch_elements (`list`) – list of things mask (`Optional`[`Mask`]) – optional mask when batch contains items of unequal size

class xnmt.batchers.CompoundBatch(*batch_elements)[source]¶

Bases: xnmt.batchers.Batch

A compound batch contains several parallel batches.

Parameters:	*batch_elements – one or several batches

class xnmt.batchers.Mask(np_arr)[source]¶

Bases: object

An immutable mask specifies padded parts in a sequence or batch of sequences.

Masks are represented as a numpy array of dimensions batchsize x seq_len, with parts belonging to the sequence set to 0, and parts that should be masked set to 1.

Parameters:	np_arr (`ndarray`) – numpy array

cmult_by_timestep_expr(expr, timestep, inverse=False)[source]¶

Parameters:	expr (`Expression`) – a dynet expression corresponding to one timestep timestep (`Integral`) – index of current timestep inverse (`bool`) – True will keep the unmasked parts, False will zero out the unmasked parts
Return type:	`Expression`
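
The effect of `inverse` can be illustrated without DyNet. The sketch below follows the parameter description literally, using plain lists in place of expressions; `apply_mask_at_timestep` is a hypothetical helper.

```python
def apply_mask_at_timestep(values, mask, timestep, inverse=False):
    """values: one number per batch element; mask: batchsize x seq_len of 0/1."""
    column = [row[timestep] for row in mask]
    if inverse:
        # 1 - mask is 1 for real (unmasked) positions, so those are kept.
        column = [1 - m for m in column]
    # Component-wise product, one factor per batch element.
    return [v * m for v, m in zip(values, column)]

mask = [[0, 0, 0],   # sequence of length 3, nothing padded
        [0, 1, 1]]   # sequence of length 1, two padded positions
values = [5.0, 7.0]
kept_unmasked = apply_mask_at_timestep(values, mask, timestep=1, inverse=True)
zeroed_unmasked = apply_mask_at_timestep(values, mask, timestep=1, inverse=False)
```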

class xnmt.batchers.Batcher(batch_size, granularity='sent', pad_src_to_multiple=1, sort_within_by_trg_len=True)[source]¶

Bases: object

A template class to convert a list of sentences to several batches of sentences.

Parameters:	batch_size (`Integral`) – batch size granularity (`str`) – ‘sent’ or ‘word’ pad_src_to_multiple (`Integral`) – pad source sentences so their length is a multiple of this integer. sort_within_by_trg_len (`bool`) – whether to sort by reverse trg len inside a batch

is_random()[source]¶

Return type:	`bool`
Returns:	True if there is some randomness in the batching process, False otherwise.

create_single_batch(src_sents, trg_sents=None, sort_by_trg_len=False)[source]¶

Create a single batch, either source-only or source-and-target.

Parameters:	src_sents (`Sequence`[`Sentence`]) – list of source-side inputs trg_sents (`Optional`[`Sequence`[`Sentence`]]) – optional list of target-side inputs sort_by_trg_len (`bool`) – if True (and targets are specified), sort source- and target batches by target length
Return type:	`Union`[`Batch`, `Tuple`[`Batch`]]
Returns:	a tuple of batches if targets were given, otherwise a single batch

pack(src, trg)[source]¶

Create a list of src/trg batches based on provided src/trg inputs.

Parameters:	src (`Sequence`[`Sentence`]) – list of src-side inputs trg (`Sequence`[`Sentence`]) – list of trg-side inputs
Return type:	`Tuple`[`Sequence`[`Batch`], `Sequence`[`Batch`]]
Returns:	tuple of lists of src and trg batches

class xnmt.batchers.InOrderBatcher(batch_size=1, pad_src_to_multiple=1)[source]¶

Bases: xnmt.batchers.Batcher, xnmt.persistence.Serializable

A class to create batches in order of the original corpus, both across and within batches.

Parameters:	batch_size (`Integral`) – batch size pad_src_to_multiple (`Integral`) – pad source sentences so their length is a multiple of this integer.

pack(src, trg)[source]¶

Pack batches. Unlike other batches, the trg sentences are optional.

Parameters:	src (`Sequence`[`Sentence`]) – list of src-side inputs trg (`Optional`[`Sequence`[`Sentence`]]) – optional list of trg-side inputs
Return type:	`Tuple`[`Sequence`[`Batch`], `Sequence`[`Batch`]]
Returns:	src batches if trg was not given; tuple of src batches and trg batches if trg was given

class xnmt.batchers.ShuffleBatcher(batch_size, granularity='sent', pad_src_to_multiple=1)[source]¶

Bases: xnmt.batchers.Batcher

A template class to create batches through randomly shuffling without sorting.

Sentences inside each batch are sorted by reverse trg length.

Parameters:	batch_size (`Integral`) – batch size granularity (`str`) – ‘sent’ or ‘word’ pad_src_to_multiple (`Integral`) – pad source sentences so their length is a multiple of this integer.

pack(src, trg)[source]¶

Create a list of src/trg batches based on provided src/trg inputs.

Parameters:	src (`Sequence`[`Sentence`]) – list of src-side inputs trg (`Optional`[`Sequence`[`Sentence`]]) – list of trg-side inputs
Return type:	`Tuple`[`Sequence`[`Batch`], `Sequence`[`Batch`]]
Returns:	tuple of lists of src and trg batches

is_random()[source]¶

Returns: True if there is some randomness in the batching process, False otherwise.

Return type:	`bool`

class xnmt.batchers.SortBatcher(batch_size, granularity='sent', sort_key=<function SortBatcher.<lambda>>, break_ties_randomly=True, pad_src_to_multiple=1)[source]¶

Bases: xnmt.batchers.Batcher

A template class to create batches through bucketing sentence length.

Sentences inside each batch are sorted by reverse trg length.

Parameters:	batch_size (`Integral`) – batch size granularity (`str`) – ‘sent’ or ‘word’ pad_src_to_multiple (`Integral`) – pad source sentences so their length is a multiple of this integer.
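
Sort-based batching can be sketched in a few lines. This is not the library's implementation: `sort_and_batch` is a hypothetical helper, sentence pairs are plain (src, trg) token lists, and the `sort_key` here receives such a pair.

```python
import random

def sort_and_batch(pairs, batch_size, sort_key, break_ties_randomly=True, seed=0):
    """Hypothetical helper: bucket (src, trg) pairs by sort_key into batches."""
    order = list(pairs)
    if break_ties_randomly:
        # Shuffle first; the stable sort below keeps this order among ties.
        random.Random(seed).shuffle(order)
    order.sort(key=sort_key)
    batches = []
    for i in range(0, len(order), batch_size):
        chunk = order[i:i + batch_size]
        # Inside a batch, sort by reverse trg length.
        chunk.sort(key=lambda pair: len(pair[1]), reverse=True)
        batches.append(chunk)
    return batches

p0 = (["a", "b", "c"], ["x"])
p1 = (["a"], ["x", "y"])
p2 = (["a", "b"], ["x", "y", "z"])
p3 = (["a"], ["x"])
batches = sort_and_batch([p0, p1, p2, p3], batch_size=2,
                         sort_key=lambda pair: len(pair[0]))
```

With this input, the two sentences of src length 1 share the first batch, and each batch lists the longer target first.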

pack(src, trg)[source]¶

Create a list of src/trg batches based on provided src/trg inputs.

Parameters:	src (`Sequence`[`Sentence`]) – list of src-side inputs trg (`Optional`[`Sequence`[`Sentence`]]) – list of trg-side inputs
Return type:	`Tuple`[`Sequence`[`Batch`], `Sequence`[`Batch`]]
Returns:	tuple of lists of src and trg batches

is_random()[source]¶

Returns: True if there is some randomness in the batching process, False otherwise.

Return type:	`bool`

xnmt.batchers.mark_as_batch(data, mask=None)[source]¶

Mark a sequence of items as a batch.

Parameters:	data (`Sequence`[+T_co]) – sequence of things mask (`Optional`[`Mask`]) – optional mask

Returns: a batch of things

Return type:	`Batch`

xnmt.batchers.is_batched(data)[source]¶

Check whether some data is batched.

Parameters:	data (`Sequence`[+T_co]) – data to check
Return type:	`bool`
Returns:	True iff data is batched.

xnmt.batchers.pad(batch, pad_to_multiple=1)[source]¶

Apply padding to sentences in a batch.

Parameters:	batch (`Sequence`[+T_co]) – batch of sentences pad_to_multiple (`Integral`) – pad sentences so their length is a multiple of this integer.
Return type:	`Batch`
Returns:	batch containing padded items and a corresponding batch mask.
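
Padding to a multiple, together with the matching mask, can be sketched on lists of token ids. `pad_batch` and the padding value 0 are assumptions made for illustration; the mask uses the convention of 0 for real positions and 1 for padded ones.

```python
import math

def pad_batch(sents, pad_to_multiple=1, pad_token=0):
    """Hypothetical helper: pad token-id lists and build a 0/1 mask."""
    longest = max(len(s) for s in sents)
    # Round the longest length up to the next multiple.
    target = math.ceil(longest / pad_to_multiple) * pad_to_multiple
    padded = [list(s) + [pad_token] * (target - len(s)) for s in sents]
    mask = [[0] * len(s) + [1] * (target - len(s)) for s in sents]
    return padded, mask

padded, mask = pad_batch([[5, 6, 7], [8]], pad_to_multiple=4)
```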

class xnmt.batchers.SrcBatcher(batch_size, break_ties_randomly=True, pad_src_to_multiple=1)[source]¶

Bases: xnmt.batchers.SortBatcher, xnmt.persistence.Serializable

A batcher that creates fixed-size batches, grouped by src len.

Sentences inside each batch are sorted by reverse trg length.

Parameters:	batch_size (`Integral`) – batch size break_ties_randomly (`bool`) – if True, randomly shuffle sentences of the same src length before creating batches. pad_src_to_multiple (`Integral`) – pad source sentences so their length is a multiple of this integer.

class xnmt.batchers.TrgBatcher(batch_size, break_ties_randomly=True, pad_src_to_multiple=1)[source]¶

Bases: xnmt.batchers.SortBatcher, xnmt.persistence.Serializable

A batcher that creates fixed-size batches, grouped by trg len.

Sentences inside each batch are sorted by reverse trg length.

Parameters:	batch_size (`Integral`) – batch size break_ties_randomly (`bool`) – if True, randomly shuffle sentences of the same trg length before creating batches. pad_src_to_multiple (`Integral`) – pad source sentences so their length is a multiple of this integer.

class xnmt.batchers.SrcTrgBatcher(batch_size, break_ties_randomly=True, pad_src_to_multiple=1)[source]¶

Bases: xnmt.batchers.SortBatcher, xnmt.persistence.Serializable

A batcher that creates fixed-size batches, grouped by src len, then trg len.

Sentences inside each batch are sorted by reverse trg length.

Parameters:	batch_size (`Integral`) – batch size break_ties_randomly (`bool`) – if True, randomly shuffle sentences of the same src and trg length before creating batches. pad_src_to_multiple (`Integral`) – pad source sentences so their length is a multiple of this integer.

class xnmt.batchers.TrgSrcBatcher(batch_size, break_ties_randomly=True, pad_src_to_multiple=1)[source]¶

Bases: xnmt.batchers.SortBatcher, xnmt.persistence.Serializable

A batcher that creates fixed-size batches, grouped by trg len, then src len.

Sentences inside each batch are sorted by reverse trg length.

Parameters:	batch_size (`Integral`) – batch size break_ties_randomly (`bool`) – if True, randomly shuffle sentences of the same trg and src length before creating batches. pad_src_to_multiple (`Integral`) – pad source sentences so their length is a multiple of this integer.

class xnmt.batchers.SentShuffleBatcher(batch_size, pad_src_to_multiple=1)[source]¶

Bases: xnmt.batchers.ShuffleBatcher, xnmt.persistence.Serializable

A batcher that creates fixed-size batches of random order.

Sentences inside each batch are sorted by reverse trg length.

Parameters:	batch_size (`Integral`) – batch size pad_src_to_multiple (`Integral`) – pad source sentences so their length is a multiple of this integer.

class xnmt.batchers.WordShuffleBatcher(words_per_batch, pad_src_to_multiple=1)[source]¶

Bases: xnmt.batchers.ShuffleBatcher, xnmt.persistence.Serializable

A batcher that creates batches of random order, each holding a given number of src+trg words.

Sentences inside each batch are sorted by reverse trg length.

Parameters:	words_per_batch (`Integral`) – number of src+trg words in each batch pad_src_to_multiple (`Integral`) – pad source sentences so their length is a multiple of this integer.

class xnmt.batchers.WordSortBatcher(words_per_batch, avg_batch_size, sort_key, break_ties_randomly=True, pad_src_to_multiple=1)[source]¶

Bases: xnmt.batchers.SortBatcher

Base class for word sort-based batchers.

Sentences inside each batch are sorted by reverse trg length.

Parameters:

words_per_batch (Optional[Integral]) – number of src+trg words in each batch
avg_batch_size (Optional[Real]) – avg number of sentences in each batch (if words_per_batch not given)
sort_key (Callable) –
break_ties_randomly (bool) – if True, randomly shuffle sentences with the same sort key before creating batches.
pad_src_to_multiple (Integral) – pad source sentences so their length is a multiple of this integer.
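
One way to picture word-based batch sizes is greedy packing over an already ordered corpus. The sketch below is an assumption about the general idea, not the library's algorithm; `pack_by_words` is a hypothetical helper and the pairs are plain (src, trg) token lists.

```python
def pack_by_words(pairs, words_per_batch):
    """Hypothetical helper: greedily fill batches up to a src+trg word budget."""
    batches, current, current_words = [], [], 0
    for src, trg in pairs:
        n_words = len(src) + len(trg)
        # Start a new batch when this pair would exceed the budget.
        if current and current_words + n_words > words_per_batch:
            batches.append(current)
            current, current_words = [], 0
        current.append((src, trg))
        current_words += n_words
    if current:
        batches.append(current)
    return batches

pairs = [(["a"] * 2, ["b"] * 2),   # 4 words
         (["a"] * 3, ["b"] * 1),   # 4 words
         (["a"] * 3, ["b"] * 3),   # 6 words
         (["a"] * 1, ["b"] * 1)]   # 2 words
batches = pack_by_words(pairs, words_per_batch=8)
```

In this sketch a single pair larger than the budget still forms a batch of its own.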

class xnmt.batchers.WordSrcBatcher(words_per_batch=None, avg_batch_size=None, break_ties_randomly=True, pad_src_to_multiple=1)[source]¶

Bases: xnmt.batchers.WordSortBatcher, xnmt.persistence.Serializable

A batcher that creates variable-sized batches with given average (src+trg) words per batch, grouped by src len.

Sentences inside each batch are sorted by reverse trg length.

Parameters:

words_per_batch (Optional[Integral]) – number of src+trg words in each batch
avg_batch_size (Optional[Real]) – avg number of sentences in each batch (if words_per_batch not given)
break_ties_randomly (bool) – if True, randomly shuffle sentences of the same src length before creating batches.
pad_src_to_multiple (Integral) – pad source sentences so their length is a multiple of this integer.

class xnmt.batchers.WordTrgBatcher(words_per_batch=None, avg_batch_size=None, break_ties_randomly=True, pad_src_to_multiple=1)[source]¶

Bases: xnmt.batchers.WordSortBatcher, xnmt.persistence.Serializable

A batcher that creates variable-sized batches with given average (src+trg) words per batch, grouped by trg len.

Sentences inside each batch are sorted by reverse trg length.

Parameters:

words_per_batch (Optional[Integral]) – number of src+trg words in each batch
avg_batch_size (Optional[Real]) – avg number of sentences in each batch (if words_per_batch not given)
break_ties_randomly (bool) – if True, randomly shuffle sentences of the same trg length before creating batches.
pad_src_to_multiple (Integral) – pad source sentences so their length is a multiple of this integer.

class xnmt.batchers.WordSrcTrgBatcher(words_per_batch=None, avg_batch_size=None, break_ties_randomly=True, pad_src_to_multiple=1)[source]¶

Bases: xnmt.batchers.WordSortBatcher, xnmt.persistence.Serializable

A batcher that creates variable-sized batches with given average number of src + trg words per batch, grouped by src len, then trg len.

Sentences inside each batch are sorted by reverse trg length.

Parameters:

words_per_batch (Optional[Integral]) – number of src+trg words in each batch
avg_batch_size (Optional[Real]) – avg number of sentences in each batch (if words_per_batch not given)
break_ties_randomly (bool) – if True, randomly shuffle sentences of the same src and trg length before creating batches.
pad_src_to_multiple (Integral) – pad source sentences so their length is a multiple of this integer.

class xnmt.batchers.WordTrgSrcBatcher(words_per_batch=None, avg_batch_size=None, break_ties_randomly=True, pad_src_to_multiple=1)[source]¶

Bases: xnmt.batchers.WordSortBatcher, xnmt.persistence.Serializable

A batcher that creates variable-sized batches with given average number of src + trg words per batch, grouped by trg len, then src len.

Sentences inside each batch are sorted by reverse trg length.

Parameters:

words_per_batch (Optional[Integral]) – number of src+trg words in each batch
avg_batch_size (Optional[Real]) – avg number of sentences in each batch (if words_per_batch not given)
break_ties_randomly (bool) – if True, randomly shuffle sentences of the same trg and src length before creating batches.
pad_src_to_multiple (Integral) – pad source sentences so their length is a multiple of this integer.

xnmt.batchers.truncate_batches(*xl)[source]¶

Truncate a list of batched items so that all items have the batch size of the input with the smallest batch size.

Inputs can be of various types and would usually correspond to a single time step. Assume that the batch elements with index 0 correspond across the inputs, so that batch elements will be truncated from the top, i.e. starting with the highest-indexed batch elements. Masks are not considered even if attached to an input of Batch type.

Parameters:	*xl – batched timesteps of various types
Return type:	`Sequence`[`Union`[`Expression`, `Batch`, `Mask`, `UniLSTMState`]]
Returns:	Copies of the inputs, truncated to consistent batch size.
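
The truncation rule can be sketched with plain lists standing in for the various batched types. `truncate_to_smallest` is a hypothetical helper that only covers the list case.

```python
def truncate_to_smallest(*batched):
    """Hypothetical helper: cut every input down to the smallest batch size."""
    smallest = min(len(x) for x in batched)
    # Slicing drops the highest-indexed elements and returns copies.
    return [list(x[:smallest]) for x in batched]

states = ["s0", "s1", "s2"]   # batch size 3
scores = [0.1, 0.2]           # batch size 2
trunc_states, trunc_scores = truncate_to_smallest(states, scores)
```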

Preprocessing¶

class xnmt.preproc.PreprocRunner(tasks=None, overwrite=False)[source]¶

Bases: xnmt.persistence.Serializable

Preprocess and filter the input files, and create the vocabulary.

Parameters:

tasks (Optional[List[PreprocTask]]) – A list of preprocessing steps, usually parametrized by in_files (the input files), out_files (the output files), and spec for that particular preprocessing type. The types of arguments that preproc_spec expects:
* Option("in_files", help_str="list of paths to the input files"),
* Option("out_files", help_str="list of paths for the output files"),
* Option("spec", help_str="The specifications describing which type of processing to use. For normalize and vocab, should consist of the 'lang' and 'spec', where 'lang' can either be 'all' to apply the same type of processing to all languages, or a zero-indexed integer indicating which language to process."),
overwrite (bool) – Whether to overwrite files if they already exist.

class xnmt.preproc.PreprocExtract(in_files, out_files, specs)[source]¶

Bases: xnmt.preproc.PreprocTask, xnmt.persistence.Serializable

class xnmt.preproc.PreprocTokenize(in_files, out_files, specs)[source]¶

Bases: xnmt.preproc.PreprocTask, xnmt.persistence.Serializable

class xnmt.preproc.PreprocNormalize(in_files, out_files, specs)[source]¶

Bases: xnmt.preproc.PreprocTask, xnmt.persistence.Serializable

class xnmt.preproc.PreprocFilter(in_files, out_files, specs)[source]¶

Bases: xnmt.preproc.PreprocTask, xnmt.persistence.Serializable

class xnmt.preproc.PreprocVocab(in_files, out_files, specs)[source]¶

Bases: xnmt.preproc.PreprocTask, xnmt.persistence.Serializable

class xnmt.preproc.Normalizer[source]¶

Bases: object

A type of normalization to perform on a file. It is initialized first, then expanded.

normalize(sent)[source]¶

Takes a plain text string and converts it into another plain text string after preprocessing.

Return type:	`str`

class xnmt.preproc.NormalizerLower[source]¶

Bases: xnmt.preproc.Normalizer, xnmt.persistence.Serializable

Lowercase the text.

normalize(sent)[source]¶

Takes a plain text string and converts it into another plain text string after preprocessing.

Return type:	`str`

class xnmt.preproc.NormalizerRemovePunct(remove_inside_word=False, allowed_chars='')[source]¶

Bases: xnmt.preproc.Normalizer, xnmt.persistence.Serializable

Remove punctuation from the text.

Parameters:	remove_inside_word (`bool`) – If `False`, only remove punctuation appearing adjacent to white space. allowed_chars (`str`) – Specify punctuation that is allowed and should not be removed.
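
The two modes can be sketched with the standard library. This is an approximation under stated assumptions: `remove_punct` is a hypothetical helper, punctuation means ASCII `string.punctuation`, "adjacent to white space" is approximated by stripping punctuation from the edges of whitespace-separated tokens, and runs of whitespace are collapsed.

```python
import string

def remove_punct(sent, remove_inside_word=False, allowed_chars=""):
    """Hypothetical helper: drop ASCII punctuation except allowed_chars."""
    punct = set(string.punctuation) - set(allowed_chars)
    if remove_inside_word:
        # Remove punctuation everywhere, then normalize whitespace.
        return " ".join("".join(c for c in sent if c not in punct).split())
    # Otherwise only strip punctuation at the edges of each token.
    strip_chars = "".join(punct)
    tokens = (tok.strip(strip_chars) for tok in sent.split())
    return " ".join(tok for tok in tokens if tok)

text = "Hello, world! It's o.k."
edges_only = remove_punct(text)
everywhere = remove_punct(text, remove_inside_word=True)
keep_bang = remove_punct(text, allowed_chars="!")
```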

normalize(sent)[source]¶

Takes a plain text string and converts it into another plain text string after preprocessing.

Return type:	`str`

class xnmt.preproc.Tokenizer[source]¶

Bases: xnmt.preproc.Normalizer

Pass the text through an internal or external tokenizer.

TODO: only StreamTokenizers are supported by the preproc runner right now.

tokenize_stream(stream)[source]¶

Tokenize a file-like text stream.

Parameters:	stream – A file-like stream of untokenized text
Returns:	A file-like stream of tokenized text

class xnmt.preproc.BPETokenizer(vocab_size, train_files)[source]¶

Bases: xnmt.preproc.Tokenizer, xnmt.persistence.Serializable

Class for byte-pair encoding tokenizer.

TODO: Unimplemented

tokenize(sent)[source]¶

Tokenizes a single sentence according to the determined BPE.

class xnmt.preproc.CharacterTokenizer[source]¶

Bases: xnmt.preproc.Tokenizer, xnmt.persistence.Serializable

Tokenize into characters, with __ indicating blank spaces.

tokenize(sent)[source]¶

Tokenizes a single sentence into characters.

Return type:	`str`
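
Character tokenization with a visible space marker can be sketched in one line. `tokenize_chars` is a hypothetical helper that follows the description above; it is not taken from the library.

```python
def tokenize_chars(sent: str) -> str:
    """Hypothetical helper: one token per character, '__' for blank spaces."""
    return " ".join("__" if ch == " " else ch for ch in sent)

tokens = tokenize_chars("hi you")
```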

class xnmt.preproc.UnicodeTokenizer(use_merge_symbol=True, merge_symbol='↹', reverse=False)[source]¶

Bases: xnmt.preproc.Tokenizer, xnmt.persistence.Serializable

Tokenizer that inserts whitespace between words and punctuation.

This tokenizer is language-agnostic and (optionally) reversible, and is based on unicode character categories. See appendix of https://arxiv.org/pdf/1804.08205

Parameters:	use_merge_symbol (`bool`) – whether to prepend a merge-symbol so that the tokenization becomes reversible merge_symbol (`str`) – the merge symbol to use reverse (`bool`) – whether to reverse tokenization (assumes use_merge_symbol=True was used in forward direction)

tokenize(sent)[source]¶

Tokenizes a single sentence.

Parameters:	sent (`str`) – input sentence
Return type:	`str`
Returns:	output sentence

class xnmt.preproc.ExternalTokenizer(path, tokenizer_args=None, arg_separator=' ')[source]¶

Bases: xnmt.preproc.Tokenizer, xnmt.persistence.Serializable

Class for arbitrary external tokenizer that accepts untokenized text to stdin and emits tokenized text to stdout, with passable parameters.

It is assumed that in general, external tokenizers will be more efficient when run once per file, so are run as such (instead of one-execution-per-line.)

Parameters:	path (`str`) – tokenizer_args (`Optional`[`Sequence`[`str`]]) – arg_separator (`str`) –

tokenize(sent)[source]¶

Pass the sentence through the external tokenizer.

Parameters:	sent (`str`) – An untokenized sentence
Return type:	`str`
Returns:	A tokenized sentence

class xnmt.preproc.SentencepieceTokenizer(train_files, vocab_size, overwrite=False, model_prefix='sentpiece', output_format='piece', model_type='bpe', hard_vocab_limit=True, encode_extra_options=None, decode_extra_options=None)[source]¶

Bases: xnmt.preproc.Tokenizer, xnmt.persistence.Serializable

Sentencepiece tokenizer. The options supported by the SentencepieceTokenizer are almost exactly those presented in the Sentencepiece readme, namely:

Parameters:

train_files (Sequence[str]) –
vocab_size (Integral) – fixes the vocabulary size
overwrite (bool) –
model_prefix (str) – The trained bpe model will be saved under {model_prefix}.model/.vocab
output_format (str) –
model_type (str) – Either unigram, bpe, char or word. The signature above defaults to bpe, whereas unigram is the default of sentencepiece itself. Please refer to the sentencepiece documentation for more details
hard_vocab_limit (bool) – setting this to False will make the vocab size a soft limit. Useful for small datasets. This is True by default.
encode_extra_options (Optional[str]) –
decode_extra_options (Optional[str]) –

tokenize(sent)[source]¶

Tokenizes a single sentence into pieces.

Return type:	`str`

class xnmt.preproc.SentenceFilterer(spec)[source]¶

Bases: object

Filters sentences that don’t match a criterion.

keep(sents)[source]¶

Takes a list of inputs/outputs for a single sentence and decides whether to keep them.

In general, these inputs/outputs should already be segmented into words, so len() will return the number of words, not the number of characters.

Parameters:	sents (`list`) – A list of parallel sentences.
Return type:	`bool`
Returns:	True if they should be used or False if they should be filtered.

class xnmt.preproc.SentenceFiltererMatchingRegex(regex_src, regex_trg, regex_all)[source]¶

Bases: xnmt.preproc.SentenceFilterer

Filters sentences via regular expressions. A sentence must match the expression to be kept.

keep(sents)[source]¶

Keep only sentences that match the regex.

Return type:	`bool`

class xnmt.preproc.SentenceFiltererLength(min_src=None, max_src=None, min_trg=None, max_trg=None, min_all=None, max_all=None)[source]¶

Bases: xnmt.preproc.SentenceFilterer, xnmt.persistence.Serializable

Filters sentences by length

keep(sents)[source]¶

Filter sentences by length.

Return type:	`bool`
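
A length filter over parallel sentences might look like the sketch below. `keep_by_length` is a hypothetical helper; it assumes that the first sentence is the source, the second is the target, and the `*_all` limits apply to every sentence in the list.

```python
def in_range(n, lo, hi):
    """True if n respects the optional lower and upper bounds."""
    return (lo is None or n >= lo) and (hi is None or n <= hi)

def keep_by_length(sents, min_src=None, max_src=None, min_trg=None,
                   max_trg=None, min_all=None, max_all=None):
    """Hypothetical helper: sents is [src, trg], each already split into words."""
    src, trg = sents[0], sents[1]
    return (in_range(len(src), min_src, max_src)
            and in_range(len(trg), min_trg, max_trg)
            and all(in_range(len(s), min_all, max_all) for s in sents))

pair = [["a", "b"], ["x"]]
kept = keep_by_length(pair, max_src=2, min_trg=1)
dropped = keep_by_length(pair, max_all=1)
```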

class xnmt.preproc.VocabFilterer(spec)[source]¶

Bases: object

Filters vocabulary by some criterion

filter(vocab)[source]¶

Filter a vocabulary.

Parameters:	vocab (`Dict`[`str`, `Integral`]) – A dictionary of vocabulary words with their frequencies.
Return type:	`Dict`[`str`, `Integral`]
Returns:	A new dictionary with frequencies containing only the words to leave in the vocabulary.

class xnmt.preproc.VocabFiltererFreq(min_freq)[source]¶

Bases: xnmt.preproc.VocabFilterer, xnmt.persistence.Serializable

Filter the vocabulary, removing words below a particular minimum frequency

filter(vocab)[source]¶

Filter a vocabulary.

Parameters:	vocab – A dictionary of vocabulary words with their frequencies.
Returns:	A new dictionary with frequencies containing only the words to leave in the vocabulary.
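
Frequency-based filtering reduces to a dictionary comprehension. `filter_by_min_freq` is a hypothetical helper, and treating words with exactly `min_freq` occurrences as kept is an assumption.

```python
def filter_by_min_freq(vocab, min_freq):
    """Hypothetical helper: drop words seen fewer than min_freq times."""
    return {word: count for word, count in vocab.items() if count >= min_freq}

counts = {"the": 10, "cat": 3, "sat": 1}
frequent = filter_by_min_freq(counts, min_freq=3)
```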

class xnmt.preproc.VocabFiltererRank(max_rank)[source]¶

Bases: xnmt.preproc.VocabFilterer, xnmt.persistence.Serializable

Filter the vocabulary, removing words above a particular frequency rank

filter(vocab)[source]¶

Filter a vocabulary.

Parameters:	vocab – A dictionary of vocabulary words with their frequencies.
Returns:	A new dictionary with frequencies containing only the words to leave in the vocabulary.
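
Rank-based filtering can be sketched by sorting on frequency. `filter_by_max_rank` is a hypothetical helper; it assumes rank 1 is the most frequent word and that ties keep their original dictionary order.

```python
def filter_by_max_rank(vocab, max_rank):
    """Hypothetical helper: keep only the max_rank most frequent words."""
    # Stable sort: words with equal counts stay in insertion order.
    ranked = sorted(vocab.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:max_rank])

counts = {"the": 10, "cat": 3, "sat": 1, "a": 7}
top_two = filter_by_max_rank(counts, max_rank=2)
```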

class xnmt.preproc.Extractor[source]¶

Bases: object

A type of extraction task to perform.

class xnmt.preproc.MelFiltExtractor(nfilt=40, delta=False)[source]¶

Bases: xnmt.preproc.Extractor, xnmt.persistence.Serializable

extract_to(in_file, out_file)[source]¶

Parameters:	in_file (`str`) – yaml file that contains a list of dictionaries. Each dictionary contains:
- wav (str): path to wav file
- offset (float): start time stamp (optional)
- duration (float): stop time stamp (optional)
- speaker: speaker id for normalization (optional; if not given, the filename is used as speaker id)
out_file (`str`) – a filename ending in “.h5”
Return type:	`None`

class xnmt.preproc.LatticeFromPlfExtractor[source]¶

Bases: xnmt.preproc.Extractor, xnmt.persistence.Serializable

Creates node-labeled lattices that can be read by the LatticeReader.

The input to this extractor is a list of edge-labeled lattices in PLF format. The PLF format is described at http://www.statmt.org/moses/?n=Moses.WordLattices and is used, among others, in the Fisher/Callhome Spanish-to-English Speech Translation Corpus (Post et al, 2013).

Persistence¶

This module takes care of loading and saving YAML files. Both configuration files and saved models are stored in the same YAML file format.

The main objects to be aware of are:

Serializable: must be subclassed by all components that are specified in a YAML file.
Ref: a reference that points somewhere in the object hierarchy, for both convenience and to realize parameter sharing.
Repeat: a syntax for creating a list of components with the same configuration but without parameter sharing.
YamlPreloader: pre-loads YAML contents so that some infrastructure can be set up, but does not initialize components.
initialize_if_needed(), initialize_object(): initialize a preloaded YAML tree, taking care of resolving references etc.
save_to_file(): saves a YAML file along with registered DyNet parameters
LoadSerialized: can be used to load, modify, and re-assemble pretrained models.
bare(): create uninitialized objects, usually for the purpose of specifying them as default arguments.
RandomParam: a special Serializable subclass that realizes random parameter search.

class xnmt.persistence.Serializable[source]¶

Bases: yaml.YAMLObject

All model components that appear in a YAML file must inherit from Serializable. Implementing classes must specify a unique yaml_tag class attribute, e.g. yaml_tag = "!Serializable"

shared_params()[source]¶

Return the shared parameters of this Serializable class.

This can be overwritten to specify what parameters of this component and its subcomponents are shared. Parameter sharing is performed before any components are initialized, and can therefore only include basic data types that are already present in the YAML file (e.g. # dimensions, etc.) Sharing is performed if at least one parameter is specified and multiple shared parameters don’t conflict. In case of conflict a warning is printed, and no sharing is performed. The ordering of shared parameters is irrelevant. Note also that if a submodule is replaced by a reference, its shared parameters are ignored.

Return type:	`List`[`Set`[`Union`[`str`, `Path`]]]
Returns:	objects referencing params of this component or a subcomponent, e.g.: return [set([".input_dim", ".sub_module.input_dim", ".submodules_list.0.input_dim"])]
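
The sharing rule described above (share when at least one value is given and the given values do not conflict) can be sketched on a nested dict that stands in for the YAML tree. `share_params` and `_walk` are hypothetical helpers and do not reproduce the library's path handling.

```python
import warnings

def _walk(tree, path):
    """Return (parent container, final key) for a dotted path like '.a.0.b'."""
    keys = [int(k) if k.isdigit() else k for k in path.strip(".").split(".")]
    node = tree
    for key in keys[:-1]:
        node = node[key]
    return node, keys[-1]

def share_params(tree, shared_set):
    """Hypothetical helper: copy the one specified value to all listed paths."""
    found = set()
    for path in shared_set:
        parent, key = _walk(tree, path)
        value = parent.get(key)  # assumption: the last container is a dict
        if value is not None:
            found.add(value)
    if len(found) > 1:
        warnings.warn(f"conflicting shared params {sorted(found)}; not sharing")
    elif len(found) == 1:
        value = found.pop()
        for path in shared_set:
            parent, key = _walk(tree, path)
            parent[key] = value
    return tree

config = {"input_dim": 256, "sub_module": {}, "submodules_list": [{"input_dim": None}]}
share_params(config, {".input_dim", ".sub_module.input_dim",
                      ".submodules_list.0.input_dim"})
```

After the call, both sub-components carry the `input_dim` that was specified once at the top level.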

save_processed_arg(key, val)[source]¶

Save a new value for an init argument (call from within __init__()).

Normally, the serialization mechanism makes sure that the same arguments are passed when creating the class initially based on a config file, and when loading it from a saved model. This method can be called from inside __init__() to save a new value that will be passed when loading the saved model. This can be useful when one doesn’t want to recompute something every time (like a vocab) or when something has been passed via implicit referencing which might yield inconsistent results when loading the model to assemble a new model of different structure.

Parameters:

key (`str`) – name of property, must match an argument of `__init__()`
val (`Any`) – new value; a `Serializable` or basic Python type or list or dict of these
Return type:	`None`

add_serializable_component(name, passed, create_fct)[source]¶

Create a Serializable component, or a container component with several Serializable-s.

Serializable sub-components should always be created using this helper to make sure DyNet parameters are assigned properly and serialization works properly. The components must also be accepted as init arguments, defaulting to None. The helper makes sure that components are only created if None is passed, otherwise the passed component is reused.

The idiom for using this for an argument named my_comp would be:

def __init__(self, other_args, my_comp=None):
  ...
  my_comp = self.add_serializable_component("my_comp", my_comp, lambda: SomeSerializable(other_args))
  # now, do something with my_comp
  ...

Parameters:

name (`str`) – name of the object
passed (`Any`) – object as passed in the constructor. If `None`, will be created using create_fct.
create_fct (`Callable`[[], `Any`]) – a callable with no arguments that returns a `Serializable` or a collection of `Serializable`-s. When loading a saved model, this same object will be passed via the `passed` argument, and `create_fct` is not invoked.
Return type:	`Any`
Returns:	reused or newly created object(s).

class xnmt.persistence.UninitializedYamlObject(data)[source]¶

Bases: object

Wrapper class to indicate an object created by the YAML parser that still needs initialization.

Parameters:	data (`Any`) – uninitialized object

xnmt.persistence.bare(class_type, **kwargs)[source]¶

Create an uninitialized object of arbitrary type.

This is useful to specify XNMT components as default arguments. __init__() commonly requires DyNet parameters, component referencing, etc., which are not yet set up at the time the default arguments are loaded. In this case, a bare class can be specified with the desired arguments, and will be properly initialized when passed as arguments into a component.

Parameters:

class_type (`Type`[~T]) – class type (must be a subclass of `Serializable`)
kwargs (`Any`) – will be passed to class’s `__init__()`
Return type:	~T
Returns:	uninitialized object
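
The idea of deferring construction can be illustrated with a small stand-alone sketch. `BareSketch` and `Dense` are hypothetical names introduced here for illustration; the real `bare()` works together with the XNMT initialization machinery, which this sketch does not attempt to reproduce.

```python
from typing import Any, Dict, Type


class BareSketch:
    """Hypothetical record of a class and its kwargs, to be instantiated later."""

    def __init__(self, class_type: Type, **kwargs: Any) -> None:
        self.class_type = class_type
        self.kwargs: Dict[str, Any] = kwargs

    def initialize(self, **late_args: Any) -> Any:
        # Construct the real object only now, adding arguments that became known late.
        return self.class_type(**{**self.kwargs, **late_args})


class Dense:
    """Hypothetical component used only for this example."""

    def __init__(self, input_dim: int, output_dim: int = 512) -> None:
        self.input_dim = input_dim
        self.output_dim = output_dim


default_layer = BareSketch(Dense, output_dim=256)  # Dense.__init__ has not run yet
layer = default_layer.initialize(input_dim=128)    # construction happens here
```

The point of the pattern is that the default argument can be written down at import time, while the constructor runs only once the surrounding setup is available.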

class xnmt.persistence.Ref(path=None, name=None, default=1928437192847)[source]¶

Bases: xnmt.persistence.Serializable

A reference to somewhere in the component hierarchy.

Components can be referenced by path or by name.

Parameters:

path (`Union`[`None`, `Path`, `str`]) – reference by path
name (`Optional`[`str`]) – reference by name. The name refers to a unique `_xnmt_id` property that must be set in exactly one component.

get_name()[source]¶

Return name, or None if this is not a named reference

Return type:	`str`

get_path()[source]¶

Return path, or None if this is a named reference

Return type:	`Optional`[`Path`]

is_required()[source]¶

Return True iff there exists no default value and it is mandatory that this reference be resolved.

Return type:	`bool`

get_default()[source]¶

Return default value, or Ref.NO_DEFAULT if no default value is set (i.e., this is a required reference).

Return type:	`Any`

resolve_path(named_paths)[source]¶

Get path, resolving paths properly in case this is a named reference.

Return type:	`Path`

class xnmt.persistence.Path(path_str='')[source]¶

Bases: object

A relative or absolute path in the component hierarchy.

Paths are immutable: Operations that change the path always return a new Path object.

Parameters:	path_str (`str`) – path string, with period `.` as separator. If prefixed by `.`, marks a relative path, otherwise absolute.

append(link)[source]¶

Return a new path by appending a link.

Parameters:	link (`str`) – link to append

Returns: new path

Return type:	`Path`

add_path(path_to_add)[source]¶

Concatenates a path

Parameters:	path_to_add (`Path`) – path to concatenate

Returns: concatenated path

Return type:	`Path`
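
To make the relative/absolute distinction and the return-a-new-object behavior concrete, here is a minimal stand-alone sketch. `PathSketch` is a hypothetical simplification and not the real `Path` class: it is immutable only by convention, and `add_path` ignores whether the added path is relative.

```python
from typing import Tuple


class PathSketch:
    """Hypothetical, simplified stand-in for a component path."""

    def __init__(self, path_str: str = "") -> None:
        # A leading period marks a relative path.
        self.relative = path_str.startswith(".")
        body = path_str[1:] if self.relative else path_str
        self.links: Tuple[str, ...] = tuple(body.split(".")) if body else ()

    def _with_links(self, links: Tuple[str, ...]) -> "PathSketch":
        new_path = PathSketch()
        new_path.relative = self.relative
        new_path.links = links
        return new_path

    def append(self, link: str) -> "PathSketch":
        # Never modifies self; always returns a fresh object.
        return self._with_links(self.links + (link,))

    def add_path(self, other: "PathSketch") -> "PathSketch":
        return self._with_links(self.links + other.links)

    def __str__(self) -> str:
        return ("." if self.relative else "") + ".".join(self.links)


base = PathSketch("model.encoder")
longer = base.append("layers")
joined = base.add_path(PathSketch("0.hidden_dim"))
```

After these calls, `base` still reads `model.encoder`, while `longer` and `joined` are separate objects.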

class xnmt.persistence.Repeat(times, content)[source]¶

Bases: xnmt.persistence.Serializable

A special object that is replaced by a list of components with identical configuration but not with shared params.

This can be specified anywhere in the config hierarchy where normally a list is expected. A common use case is a multi-layer neural architecture, where layer configurations are repeated many times. It is replaced in the preloader and cannot be instantiated directly.
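
The effect of the replacement can be sketched as follows, assuming the repeated content is a plain nested dictionary. `expand_repeat` is a hypothetical helper written for this illustration; the actual expansion happens inside the preloader.

```python
import copy
from typing import Any, Dict, List


def expand_repeat(times: int, content: Dict[str, Any]) -> List[Dict[str, Any]]:
    # Deep copies give every entry the same configuration but independent state,
    # which mirrors "identical configuration but not with shared params".
    return [copy.deepcopy(content) for _ in range(times)]


layers = expand_repeat(3, {"type": "lstm", "hidden_dim": 512, "opts": {"dropout": 0.3}})
layers[0]["opts"]["dropout"] = 0.5  # changing one entry does not affect the others
```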

exception xnmt.persistence.PathError(message)[source]¶: Bases: Exception

class xnmt.persistence.SavedFormatString(value, unformatted_value)[source]¶: Bases: str, xnmt.persistence.Serializable

class xnmt.persistence.FormatString(value, serialize_as)[source]¶

Bases: str, yaml.YAMLObject

Used to handle the {EXP} string formatting syntax. When passed around, it will appear like the properly resolved string, but writing it back to YAML will use the original version containing {EXP}.
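
A string subclass that remembers its unformatted form can be sketched in a few lines. `FormatStringSketch` is a hypothetical stand-in that leaves out the YAML integration of the real class; only the `value` and `serialize_as` argument names are taken from the signature above.

```python
class FormatStringSketch(str):
    """Hypothetical stand-in: behaves like the resolved string, remembers the template."""

    def __new__(cls, value: str, serialize_as: str) -> "FormatStringSketch":
        obj = super().__new__(cls, value)
        obj.serialize_as = serialize_as  # what would be written back to YAML
        return obj


template = "{EXP_DIR}/models/{EXP}.mod"
resolved = FormatStringSketch(
    template.format(EXP_DIR="/tmp/run", EXP="baseline"), serialize_as=template)
```

Code that receives `resolved` can treat it as an ordinary `str`, while a serializer could read `resolved.serialize_as` to recover the template.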

class xnmt.persistence.RandomParam(values)[source]¶: Bases: yaml.YAMLObject

class xnmt.persistence.LoadSerialized(filename, path='', overwrite=None)[source]¶

Bases: xnmt.persistence.Serializable

Load content from an external YAML file.

This object points to an object in an external YAML file and will be replaced by the corresponding content by the YamlPreloader.

Parameters:

filename (str) – YAML file name to load from
path (str) – path inside the YAML file to load from, with . separators. Empty string denotes root.
overwrite (Optional[List[Dict[str, Any]]]) –
allows overwriting parts of the loaded model with new content. A list of path/val dictionaries, where path is a path string relative to the loaded sub-object following the syntax of Path, and val is a YAML-serializable object specifying the new content. E.g.:
```
[{"path" : "model.trainer", "val":AdamTrainer()},
 {"path" : ..., "val":...}]
```
It is possible to specify a path that points to a new key in a dictionary. If path points to a list, it’s possible to append to that list by using append_val instead of val.
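
How such a list of path/val entries might be applied to loaded content can be sketched on plain dictionaries and lists. `apply_overwrites` is a hypothetical helper for illustration only; it is not XNMT's implementation and assumes the loaded content is already a nested Python structure.

```python
import copy
from typing import Any, Dict, List


def _child(node: Any, key: str) -> Any:
    # List elements are addressed by integer index, dictionary entries by key.
    return node[int(key)] if isinstance(node, list) else node[key]


def apply_overwrites(loaded: Dict[str, Any], overwrite: List[Dict[str, Any]]) -> Dict[str, Any]:
    result = copy.deepcopy(loaded)  # leave the loaded content untouched
    for entry in overwrite:
        *parents, last = entry["path"].split(".")
        node: Any = result
        for key in parents:
            node = _child(node, key)
        if "append_val" in entry:
            _child(node, last).append(entry["append_val"])  # path points to a list
        elif isinstance(node, list):
            node[int(last)] = entry["val"]
        else:
            node[last] = entry["val"]  # may also create a new dictionary key
    return result


loaded = {"model": {"trainer": "sgd", "layers": ["a"]}}
patched = apply_overwrites(loaded, [
    {"path": "model.trainer", "val": "adam"},
    {"path": "model.layers", "append_val": "b"},
    {"path": "model.new_key", "val": 1},
])
```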

class xnmt.persistence.YamlPreloader[source]¶

Bases: object

Loads experiments from YAML and performs basic preparation, but does not initialize objects.

Has the following responsibilities:

takes care of extracting individual experiments from a YAML file
replaces !LoadSerialized by loading the corresponding content
resolves kwargs syntax (items from a kwargs dictionary are moved to the owner where they become object attributes)
implements random search (draws proper random values when !RandomParam is encountered)
finds and replaces placeholder strings such as {EXP}, {EXP_DIR}, {GIT_REV}, and {PID}
copies bare default arguments into the corresponding objects where appropriate.

Typically, initialize_object() would be invoked by passing the result from the YamlPreloader.
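
Of the responsibilities listed above, placeholder replacement is the easiest to illustrate in isolation. The following sketch walks a nested configuration and substitutes placeholders in every string; `fill_placeholders` is a hypothetical helper and the example values are made up, so it should be read as an illustration of the idea rather than the preloader's actual code.

```python
from typing import Any, Dict


def fill_placeholders(node: Any, placeholders: Dict[str, str]) -> Any:
    """Hypothetical helper: replace {NAME} in every string of a nested config."""
    if isinstance(node, str):
        for name, value in placeholders.items():
            node = node.replace("{" + name + "}", value)
        return node
    if isinstance(node, dict):
        return {key: fill_placeholders(val, placeholders) for key, val in node.items()}
    if isinstance(node, list):
        return [fill_placeholders(val, placeholders) for val in node]
    return node  # numbers, booleans, None, etc. are left untouched


config = {
    "model_file": "{EXP_DIR}/models/{EXP}.mod",
    "layers": [{"name": "{EXP}-enc"}],
    "dim": 512,
}
filled_config = fill_placeholders(config, {"EXP": "baseline", "EXP_DIR": "/tmp/run"})
```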

static experiment_names_from_file(filename)[source]¶

Return list of experiment names.

Parameters:	filename (`str`) – path to YAML file
Return type:	`List`[`str`]
Returns:	experiment names occurring in the given file in lexicographic order.

static preload_experiment_from_file(filename, exp_name, resume=False)[source]¶

Preload experiment from YAML file.

Parameters:

filename (`str`) – YAML config file name
exp_name (`str`) – experiment name to load
resume (`bool`) – set to True if we are loading a saved model file directly and want to restore all formatted strings.
Return type:	`UninitializedYamlObject`
Returns:	Preloaded but uninitialized object.

static preload_obj(root, exp_name, exp_dir, resume=False)[source]¶

Preload a given object.

Preloading a given object, usually an xnmt.experiments.Experiment or LoadSerialized object as parsed by pyyaml, includes replacing !LoadSerialized, resolving kwargs syntax, and instantiating random search.

Parameters:

root (`Any`) – object to preload
exp_name (`str`) – experiment name, needed to replace `{EXP}`
exp_dir (`str`) – directory of the corresponding config file, needed to replace `{EXP_DIR}`
resume (`bool`) – if True, keep the formatted strings, e.g. set `{EXP}` to the value of the previous run if possible
Return type:	`UninitializedYamlObject`
Returns:	Preloaded but uninitialized object.

xnmt.persistence.save_to_file(fname, mod)[source]¶

Save a component hierarchy and corresponding DyNet parameter collection to disk.

Parameters:

fname (`str`) – Filename to save to.
mod (`Any`) – Component hierarchy.
Return type:	`None`

xnmt.persistence.initialize_if_needed(root)[source]¶

Initialize if obj has not yet been initialized.

This includes parameter sharing and resolving of references.

Parameters:	root (`Union`[`Any`, `UninitializedYamlObject`]) – object to be potentially initialized
Return type:	`Any`
Returns:	initialized object

xnmt.persistence.initialize_object(root)[source]¶

Initialize an uninitialized object.

This includes parameter sharing and resolving of references.

Parameters:	root (`UninitializedYamlObject`) – object to be initialized
Return type:	`Any`
Returns:	initialized object

exception xnmt.persistence.ComponentInitError[source]¶: Bases: Exception

xnmt.persistence.check_type(obj, desired_type)[source]¶

Checks argument types using isinstance, or some custom logic if type hints from the ‘typing’ module are given.

Regarding type hints, only a few major ones are supported. This should cover almost everything that would be expected in a YAML config file, but might miss a few special cases. For unsupported types, this function evaluates to True. Most notably, forward references such as ‘SomeType’ (with apostrophes around the type) are not supported. Note also that typing.Tuple is among the unsupported types because tuples aren’t supported by the XNMT serializer.

Parameters:

obj – object whose type to check
desired_type – desired type of obj
Returns:	False if types don’t match, True otherwise (including when desired_type is unsupported).
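
The general approach (plain `isinstance` for ordinary classes, special handling for a few `typing` hints, and acceptance of anything unsupported) can be sketched as below. `check_type_sketch` is a hypothetical re-implementation for illustration: it assumes Python 3.8+ for `typing.get_origin` and `typing.get_args`, and the set of hints it handles is chosen for the example and need not match what `check_type` supports.

```python
from typing import Any, List, Optional, Tuple, Union, get_args, get_origin


def check_type_sketch(obj: Any, desired_type: Any) -> bool:
    if desired_type is Any:
        return True
    if isinstance(desired_type, str):
        return True  # forward reference such as 'SomeType': unsupported, so accept
    origin = get_origin(desired_type)
    if origin is Union:  # also covers Optional[X], which is Union[X, None]
        return any(check_type_sketch(obj, sub) for sub in get_args(desired_type))
    if origin is list:
        args = get_args(desired_type)
        elem_type = args[0] if args else Any
        return isinstance(obj, list) and all(check_type_sketch(o, elem_type) for o in obj)
    if origin is not None:
        return True  # any other typing construct (e.g. Tuple): unsupported, so accept
    return isinstance(obj, desired_type)


ok_optional = check_type_sketch(None, Optional[str])
ok_list = check_type_sketch([1, 2], List[int])
bad_list = check_type_sketch([1, "a"], List[int])
unsupported = check_type_sketch((1, 2), Tuple[int, int])
```

Accepting unsupported hints keeps the check from rejecting valid configurations, at the cost of letting some mismatches through.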

Reportable¶

Reports gather inputs, outputs, and intermediate computations in a nicely formatted way for convenient manual inspection.

To support reporting, the models providing the data to be reported must subclass Reportable and call self.report_sent_info(d) with key/value pairs containing the data to be reported at the appropriate times. If this causes computational overhead, the boolean compute_report field should be queried and extra computations skipped if this field is False.

Next, a Reporter needs to be specified that supports reports based on the previously created key/value pairs. Reporters are passed to inference classes, so it’s possible e.g. to report only at the final test decoding, or specify a special reporting inference object that only looks at a handful of sentences, etc.

Note that currently reporting is only supported at test-time, not at training time.

class xnmt.reports.ReportInfo(sent_info=[], glob_info={})[source]¶

Bases: object

Info to pass to reporter

Parameters:

sent_info (`Sequence`[`Dict`[`str`, `Any`]]) – list of dicts, one dict per sentence
glob_info (`Dict`[`str`, `Any`]) – a global dict applicable to each sentence

class xnmt.reports.Reportable[source]¶

Bases: object

Base class for classes that contribute information to a report.

Making an arbitrary class reportable requires the following:

specify Reportable as base class
call this super class’s __init__(), or do @register_xnmt_handler manually
pass either global info or per-sentence info or both:
- call self.report_sent_info(d) for each sentence, where d is a dictionary containing info to pass on to the reporter
- call self.report_corpus_info(d) once, where d is a dictionary containing info to pass on to the reporter

report_sent_info(sent_info)[source]¶

Add key/value pairs belonging to the current sentence for reporting.

This should be called consistently for every sentence and in order.

Parameters:	sent_info (`Dict`[`str`, `Any`]) – A dictionary of key/value pairs. The keys must match (be a subset of) the arguments in the reporter’s `create_sent_report()` method, and the values must be of the corresponding types.
Return type:	`None`

report_corpus_info(glob_info)[source]¶

Add key/value pairs for reporting that are relevant to all reported sentences.

Parameters:	glob_info (`Dict`[`str`, `Any`]) – A dictionary of key/value pairs. The keys must match (be a subset of) the arguments in the reporter’s `create_sent_report()` method, and the values must be of the corresponding types.
Return type:	`None`

class xnmt.reports.Reporter[source]¶

Bases: object

A base class for a reporter that collects reportable information, formats it and writes it to disk.

create_sent_report(**kwargs)[source]¶

Create the report.

The reporter should specify the arguments it needs explicitly, and should specify kwargs in addition to handle extra (unused) arguments without crashing.

Parameters:	**kwargs – additional arguments
Return type:	`None`
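
The convention of naming required arguments explicitly and absorbing the rest via `**kwargs` can be sketched without any XNMT imports. `LengthReporterSketch` is a hypothetical reporter, and merging the global dictionary into each per-sentence dictionary is an assumption about how the collected information reaches the reporter.

```python
from typing import Any, Dict, List


class LengthReporterSketch:
    """Hypothetical reporter that records source and output lengths per sentence."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def create_sent_report(self, src: str, output: str, **kwargs: Any) -> None:
        # Only src and output are needed; all other keys land in kwargs and are ignored.
        self.lines.append(f"{len(src.split())} -> {len(output.split())}")


reporter = LengthReporterSketch()
glob_info: Dict[str, Any] = {"ref_file": "ref.txt"}
sent_infos: List[Dict[str, Any]] = [
    {"src": "ein kleiner Test", "output": "a small test", "attentions": None},
    {"src": "noch einer", "output": "one more example"},
]
for sent_info in sent_infos:
    # Assumed combination: global info is made available to every sentence report.
    reporter.create_sent_report(**{**glob_info, **sent_info})
```

Because of `**kwargs`, the same collected information can be passed to several reporters that each pick out different keys.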

class xnmt.reports.ReferenceDiffReporter(match_size=3, alt_norm=False, report_path='{EXP_DIR}/reports/{EXP}')[source]¶

Bases: xnmt.reports.Reporter, xnmt.persistence.Serializable

Reporter that uses the CharCut tool for nicely displayed difference highlighting between outputs and references.

The stand-alone tool can be found at https://github.com/alardill/CharCut

Parameters:

match_size (`Integral`) – min match size in characters (set < 3 e.g. for Japanese or Chinese)
alt_norm (`bool`) – alternative normalization scheme: use only the candidate’s length for normalization
report_path (`str`) – Path of directory to write HTML files to

create_sent_report(src, output, ref_file=None, **kwargs)[source]¶

Create report.

Parameters:

src (`Sentence`) – source-side input
output (`ReadableSentence`) – generated output
ref_file (`Optional`[`str`]) – path to reference file
**kwargs – arguments to be ignored
Return type:	`None`

class xnmt.reports.CompareMtReporter(out2_file=None, train_file=None, train_counts=None, alpha=1.0, ngram=4, ngram_size=50, sent_size=10, report_path='{EXP_DIR}/reports/{EXP}')[source]¶

Bases: xnmt.reports.Reporter, xnmt.persistence.Serializable

Reporter that uses the compare-mt.py script to analyze and compare MT results.

The stand-alone tool can be found at https://github.com/neubig/util-scripts

Parameters:

out2_file (Optional[str]) – A path to another system output. Add only if you want to compare outputs from two systems.
train_file (Optional[str]) – A link to the training corpus target file
train_counts (Optional[str]) – A link to the training word frequency counts as a tab-separated “word\tfreq” file
alpha (Real) – A smoothing coefficient to control how much the model focuses on low- and high-frequency events. 1.0 should be fine most of the time.
ngram (Integral) – Maximum length of n-grams.
sent_size (Integral) – How many sentences to print.
ngram_size (Integral) – How many n-grams to print.
report_path (str) – Path of directory to write report files to

create_sent_report(output, ref_file, **kwargs)[source]¶

Create report.

Parameters:

output (`ReadableSentence`) – generated output
ref_file (`str`) – path to reference file
**kwargs – arguments to be ignored
Return type:	`None`

class xnmt.reports.HtmlReporter(report_name, report_path='{EXP_DIR}/reports/{EXP}')[source]¶

Bases: xnmt.reports.Reporter

A base class for reporters that produce HTML output; it takes care of some common functionality.

Parameters:

report_name (`str`) – prefix for report files
report_path (`str`) – Path of directory to write HTML and image files to

class xnmt.reports.AttentionReporter(max_num_sents=100, report_name='attention', report_path='{EXP_DIR}/reports/{EXP}')[source]¶

Bases: xnmt.reports.HtmlReporter, xnmt.persistence.Serializable

Reporter that writes attention matrices to HTML.

Parameters:

max_num_sents (`Optional`[`Integral`]) – create attention report for only the first n sentences
report_name (`str`) – prefix for output files
report_path (`str`) – Path of directory to write HTML and image files to

create_sent_report(src, output, attentions, ref_file, **kwargs)[source]¶

Create report.

Parameters:

src (`Sentence`) – source-side input
output (`ReadableSentence`) – generated output
attentions (`ndarray`) – attention matrices
ref_file (`Optional`[`str`]) – path to reference file
**kwargs – arguments to be ignored
Return type:	`None`

add_atts(attentions, src_tokens, trg_tokens, idx, desc='Attentions')[source]¶

Add attention matrix to HTML code.

Parameters:

attentions (`ndarray`) – numpy array of dimensions (src_len x trg_len)
src_tokens (`Union`[`Sequence`[`str`], `ndarray`]) – list of strings (case of src text) or numpy array of dims (nfeat x speech_len) (case of src speech)
trg_tokens (`Sequence`[`str`]) – list of string tokens
idx (`Integral`) – sentence no
desc (`str`) – readable description
Return type:	`None`

class xnmt.reports.SegmentationReporter(report_path='{EXP_DIR}/reports/{EXP}')[source]¶

Bases: xnmt.reports.Reporter, xnmt.persistence.Serializable

A reporter to be used with the segmenting encoder.

Parameters:	report_path (`str`) – Path of directory to write text files to

create_sent_report(segment_actions, src, **kwargs)[source]¶

Create the report.

The reporter should specify the arguments it needs explicitly, and should specify kwargs in addition to handle extra (unused) arguments without crashing.

Parameters:	**kwargs – additional arguments

class xnmt.reports.OOVStatisticsReporter(train_trg_file, report_path='{EXP_DIR}/reports/{EXP}')[source]¶

Bases: xnmt.reports.Reporter, xnmt.persistence.Serializable

A reporter that prints OOV statistics: recovered OOVs, fantasized new words, etc.

Some models, such as character- or subword-based models, can produce words that are not in the training data. This is desirable when we produce a correct word that would have been an OOV with a word-based model, but undesirable when we produce something that’s not a correct word. The reporter prints some statistics that help analyze the OOV behavior of the model.

Parameters:

train_trg_file (`str`) – path to word-tokenized training target file
report_path (`str`) – Path of directory to write text files to

create_sent_report(output, ref_file, **kwargs)[source]¶

Create the report.

The reporter should specify the arguments it needs explicitly, and should specify kwargs in addition to handle extra (unused) arguments without crashing.

Parameters:	**kwargs – additional arguments
Return type:	`None`

Settings¶

Global settings that control the overall behavior of XNMT.

Currently, settings control the following:

OVERWRITE_LOG: whether logs should be overwritten (not overwriting helps when copy-pasting config files and forgetting to change the output location)
IMMEDIATE_COMPUTE: whether to execute DyNet in eager mode
CHECK_VALIDITY: configure xnmt and DyNet to perform checks of validity
RESOURCE_WARNINGS: whether to show resource warnings
LOG_LEVEL_CONSOLE: verbosity of console output (DEBUG | INFO | WARNING | ERROR | CRITICAL)
LOG_LEVEL_FILE: verbosity of file output (DEBUG | INFO | WARNING | ERROR | CRITICAL)
DEFAULT_MOD_PATH: default location to write models to
DEFAULT_LOG_PATH: default location to write out logs

There are several predefined configurations (Standard, Debug, Unittest), with Standard being used by default. Settings are specified from the command line using --settings={standard|debug|unittest} and should not be changed during execution.

It is possible to control individual settings by setting an environment variable of the same name, e.g. like this: OVERWRITE_LOG=1 python -m xnmt.xnmt_run_experiments my_config.yaml

To specify a custom configuration, subclass settings.Standard accordingly and add an alias to settings._aliases.
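
The interplay of configuration classes, aliases, and environment variables can be sketched with stand-ins. Everything below is hypothetical: `StandardSketch`, `QuietSketch`, the `aliases` dictionary, the default values, and `resolve_setting` are illustrations and do not reproduce the contents of `xnmt.settings`.

```python
import os
from typing import Mapping


class StandardSketch:
    """Hypothetical defaults; the real values in settings.Standard may differ."""
    OVERWRITE_LOG = False
    LOG_LEVEL_CONSOLE = "INFO"


class QuietSketch(StandardSketch):
    """Custom configuration: override only what should change."""
    LOG_LEVEL_CONSOLE = "WARNING"


# Hypothetical alias table, playing the role of settings._aliases.
aliases = {"standard": StandardSketch, "quiet": QuietSketch}


def resolve_setting(alias: str, name: str, environ: Mapping[str, str] = os.environ) -> str:
    # An environment variable of the same name takes precedence over the class attribute.
    if name in environ:
        return environ[name]
    return str(getattr(aliases[alias], name))


console_level = resolve_setting("quiet", "LOG_LEVEL_CONSOLE", environ={})
overwrite = resolve_setting("standard", "OVERWRITE_LOG", environ={"OVERWRITE_LOG": "1"})
```

In this sketch all values are returned as strings, since that is what environment variables provide; converting them to booleans or log levels is left out.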

class xnmt.settings.Standard[source]¶

Bases: object

Standard configuration, used by default.

class xnmt.settings.Debug[source]¶

Bases: xnmt.settings.Standard

Adds checks and verbosity to help debugging code or configuration files.

class xnmt.settings.Unittest[source]¶

Bases: xnmt.settings.Standard

More checks and less verbosity, activated automatically when running the unit tests from the “test” package.