nlpboost.augmentation package

Submodules

nlpboost.augmentation.TextAugmenterPipeline module

class nlpboost.augmentation.TextAugmenterPipeline.NLPAugPipeline(steps, text_field: str = 'text')

Bases: object

Augment text data with a configurable sequence of augmentation techniques. It uses nlpaug under the hood.

The augmentation pipeline is configured with nlpboost.augmentation.augmenter_config.NLPAugConfig. NLPAugPipeline receives a list of configs of that type; each config defines the augmentation technique to use and the proportion of the train dataset to augment.

Parameters:
  • steps (List[nlpboost.augmentation.augmenter_config.NLPAugConfig]) – List of steps. Each step must be an NLPAugConfig instance.

  • text_field (str) – Name of the dataset field that contains the texts.

augment(samples)

Augment samples from a datasets.Dataset following the configuration defined at initialization.

Parameters:

samples – Samples from a datasets.Dataset

Returns:

The samples from the datasets.Dataset after augmentation.

Return type:

samples
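
The reference above does not spell out how the configured proportion is applied to a batch. The following is a minimal sketch of one plausible behavior, not the nlpboost implementation: it assumes samples arrive as a dictionary of column lists, that augmented texts are appended as new rows rather than replacing the originals, and that other columns are copied unchanged. The names augment_batch, augment_fn and seed are hypothetical, and str.upper stands in for a real nlpaug augmenter.

```python
import random
from typing import Callable, Dict, List


def augment_batch(
    samples: Dict[str, List],
    augment_fn: Callable[[str], str],
    text_field: str = "text",
    proportion: float = 0.1,
    seed: int = 0,
) -> Dict[str, List]:
    """Append augmented copies of a random `proportion` of the rows in a batch.

    Hypothetical helper: appending (instead of replacing) is an assumption.
    """
    n_rows = len(samples[text_field])
    n_aug = int(n_rows * proportion)
    rng = random.Random(seed)
    chosen = rng.sample(range(n_rows), n_aug)

    # Copy every column so the input batch is left untouched.
    out = {column: list(values) for column, values in samples.items()}
    for idx in chosen:
        for column, values in samples.items():
            value = values[idx]
            # Only the text column is transformed; other columns are duplicated.
            out[column].append(augment_fn(value) if column == text_field else value)
    return out


batch = {
    "text": [f"sample text {i}" for i in range(10)],
    "label": [i % 2 for i in range(10)],
}
# str.upper is a stand-in for a real augmenter's transformation.
augmented = augment_batch(batch, augment_fn=str.upper, proportion=0.2)
print(len(augmented["text"]))  # 10 original rows plus 2 augmented copies
```

Keeping all columns the same length matters here, because a batch with mismatched column lengths cannot be turned back into dataset rows.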

nlpboost.augmentation.augmenter_config module

class nlpboost.augmentation.augmenter_config.NLPAugConfig(name: str, augmenter_cls: Optional[Any] = None, proportion: float = 0.1, aug_kwargs: Optional[Dict] = None)

Bases: object

Configuration for augmenters.

Parameters:
  • name (str) – Name of the data augmentation technique. Currently supported values are ocr (OCR augmentation), contextual_w_e (contextual word embedding augmentation), synonym, backtranslation, contextual_s_e (contextual word embeddings for sentence augmentation), and abstractive_summ. When a custom augmenter class is used, this can be any arbitrary name.

  • augmenter_cls (Any) – An optional augmenter class from the nlpaug library. It can be passed instead of an identifier name to select the class (see the name parameter).

  • proportion (float) – Proportion of the train dataset to augment with this technique.

  • aug_kwargs (Dict) – Arguments for the data augmentation class. See https://github.com/makcedward/nlpaug/blob/master/example/textual_augmenter.ipynb

aug_kwargs: Dict = None
augmenter_cls: Any = None
name: str
proportion: float = 0.1
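
To illustrate how the fields and defaults listed above combine, here is a small sketch that uses a stand-in dataclass instead of importing nlpboost. NLPAugConfigSketch and MyAugmenter are hypothetical names; the stand-in only mirrors the documented signature, and whether the real class is a dataclass is an assumption. The aug_p key is an nlpaug word-augmenter argument, used here purely as an example value for aug_kwargs.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class NLPAugConfigSketch:
    """Hypothetical stand-in mirroring the documented NLPAugConfig fields."""

    name: str
    augmenter_cls: Optional[Any] = None
    proportion: float = 0.1
    aug_kwargs: Optional[Dict] = None


class MyAugmenter:
    """Placeholder for a custom nlpaug augmenter class."""


steps = [
    # Built-in technique selected by name, using every default.
    NLPAugConfigSketch(name="ocr"),
    # Built-in technique with a larger proportion and augmenter arguments.
    NLPAugConfigSketch(name="synonym", proportion=0.3, aug_kwargs={"aug_p": 0.2}),
    # Custom augmenter class; the name is then only a label.
    NLPAugConfigSketch(name="my_custom_step", augmenter_cls=MyAugmenter),
]

for step in steps:
    print(step.name, step.proportion, step.aug_kwargs)
```

With the real classes, a list like steps would be what NLPAugPipeline receives as its steps argument.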

Module contents