nlpboost.augmentation package
Submodules
nlpboost.augmentation.TextAugmenterPipeline module
- class nlpboost.augmentation.TextAugmenterPipeline.NLPAugPipeline(steps, text_field: str = 'text')
Bases: object
Augments text data using several augmentation techniques. It uses nlpaug in the background.
The augmentation pipeline is configured through nlpboost.augmentation.augmenter_config.NLPAugConfig. NLPAugPipeline receives a list of configs of that type, where each config defines one augmentation technique to apply and the proportion of the train dataset to augment with it.
- Parameters:
steps (List[nlpboost.augmentation.augmenter_config.NLPAugConfig]) – List of steps. Each step must be an NLPAugConfig instance.
text_field (str) – Name of the dataset field that contains the texts.
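The idea of a list of steps, each augmenting a proportion of the train dataset, can be sketched in plain Python. This is only an illustration under stated assumptions, not the nlpboost implementation: StepSketch and run_pipeline_sketch are hypothetical names, the augment callables stand in for nlpaug augmenters, and appending the augmented copies to the original rows (rather than replacing them) is an assumption.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class StepSketch:
    """Hypothetical stand-in for one pipeline step (not the nlpboost class)."""
    name: str
    augment: Callable[[str], str]  # stand-in for an nlpaug augmenter
    proportion: float = 0.1        # share of the original rows to augment


def run_pipeline_sketch(
    rows: List[Dict],
    steps: List[StepSketch],
    text_field: str = "text",
    seed: int = 0,
) -> List[Dict]:
    """Append augmented copies of a random subset of rows, once per step.

    Assumption: augmented rows are added to the dataset and every step
    samples from the original rows only.
    """
    rng = random.Random(seed)
    out = list(rows)
    for step in steps:
        n_to_augment = int(len(rows) * step.proportion)
        for row in rng.sample(rows, n_to_augment):
            new_row = dict(row)  # copy so the original row stays untouched
            new_row[text_field] = step.augment(row[text_field])
            out.append(new_row)
    return out


rows = [{"text": f"sample sentence {i}", "label": i % 2} for i in range(10)]
steps = [
    StepSketch("upper", str.upper, proportion=0.2),
    StepSketch("reverse", lambda s: s[::-1], proportion=0.1),
]
augmented = run_pipeline_sketch(rows, steps)
```

With 10 rows, the first step adds 2 augmented rows and the second adds 1, so each step's proportion directly controls how many extra examples that technique contributes.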
nlpboost.augmentation.augmenter_config module
- class nlpboost.augmentation.augmenter_config.NLPAugConfig(name: str, augmenter_cls: Optional[Any] = None, proportion: float = 0.1, aug_kwargs: Optional[Dict] = None)
Bases: object
Configuration for augmenters.
- Parameters:
name (str) – Name of the data augmentation technique. Currently supported values are ocr (OCR augmentation), contextual_w_e (Contextual Word Embedding augmentation), synonym, backtranslation, contextual_s_e (Contextual Word Embeddings for Sentence Augmentation), and abstractive_summ. When a custom augmenter class is used, this can be an arbitrary name.
augmenter_cls (Any) – Optional augmenter class from the nlpaug library. It can be passed instead of an identifier name for loading the class (see the name parameter of this class).
proportion (float) – Proportion of the train dataset to augment with this technique.
aug_kwargs (Dict) – Arguments for the data augmentation class. See https://github.com/makcedward/nlpaug/blob/master/example/textual_augmenter.ipynb
- aug_kwargs: Dict = None
- augmenter_cls: Any = None
- name: str
- proportion: float = 0.1
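As a rough illustration of how these four fields fit together, the sketch below mirrors the documented signature with a standard-library dataclass. AugConfigSketch, MyCustomAug, and build_custom_augmenter are hypothetical names introduced only for this example. The way aug_kwargs is unpacked into the augmenter class is an assumption about typical usage, not a description of nlpboost internals.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class AugConfigSketch:
    """Stand-in that mirrors the documented fields and defaults."""
    name: str
    augmenter_cls: Optional[Any] = None
    proportion: float = 0.1
    aug_kwargs: Optional[Dict] = None


class MyCustomAug:
    """Hypothetical custom augmenter; a real one would come from nlpaug."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs


def build_custom_augmenter(cfg: AugConfigSketch) -> Any:
    """Instantiate the custom class with its keyword arguments.

    Assumption: aug_kwargs is unpacked into the class constructor.
    """
    if cfg.augmenter_cls is None:
        raise ValueError(f"config '{cfg.name}' has no augmenter_cls")
    return cfg.augmenter_cls(**(cfg.aug_kwargs or {}))


# Built-in technique selected by its identifier name, on 30% of the data.
ocr_step = AugConfigSketch(name="ocr", proportion=0.3)

# Custom class: the name is arbitrary and the class is given explicitly.
custom_step = AugConfigSketch(
    name="my_custom",
    augmenter_cls=MyCustomAug,
    aug_kwargs={"aug_p": 0.2},
)
custom_augmenter = build_custom_augmenter(custom_step)
```

The two configs show the two ways of selecting an augmenter described above: by identifier name alone, or by passing a class together with its arguments.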