nlpboost.augmentation package

Submodules

nlpboost.augmentation.TextAugmenterPipeline module

class nlpboost.augmentation.TextAugmenterPipeline.NLPAugPipeline(steps, text_field: str = 'text')

Bases: object

Augment text data using a variety of augmentation techniques. It relies on nlpaug for the augmentation itself.

The augmentation pipeline is configured with nlpboost.augmentation.augmenter_config.NLPAugConfig. NLPAugPipeline receives a list of configs of that type, where each config defines one augmentation technique to apply and the proportion of the training dataset to augment with it.

Parameters:

steps (List[nlpboost.augmentation.augmenter_config.NLPAugConfig]) – List of steps. Each step must be an NLPAugConfig instance.
text_field (str) – Name of the dataset field that contains the texts.
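
As a rough illustration of what a list of steps can look like, the sketch below uses a hypothetical stand-in dataclass, ToyAugConfig, whose fields mirror the documented NLPAugConfig signature (name, augmenter_cls, proportion, aug_kwargs). The two augmenter classes are toy assumptions, not nlpaug classes, and the way aug_kwargs is passed to augmenter_cls is also an assumption. It avoids importing nlpboost so that it runs on its own.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ToyAugConfig:
    """Hypothetical stand-in with the same fields as the documented NLPAugConfig."""

    name: str
    augmenter_cls: Optional[Any] = None
    proportion: float = 0.1
    aug_kwargs: Optional[Dict] = None


class UppercaseAugmenter:
    """Toy augmenter (assumption): upper-cases the text."""

    def augment(self, text: str) -> str:
        return text.upper()


class SuffixAugmenter:
    """Toy augmenter (assumption): appends a fixed suffix to the text."""

    def __init__(self, suffix: str = "") -> None:
        self.suffix = suffix

    def augment(self, text: str) -> str:
        return text + self.suffix


# One config per augmentation technique, each with its own proportion.
steps = [
    ToyAugConfig(name="uppercase", augmenter_cls=UppercaseAugmenter, proportion=0.3),
    ToyAugConfig(name="suffix", augmenter_cls=SuffixAugmenter, aug_kwargs={"suffix": " !"}),
]

# Assumed wiring: build each augmenter from its class and keyword arguments.
augmenters = [step.augmenter_cls(**(step.aug_kwargs or {})) for step in steps]
```

With the real classes, the resulting list would be passed as the steps argument of NLPAugPipeline, together with text_field if the text column is not named 'text'.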

augment(samples)

Augment samples from a datasets.Dataset, following the configuration defined at initialization.

Parameters: samples – Samples from a datasets.Dataset.
Returns: Samples from a datasets.Dataset, after processing.
Return type: samples
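
To make the idea of augmenting a proportion of the samples concrete, here is a minimal sketch of a single augmentation step over a batch in the column-oriented layout (a dict of lists) that datasets.Dataset.map passes to a function when batched=True. The function augment_batch and its parameters are hypothetical, and the choice to append augmented copies rather than replace the originals is an assumption; the documentation above does not say which the library does.

```python
import random
from typing import Callable, Dict, List


def augment_batch(
    samples: Dict[str, List],
    augment_fn: Callable[[str], str],
    proportion: float,
    text_field: str = "text",
    seed: int = 0,
) -> Dict[str, List]:
    """Append augmented copies of a random `proportion` of the batch.

    Hypothetical helper: not part of nlpboost.
    """
    rng = random.Random(seed)
    n_samples = len(samples[text_field])
    n_to_augment = int(n_samples * proportion)
    chosen = rng.sample(range(n_samples), n_to_augment)

    # Copy every column so the input batch is left untouched.
    out = {key: list(values) for key, values in samples.items()}
    for i in chosen:
        for key, values in samples.items():
            # Only the text column is transformed; other columns
            # (labels, for instance) are copied so rows stay aligned.
            new_value = augment_fn(values[i]) if key == text_field else values[i]
            out[key].append(new_value)
    return out


batch = {
    "text": ["good film", "bad film", "fine film", "dull film"],
    "label": [1, 0, 1, 0],
}
augmented = augment_batch(batch, str.upper, proportion=0.5)
```

A function with this shape could be applied to a training split through datasets.Dataset.map with batched=True, since a batched map function is allowed to return more rows than it received.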

nlpboost.augmentation.augmenter_config module

class nlpboost.augmentation.augmenter_config.NLPAugConfig(name: str, augmenter_cls: Optional[Any] = None, proportion: float = 0.1, aug_kwargs: Optional[Dict] = None)