Generating a data set

To empirically assess the accuracy and computation time of the labelling algorithms, a data set is required. The repository provides several options for generating one.

Generating an argumentation system

Instead of manually designing an argumentation system and loading it (using an ArgumentationSystemXLSXReader), one can also generate an argumentation system automatically. The repository currently contains two generators: a random and a layered argumentation system generator.

Random argumentation system generator

class RandomArgumentationSystemGenerator(argumentation_system_generation_parameters)

Bases: modules.dataset_generator.argumentation_system_generator.argumentation_system_generator_interface.ArgumentationSystemGeneratorInterface

generate()

Randomly generate a new ArgumentationSystem based on the RandomArgumentationSystemGeneratorParameters.

Return type

ArgumentationSystem

Returns

The generated ArgumentationSystem.

class RandomArgumentationSystemGeneratorParameters(language_size, rule_size, rule_antecedent_distribution, queryable_size=None, queryable_ratio=None, allow_rules_for_queryables=True, allow_conclusion_in_antecedents=True, allow_inconsistent_antecedents=True)

Bases: object

__init__(language_size, rule_size, rule_antecedent_distribution, queryable_size=None, queryable_ratio=None, allow_rules_for_queryables=True, allow_conclusion_in_antecedents=True, allow_inconsistent_antecedents=True)

Parameters for randomly generating an ArgumentationSystem.

Parameters
  • language_size (int) – Number of Literals (including negations).

  • rule_size (Optional[int]) – Number of Rules.

  • rule_antecedent_distribution (Dict[int, float]) – Number of Rules with a specific number of antecedents.

  • queryable_size (Optional[int]) – Number of Queryables.

  • queryable_ratio (Optional[float]) – Number of Queryables as a fraction of the number of Literals.

  • allow_rules_for_queryables (bool) – Boolean indicating whether a Queryable can be the conclusion of a Rule.

  • allow_conclusion_in_antecedents (bool) – Boolean indicating whether a Rule can have its own conclusion as an antecedent.

  • allow_inconsistent_antecedents (bool) – Boolean indicating whether a Rule can have inconsistent antecedents.
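The following is a minimal, self-contained sketch of how parameters like these could drive a random generator. It is not the repository's implementation. Assumptions: Literals are plain strings with negation written as a "-" prefix, Queryables are drawn in complementary pairs, only queryable_ratio is supported (queryable_size is omitted), and rule_antecedent_distribution is read as a rule count per number of antecedents, as the parameter description says, with those counts assumed to add up to rule_size (so rule_size is not a separate argument). The names toy_random_system and negate are hypothetical.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: a Literal is a string, its negation has a "-" prefix.
Rule = Tuple[Tuple[str, ...], str]  # (antecedents, conclusion)


def negate(literal: str) -> str:
    """Return the complement of a Literal."""
    return literal[1:] if literal.startswith("-") else "-" + literal


def toy_random_system(
    language_size: int,
    rule_antecedent_distribution: Dict[int, int],
    queryable_ratio: float,
    allow_rules_for_queryables: bool = True,
    allow_conclusion_in_antecedents: bool = True,
    allow_inconsistent_antecedents: bool = True,
    seed: Optional[int] = None,
) -> Tuple[List[str], List[str], List[Rule]]:
    rng = random.Random(seed)
    # language_size includes negations, so build language_size // 2 complementary pairs.
    atoms = [f"a{i}" for i in range(language_size // 2)]
    literals = atoms + [negate(a) for a in atoms]

    # Assumption: Queryables come in complementary pairs.
    n_pairs = round(queryable_ratio * len(literals)) // 2
    queryable_atoms = rng.sample(atoms, n_pairs)
    queryables = sorted(queryable_atoms + [negate(a) for a in queryable_atoms])

    conclusions = [lit for lit in literals if allow_rules_for_queryables or lit not in queryables]
    rules: List[Rule] = []
    for n_antecedents, n_rules in rule_antecedent_distribution.items():
        for _ in range(n_rules):
            conclusion = rng.choice(conclusions)
            pool = [lit for lit in literals if allow_conclusion_in_antecedents or lit != conclusion]
            if allow_inconsistent_antecedents:
                antecedents = rng.sample(pool, n_antecedents)
            else:
                # Choose distinct atoms, then one polarity per atom that is still in the
                # pool, so no antecedent co-occurs with its negation.
                pool_atoms = sorted({lit.lstrip("-") for lit in pool})
                antecedents = [
                    rng.choice([lit for lit in (a, negate(a)) if lit in pool])
                    for a in rng.sample(pool_atoms, n_antecedents)
                ]
            rules.append((tuple(antecedents), conclusion))
    return literals, queryables, rules
```

For example, toy_random_system(10, {1: 3, 2: 4}, 0.4, seed=1) yields 10 Literals, 4 Queryables and 7 Rules. Drawing distinct atoms before choosing a polarity guarantees consistent antecedents without a retry loop.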

Layered argumentation system generator

class LayeredArgumentationSystemGenerator(argumentation_system_generation_parameters)

Bases: modules.dataset_generator.argumentation_system_generator.argumentation_system_generator_interface.ArgumentationSystemGeneratorInterface

generate()

Generate an ArgumentationSystem with a layered structure according to the LayeredArgumentationSystemGeneratorParameters.

Return type

ArgumentationSystem

Returns

The generated ArgumentationSystem.

class LayeredArgumentationSystemGeneratorParameters(language_size, rule_size, rule_antecedent_distribution, literal_layer_distribution)

Bases: object

__init__(language_size, rule_size, rule_antecedent_distribution, literal_layer_distribution)

Parameters for randomly generating an ArgumentationSystem with a layered structure.

Parameters
  • language_size (int) – The number of Literals (including negations).

  • rule_size (int) – The number of Rules.

  • rule_antecedent_distribution (Dict[int, int]) – The number of Rules having a specific number of antecedents.

  • literal_layer_distribution (Dict[int, int]) – The number of Literals in a specific layer.
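A layered structure can be illustrated with the sketch below. It assumes, without confirmation from the repository, that layers are numbered 0, 1, 2, ... and each contains at least one Literal, that Literals in layer 0 are not the conclusion of any Rule, and that a Rule concluding a Literal in layer k takes one antecedent from layer k - 1 and the remaining antecedents from any layer below k (so lower layers must hold enough Literals). Negation is ignored, and toy_layered_rules is a hypothetical name.

```python
import random
from typing import Dict, List, Optional, Tuple

Rule = Tuple[Tuple[str, ...], str]  # (antecedents, conclusion)


def toy_layered_rules(
    literal_layer_distribution: Dict[int, int],
    rule_antecedent_distribution: Dict[int, int],
    seed: Optional[int] = None,
) -> Tuple[Dict[str, int], List[Rule]]:
    rng = random.Random(seed)
    # Create the Literals of each layer; negation is ignored in this sketch.
    layer_of = {
        f"l{layer}_{i}": layer
        for layer, count in literal_layer_distribution.items()
        for i in range(count)
    }
    by_layer: Dict[int, List[str]] = {}
    for lit, layer in layer_of.items():
        by_layer.setdefault(layer, []).append(lit)
    top = max(by_layer)

    rules: List[Rule] = []
    for n_antecedents, n_rules in rule_antecedent_distribution.items():
        for _ in range(n_rules):
            # Only Literals above layer 0 can be a conclusion.
            layer = rng.randint(1, top)
            conclusion = rng.choice(by_layer[layer])
            # One antecedent from the layer directly below, the rest from any lower layer.
            lower = [lit for lit, lay in layer_of.items() if lay < layer]
            first = rng.choice(by_layer[layer - 1])
            rest = rng.sample([lit for lit in lower if lit != first], n_antecedents - 1)
            rules.append((tuple([first] + rest), conclusion))
    return layer_of, rules
```

Under these assumptions, the layer of each concluded Literal is exactly one more than the highest layer among its antecedents.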

Computing properties for an ArgumentationTheory or ArgumentationSystem

compute_argumentation_theory_properties(argumentation_theory, verbose=False)

Compute some properties of the given ArgumentationTheory, such as the corresponding incomplete argumentation framework or the number of future ArgumentationTheories.

Parameters
  • argumentation_theory (ArgumentationTheory) – ArgumentationTheory for which properties are needed.

  • verbose (bool) – Boolean indicating whether information should be printed.

Return type

ArgumentationTheoryProperties

Returns

ArgumentationTheoryProperties of the ArgumentationTheory.

enumerate_future_argumentation_theories(argumentation_theory, verbose=False)

Enumerate all future ArgumentationTheories of this ArgumentationTheory.

Parameters
  • argumentation_theory (ArgumentationTheory) – ArgumentationTheory for which future ArgumentationTheories should be enumerated.

  • verbose (bool) – Boolean indicating if information should be printed.

Return type

List[ArgumentationTheory]

Returns

All future ArgumentationTheories of this ArgumentationTheory.
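To make the notion of future ArgumentationTheories concrete, the sketch below enumerates future knowledge bases only, keeping the ArgumentationSystem fixed. It assumes that a future knowledge base is a consistent superset of the current one built from the Queryables, that a Literal and its negation are written "x" and "-x", and that the given theory counts as one of its own futures. future_knowledge_bases is a hypothetical helper, not part of the repository.

```python
from itertools import product
from typing import FrozenSet, Iterable, List


def future_knowledge_bases(knowledge_base: Iterable[str],
                           queryables: Iterable[str]) -> List[FrozenSet[str]]:
    """Enumerate all consistent supersets of knowledge_base built from queryables."""
    kb = frozenset(knowledge_base)
    q = set(queryables)
    # Atoms (Literals without "-") that the knowledge base has not decided yet.
    decided = {lit.lstrip("-") for lit in kb}
    open_atoms = sorted({lit.lstrip("-") for lit in q} - decided)
    # Per open atom: leave it out (None) or add one queryable polarity of it.
    choices = [[None] + [lit for lit in (a, "-" + a) if lit in q] for a in open_atoms]
    return [kb | {lit for lit in combo if lit is not None} for combo in product(*choices)]


print(len(future_knowledge_bases([], ["a", "-a", "b", "-b"])))
print(sorted(map(sorted, future_knowledge_bases(["a"], ["a", "-a", "b", "-b"]))))
```

With Queryables {a, -a, b, -b}, the empty knowledge base has 9 futures, and {a} has 3: {a}, {a, b} and {a, -b}.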

compute_argumentation_system_properties(argumentation_system)

Compute some properties of the given ArgumentationSystem, such as the number of literals or rule antecedents.

Parameters

argumentation_system (ArgumentationSystem) – ArgumentationSystem for which properties are needed.

Return type

ArgumentationSystemProperties

Returns

ArgumentationSystemProperties of the ArgumentationSystem.
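As an illustration of the kind of properties meant here, the sketch below computes a few simple statistics over a hypothetical representation (Literals as strings, Rules as (antecedents, conclusion) tuples). The function toy_system_properties and its property names are invented for this example and need not match the fields of ArgumentationSystemProperties.

```python
from collections import Counter
from typing import Dict, List, Tuple

Rule = Tuple[Tuple[str, ...], str]  # (antecedents, conclusion)


def toy_system_properties(literals: List[str], rules: List[Rule]) -> Dict[str, object]:
    """Compute a few simple statistics of an argumentation system."""
    antecedent_counts = Counter(len(antecedents) for antecedents, _ in rules)
    conclusions = {conclusion for _, conclusion in rules}
    return {
        "n_literals": len(literals),
        "n_rules": len(rules),
        # Number of Rules per number of antecedents, e.g. {1: 2, 2: 1}.
        "rule_antecedent_distribution": dict(sorted(antecedent_counts.items())),
        # Literals that no Rule concludes.
        "n_literals_without_rules": len(set(literals) - conclusions),
    }


properties = toy_system_properties(
    ["a", "-a", "b", "-b"],
    [(("a",), "b"), (("a", "-b"), "-a"), (("b",), "-a")],
)
print(properties)
```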

Generating a Dataset for a specific ArgumentationSystem

class DatasetGenerator(argumentation_system, argumentation_system_custom_name=None)

Bases: object

classmethod from_file(argumentation_system_file_name)

Generate a Dataset for an ArgumentationSystem that must still be read from a file.

Parameters

argumentation_system_file_name (str) – Name of the ArgumentationSystem for which a Dataset should be generated.

Returns

Dataset for specified ArgumentationSystem.

generate_dataset(custom_dataset_name=None, include_ground_truth=True, verbose=True)

Generate a Dataset containing all possible ArgumentationTheories for the given ArgumentationSystem. Note: the number of possible ArgumentationTheories grows exponentially with the number of Queryables, so this can take a long time for ArgumentationSystems with many Queryables.

Parameters
  • custom_dataset_name (Optional[str]) – Name of the Dataset. If omitted, a name based on the current timestamp is used.

  • include_ground_truth (bool) – Boolean indicating whether the ground truth should be computed. Note: computing the ground truth is time-consuming.

  • verbose (bool) – Boolean indicating if information should be printed.

Return type

Dataset

Returns

The resulting Dataset.
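The cost warning for generate_dataset follows from how many knowledge bases exist. Assuming Queryables come in complementary pairs and a knowledge base may not contain a Queryable together with its negation, each pair is either absent, present positively or present negatively, which gives 3^(q/2) knowledge bases for q Queryables. The sketch checks this closed form against brute-force enumeration; both functions are hypothetical helpers, not part of the repository.

```python
from itertools import combinations


def count_knowledge_bases(n_queryables: int) -> int:
    """Closed form: each complementary pair is absent, positive or negative."""
    return 3 ** (n_queryables // 2)


def brute_force_count(n_queryables: int) -> int:
    """Count consistent subsets of n_queryables Queryables by enumeration."""
    atoms = [f"q{i}" for i in range(n_queryables // 2)]
    queryables = atoms + ["-" + a for a in atoms]
    count = 0
    for size in range(len(queryables) + 1):
        for kb in combinations(queryables, size):
            # A knowledge base is consistent if it never holds both "x" and "-x".
            if not any("-" + lit in kb for lit in kb):
                count += 1
    return count


for n in (0, 2, 4, 6, 8):
    assert brute_force_count(n) == count_knowledge_bases(n)

for n in (4, 10, 20, 40):
    print(n, count_knowledge_bases(n))
```

For 4, 10, 20 and 40 Queryables this gives 9, 243, 59049 and 3486784401 knowledge bases, respectively.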

generate_dataset_sample(custom_dataset_name=None, include_ground_truth=True, sample_size=1000, verbose=True)

Generate a Dataset by generating a fixed number (sample_size) of DatasetItems for each possible knowledge base size. For example, if the ArgumentationSystem has 4 Queryables, a knowledge base can contain 0, 1 or 2 (= 4/2) items. With sample_size = 10, 10 ArgumentationTheories are generated for each of these three knowledge base sizes, so the Dataset contains 30 DatasetItems in total.

Parameters
  • custom_dataset_name (Optional[str]) – Name of the Dataset. If omitted, a name based on the current timestamp is used.

  • include_ground_truth (bool) – Boolean indicating whether the ground truth should be computed. Note: computing the ground truth is time-consuming.

  • sample_size (int) – Number of DatasetItems to generate for each knowledge base size.

  • verbose (bool) – Boolean indicating if information should be printed.

Return type

Dataset

Returns

The resulting Dataset.
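The sampling scheme of generate_dataset_sample can be sketched as follows. This is a hypothetical helper, not the repository's code: it assumes every Queryable has its negation among the Queryables (written "x" / "-x"), draws knowledge bases uniformly at random for each size, and samples with replacement, so duplicate knowledge bases can occur, in particular for size 0.

```python
import random
from typing import List, Optional


def sample_knowledge_bases(queryables: List[str], sample_size: int,
                           seed: Optional[int] = None) -> List[List[str]]:
    """Draw sample_size random consistent knowledge bases for every possible size."""
    rng = random.Random(seed)
    atoms = sorted({q.lstrip("-") for q in queryables})
    samples: List[List[str]] = []
    # Sizes run from 0 up to the number of complementary pairs (len(queryables) / 2).
    for size in range(len(atoms) + 1):
        for _ in range(sample_size):
            chosen = rng.sample(atoms, size)
            # Pick one polarity per chosen atom, so the result is consistent.
            samples.append(sorted(a if rng.random() < 0.5 else "-" + a for a in chosen))
    return samples


kbs = sample_knowledge_bases(["a", "-a", "b", "-b"], sample_size=10, seed=0)
print(len(kbs))
```

For the example above (4 Queryables, sample_size = 10) it returns 30 knowledge bases, 10 of each size 0, 1 and 2.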

Dataset classes

class Dataset(name, argumentation_system_name, dataset_items)

Bases: object

__init__(name, argumentation_system_name, dataset_items)

A Dataset has a name, the name of its ArgumentationSystem and a list of DatasetItems.

Parameters
  • name (str) – Name of the Dataset.

  • argumentation_system_name (str) – Name of the ArgumentationSystem on which the Dataset is based.

  • dataset_items (List[DatasetItem]) – Items in the Dataset.

class DatasetItem(argumentation_system, argumentation_system_name, knowledge_base)

Bases: object

__init__(argumentation_system, argumentation_system_name, knowledge_base)

A DatasetItem has an ArgumentationSystem, its name and a knowledge base.

Parameters
  • argumentation_system (ArgumentationSystem) – The ArgumentationSystem on which the DatasetItem is based.

  • argumentation_system_name (str) – The name of the ArgumentationSystem.

  • knowledge_base (List[Queryable]) – The knowledge base (list of Queryables).

classmethod from_str(dataset_item_str)

Read the DatasetItem from a string.

Parameters

dataset_item_str (str) – String representation of the DatasetItem.

Returns

DatasetItem represented by the input string.

class AnnotatedDatasetItem(argumentation_system, argumentation_system_name, knowledge_base, topic_literal, gt_acceptability_label, gt_stability_label)

Bases: stability_label_algorithm.modules.dataset_generator.dataset_item.DatasetItem

An AnnotatedDatasetItem is a specific type of DatasetItem that also has a ground truth acceptability and stability label for a topic literal.

__init__(argumentation_system, argumentation_system_name, knowledge_base, topic_literal, gt_acceptability_label, gt_stability_label)

Create an AnnotatedDatasetItem.

Parameters
  • argumentation_system (ArgumentationSystem) – The ArgumentationSystem on which the DatasetItem is based.

  • argumentation_system_name (str) – The name of the ArgumentationSystem.

  • knowledge_base (List[Queryable]) – The knowledge base (list of Queryables).

  • topic_literal (Literal) – The Literal for which the ground truth is given.

  • gt_acceptability_label (StabilityLabel) – Ground truth acceptability label for the topic Literal.

  • gt_stability_label (StabilityLabel) – Ground truth stability label for the topic Literal.

classmethod from_str(dataset_item_str)

Read the DatasetItem from a string.

Parameters

dataset_item_str (str) – String representation of the DatasetItem.

Returns

DatasetItem represented by the input string.
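Because an AnnotatedDatasetItem stores ground truth labels, the accuracy of a labelling algorithm can be measured by comparing its output with those labels. The sketch below assumes the predicted and ground truth labels for each item have already been collected as pairs of plain strings; label_accuracy and the label names in the example are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def label_accuracy(pairs: Iterable[Tuple[str, str]]) -> Tuple[float, Dict[str, float]]:
    """Return overall accuracy and accuracy per ground truth label.

    Each pair is (predicted_label, ground_truth_label).
    """
    total: Counter = Counter()
    correct: Counter = Counter()
    for predicted, truth in pairs:
        total[truth] += 1
        correct[truth] += int(predicted == truth)
    if not total:
        raise ValueError("no labelled items")
    overall = sum(correct.values()) / sum(total.values())
    per_label = {label: correct[label] / total[label] for label in sorted(total)}
    return overall, per_label


# Hypothetical label names; the repository's StabilityLabel values may differ.
overall, per_label = label_accuracy([
    ("stable-accepted", "stable-accepted"),
    ("unstable", "stable-accepted"),
    ("unstable", "unstable"),
    ("unstable", "unstable"),
])
print(overall, per_label)
```

Reporting accuracy per ground truth label alongside the overall figure helps when some labels are much rarer than others in the Dataset.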