Generating a data set¶
Empirically assessing the accuracy and computation time of the labelling algorithms requires a data set. The repository provides several options for generating one.
Generating an argumentation system¶
As an alternative to manually designing an argumentation system and loading it (using an ArgumentationSystemXLSXReader), we provide functionality to generate one automatically.
The repository currently holds two types of generators: a random and a layered argumentation system generator.
Random argumentation system generator¶
- class RandomArgumentationSystemGenerator(argumentation_system_generation_parameters)¶
Bases:
modules.dataset_generator.argumentation_system_generator.argumentation_system_generator_interface.ArgumentationSystemGeneratorInterface
- generate()¶
Randomly generate a new ArgumentationSystem based on the RandomArgumentationSystemGeneratorParameters.
- Return type
ArgumentationSystem
- Returns
The generated ArgumentationSystem.
- class RandomArgumentationSystemGeneratorParameters(language_size, rule_size, rule_antecedent_distribution, queryable_size=None, queryable_ratio=None, allow_rules_for_queryables=True, allow_conclusion_in_antecedents=True, allow_inconsistent_antecedents=True)¶
Bases:
object
- __init__(language_size, rule_size, rule_antecedent_distribution, queryable_size=None, queryable_ratio=None, allow_rules_for_queryables=True, allow_conclusion_in_antecedents=True, allow_inconsistent_antecedents=True)¶
Parameters for randomly generating an ArgumentationSystem.
- Parameters
language_size (int) – Number of Literals (including negations).
rule_size (Optional[int]) – Number of Rules.
rule_antecedent_distribution (Dict[int,float]) – Number of Rules with a specific number of antecedents.
queryable_size (Optional[int]) – Number of Queryables.
queryable_ratio (Optional[float]) – Fraction of Queryables by the number of Literals.
allow_rules_for_queryables (bool) – Boolean indicating if there can be Rules for Queryables.
allow_conclusion_in_antecedents (bool) – Boolean indicating if a Rule can have its conclusion as an antecedent.
allow_inconsistent_antecedents (bool) – Boolean indicating if a Rule can have inconsistent antecedents.
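As a rough illustration of how queryable_size and queryable_ratio might relate, the sketch below uses a hypothetical stand-in dataclass (RandomParamsSketch, not the repository's class) that mirrors the documented parameter names. It assumes that an explicit queryable_size takes precedence and that queryable_ratio is otherwise multiplied by language_size; the documentation above does not confirm either behaviour.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RandomParamsSketch:
    """Hypothetical stand-in mirroring the documented parameter names."""
    language_size: int
    rule_size: Optional[int]
    rule_antecedent_distribution: Dict[int, float]
    queryable_size: Optional[int] = None
    queryable_ratio: Optional[float] = None
    allow_rules_for_queryables: bool = True
    allow_conclusion_in_antecedents: bool = True
    allow_inconsistent_antecedents: bool = True

    def resolved_queryable_size(self) -> int:
        # Assumption: an explicit size wins; otherwise the ratio is
        # applied to the language size.
        if self.queryable_size is not None:
            return self.queryable_size
        if self.queryable_ratio is not None:
            return int(self.queryable_ratio * self.language_size)
        raise ValueError("Specify queryable_size or queryable_ratio.")


params = RandomParamsSketch(
    language_size=20,
    rule_size=10,
    rule_antecedent_distribution={1: 0.5, 2: 0.5},
    queryable_ratio=0.5,
)
resolved = params.resolved_queryable_size()
```

In the repository itself, the real RandomArgumentationSystemGeneratorParameters object would be passed to RandomArgumentationSystemGenerator, whose generate() method returns the ArgumentationSystem.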
Layered argumentation system generator¶
- class LayeredArgumentationSystemGenerator(argumentation_system_generation_parameters)¶
Bases:
modules.dataset_generator.argumentation_system_generator.argumentation_system_generator_interface.ArgumentationSystemGeneratorInterface
- generate()¶
Generate an ArgumentationSystem with a layered structure according to the LayeredArgumentationSystemGeneratorParameters.
- Return type
ArgumentationSystem
- Returns
The generated ArgumentationSystem.
- class LayeredArgumentationSystemGeneratorParameters(language_size, rule_size, rule_antecedent_distribution, literal_layer_distribution)¶
Bases:
object
- __init__(language_size, rule_size, rule_antecedent_distribution, literal_layer_distribution)¶
Parameters for randomly generating an ArgumentationSystem with a layered structure.
- Parameters
language_size (int) – The number of Literals (including negations).
rule_size (int) – The number of Rules.
rule_antecedent_distribution (Dict[int,int]) – The number of Rules having a specific number of antecedents.
literal_layer_distribution (Dict[int,int]) – The number of Literals in a specific layer.
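Because both distributions here are counts, it seems natural that they should add up to rule_size and language_size respectively. The sketch below checks this with a hypothetical helper, check_layered_parameters, which is not part of the repository; whether the real generator enforces these sums is an assumption.

```python
from typing import Dict


def check_layered_parameters(
    language_size: int,
    rule_size: int,
    rule_antecedent_distribution: Dict[int, int],
    literal_layer_distribution: Dict[int, int],
) -> None:
    """Hypothetical helper; raises ValueError if the counts disagree."""
    # Assumption: the per-layer literal counts add up to language_size.
    if sum(literal_layer_distribution.values()) != language_size:
        raise ValueError("literal_layer_distribution must sum to language_size")
    # Assumption: the per-antecedent-count rule counts add up to rule_size.
    if sum(rule_antecedent_distribution.values()) != rule_size:
        raise ValueError("rule_antecedent_distribution must sum to rule_size")


# 10 literals spread over three layers, 6 rules with 1 or 2 antecedents.
check_layered_parameters(
    language_size=10,
    rule_size=6,
    rule_antecedent_distribution={1: 2, 2: 4},
    literal_layer_distribution={0: 6, 1: 2, 2: 2},
)
```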
Computing properties for an ArgumentationTheory or ArgumentationSystem¶
- compute_argumentation_theory_properties(argumentation_theory, verbose=False)¶
Compute some properties of the given ArgumentationTheory, such as the corresponding incomplete argumentation framework or the number of future ArgumentationTheories.
- Parameters
argumentation_theory (ArgumentationTheory) – ArgumentationTheory for which properties are needed.
verbose (bool) – Boolean indicating if information should be printed.
- Return type
ArgumentationTheoryProperties
- Returns
ArgumentationTheoryProperties of the ArgumentationTheory.
- enumerate_future_argumentation_theories(argumentation_theory, verbose=False)¶
Enumerate all future ArgumentationTheories of this ArgumentationTheory.
- Parameters
argumentation_theory (ArgumentationTheory) – ArgumentationTheory for which future ArgumentationTheories should be enumerated.
verbose (bool) – Boolean indicating if information should be printed.
- Return type
List[ArgumentationTheory]
- Returns
All future ArgumentationTheories of this ArgumentationTheory.
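The documentation does not define "future" at this point. The sketch below assumes that a future ArgumentationTheory is one whose knowledge base extends the current one with further Queryables while staying consistent (never containing both a literal and its negation). Literals are plain strings with a "-" prefix for negation, and future_knowledge_bases is a hypothetical helper, not the repository's function.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List


def negate(literal: str) -> str:
    """Return the negation of a literal in the '-x' string encoding."""
    return literal[1:] if literal.startswith("-") else "-" + literal


def is_consistent(literals: Iterable[str]) -> bool:
    """True if no literal occurs together with its negation."""
    literal_set = set(literals)
    return all(negate(lit) not in literal_set for lit in literal_set)


def future_knowledge_bases(
    queryables: List[str], knowledge_base: List[str]
) -> List[FrozenSet[str]]:
    """Enumerate all consistent supersets of knowledge_base (assumed definition)."""
    current = frozenset(knowledge_base)
    candidates = [q for q in queryables if q not in current]
    futures = []
    for size in range(len(candidates) + 1):
        for extra in combinations(candidates, size):
            extended = current | frozenset(extra)
            if is_consistent(extended):
                futures.append(extended)
    return futures


queryables = ["a", "-a", "b", "-b"]
futures = future_knowledge_bases(queryables, ["a"])
```

Under these assumptions the knowledge base ["a"] has three futures: itself, and the extensions with either "b" or "-b".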
- compute_argumentation_system_properties(argumentation_system)¶
Compute some properties of the given ArgumentationSystem, such as the number of literals or rule antecedents.
- Parameters
argumentation_system (ArgumentationSystem) – ArgumentationSystem for which properties are needed.
- Return type
ArgumentationSystemProperties
- Returns
ArgumentationSystemProperties of the ArgumentationSystem.
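To make "number of rule antecedents" concrete, the sketch below derives an antecedent-count distribution from a list of rules. The plain-tuple rule encoding and the helper antecedent_count_distribution are hypothetical; the fields actually stored in ArgumentationSystemProperties are not listed in this documentation.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Each rule is (antecedents, conclusion); a hypothetical plain-tuple encoding.
Rule = Tuple[Tuple[str, ...], str]


def antecedent_count_distribution(rules: List[Rule]) -> Dict[int, int]:
    """Map a number of antecedents to the number of rules having that many."""
    return dict(Counter(len(antecedents) for antecedents, _ in rules))


rules: List[Rule] = [
    (("a",), "c"),
    (("a", "b"), "d"),
    (("c", "d"), "e"),
]
distribution = antecedent_count_distribution(rules)
```

The resulting dictionary has the same shape as the rule_antecedent_distribution parameter of the layered generator.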
Generating a Dataset for a specific ArgumentationSystem¶
- class DatasetGenerator(argumentation_system, argumentation_system_custom_name=None)¶
Bases:
object
- classmethod from_file(argumentation_system_file_name)¶
Generate a Dataset for an ArgumentationSystem that must still be read from a file.
- Parameters
argumentation_system_file_name (str) – Name of ArgumentationSystem for which a Dataset should be generated.
- Returns
Dataset for specified ArgumentationSystem.
- generate_dataset(custom_dataset_name=None, include_ground_truth=True, verbose=True)¶
Generate a Dataset containing all possible ArgumentationTheories for the given ArgumentationSystem. Note: the number of possible ArgumentationTheories grows rapidly with the number of Queryables, so this can take a long time for ArgumentationSystems with many Queryables.
- Parameters
custom_dataset_name (Optional[str]) – Optional, name of the Dataset. Otherwise a name based on the timestamp is chosen.
include_ground_truth (bool) – Boolean indicating if the ground truth should be computed. Note: this takes time!
verbose (bool) – Boolean indicating if information should be printed.
- Return type
Dataset
- Returns
The resulting Dataset.
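To get a feeling for why exhaustive generation becomes slow, the sketch below counts the possible knowledge bases. It assumes that Queryables come in literal/negation pairs and that a knowledge base must be consistent, so each pair contributes three choices (the literal, its negation, or neither). The helper count_consistent_knowledge_bases is hypothetical and only counts knowledge bases; it says nothing about how the repository enumerates them.

```python
def count_consistent_knowledge_bases(num_queryables: int) -> int:
    """Count consistent knowledge bases, assuming literal/negation pairs."""
    if num_queryables % 2 != 0:
        raise ValueError("This sketch assumes an even number of Queryables.")
    # Three choices per pair: the literal, its negation, or neither.
    return 3 ** (num_queryables // 2)


counts = {n: count_consistent_knowledge_bases(n) for n in (4, 10, 20, 40)}
```

Under this assumption the count triples with every additional pair of Queryables, which is why sampling may be preferable for larger ArgumentationSystems.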
- generate_dataset_sample(custom_dataset_name=None, include_ground_truth=True, sample_size=1000, verbose=True)¶
Generate a Dataset by sampling a fixed number of DatasetItems (sample_size) for each possible knowledge base size. For example, if there are 4 Queryables in the ArgumentationSystem, a knowledge base can contain 0, 1 or 2 (= 4/2) items. With sample_size = 10, 10 ArgumentationTheories are generated for each of these three knowledge base sizes, so the total number of DatasetItems is 30.
- Parameters
custom_dataset_name (Optional[str]) – Optional, name of the Dataset. Otherwise a name based on the timestamp is chosen.
include_ground_truth (bool) – Boolean indicating if the ground truth should be computed. Note: this takes time!
sample_size (int) – Number of DatasetItems for each number of items in the knowledge base.
verbose (bool) – Boolean indicating if information should be printed.
- Return type
Dataset
- Returns
The resulting Dataset.
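The arithmetic in the description of generate_dataset_sample can be written out as a small calculation. The helper sampled_item_count is hypothetical and only reproduces the counting from the example (knowledge base sizes 0 up to half the number of Queryables); it does not call the repository.

```python
def sampled_item_count(num_queryables: int, sample_size: int) -> int:
    """Total number of sampled items, following the documented example."""
    # Knowledge base sizes range over 0, 1, ..., num_queryables // 2.
    number_of_sizes = num_queryables // 2 + 1
    return number_of_sizes * sample_size


total = sampled_item_count(num_queryables=4, sample_size=10)
```

For 4 Queryables and sample_size = 10 this gives 3 sizes times 10 items, matching the 30 DatasetItems in the example.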
Dataset classes¶
- class Dataset(name, argumentation_system_name, dataset_items)¶
Bases:
object
- __init__(name, argumentation_system_name, dataset_items)¶
A Dataset has a name, the name of its ArgumentationSystem and a list of DatasetItems.
- Parameters
name (str) – Name of the Dataset.
argumentation_system_name (str) – Name of the ArgumentationSystem on which the Dataset is based.
dataset_items (List[DatasetItem]) – Items in the Dataset.
- class DatasetItem(argumentation_system, argumentation_system_name, knowledge_base)¶
Bases:
object
- __init__(argumentation_system, argumentation_system_name, knowledge_base)¶
A DatasetItem has an ArgumentationSystem, its name and a knowledge base.
- Parameters
argumentation_system (ArgumentationSystem) – The ArgumentationSystem on which the DatasetItem is based.
argumentation_system_name (str) – The name of the ArgumentationSystem.
knowledge_base (List[Queryable]) – The knowledge base (list of Queryables).
- classmethod from_str(dataset_item_str)¶
Read the DatasetItem from a string.
- Parameters
dataset_item_str (str) – String representation of the DatasetItem.
- Returns
DatasetItem represented by the input string.
- class AnnotatedDatasetItem(argumentation_system, argumentation_system_name, knowledge_base, topic_literal, gt_acceptability_label, gt_stability_label)¶
Bases:
stability_label_algorithm.modules.dataset_generator.dataset_item.DatasetItem
An AnnotatedDatasetItem is a specific type of DatasetItem that also has a ground truth acceptability and stability label for a topic literal.
- __init__(argumentation_system, argumentation_system_name, knowledge_base, topic_literal, gt_acceptability_label, gt_stability_label)¶
Create an AnnotatedDatasetItem.
- Parameters
argumentation_system (ArgumentationSystem) – The ArgumentationSystem on which the DatasetItem is based.
argumentation_system_name (str) – The name of the ArgumentationSystem.
knowledge_base (List[Queryable]) – The knowledge base (list of Queryables).
topic_literal (Literal) – The Literal for which the ground truth is given.
gt_acceptability_label (StabilityLabel) – Ground truth acceptability label for the topic Literal.
gt_stability_label (StabilityLabel) – Ground truth stability label for the topic Literal.
- classmethod from_str(dataset_item_str)¶
Read the DatasetItem from a string.
- Parameters
dataset_item_str (str) – String representation of the DatasetItem.
- Returns
DatasetItem represented by the input string.
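The relationship between the three classes can be summarised with stand-in dataclasses. The names ending in Sketch are hypothetical and only mirror the documented constructor parameters; plain strings stand in for the repository's ArgumentationSystem, Queryable, Literal and StabilityLabel types, and the label values are placeholders rather than real labels.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class DatasetItemSketch:
    argumentation_system: Any        # stands in for ArgumentationSystem
    argumentation_system_name: str
    knowledge_base: List[str]        # stands in for List[Queryable]


@dataclass
class AnnotatedDatasetItemSketch(DatasetItemSketch):
    topic_literal: str               # stands in for Literal
    gt_acceptability_label: str      # stands in for StabilityLabel
    gt_stability_label: str          # stands in for StabilityLabel


@dataclass
class DatasetSketch:
    name: str
    argumentation_system_name: str
    dataset_items: List[DatasetItemSketch]


item = AnnotatedDatasetItemSketch(
    argumentation_system=None,
    argumentation_system_name="example_system",
    knowledge_base=["a", "-b"],
    topic_literal="c",
    gt_acceptability_label="placeholder",
    gt_stability_label="placeholder",
)
dataset = DatasetSketch("example_dataset", "example_system", [item])
```

An annotated item is still a DatasetItem, so a Dataset can hold annotated and plain items in the same list.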