Generating a data set
Empirically assessing the accuracy and computation time of the labelling algorithms requires a data set. The repository provides several options for generating one.
Generating an argumentation system
Instead of manually designing an argumentation system and loading it (using an ArgumentationSystemXLSXReader), you can generate one automatically.
The repository currently holds two types of generators: a random and a layered argumentation system generator.
Random argumentation system generator
- class RandomArgumentationSystemGenerator(argumentation_system_generation_parameters)
Bases: modules.dataset_generator.argumentation_system_generator.argumentation_system_generator_interface.ArgumentationSystemGeneratorInterface
- generate()
Randomly generate a new ArgumentationSystem based on the RandomArgumentationSystemGeneratorParameters.
- Return type
ArgumentationSystem
- Returns
The generated ArgumentationSystem.
- class RandomArgumentationSystemGeneratorParameters(language_size, rule_size, rule_antecedent_distribution, queryable_size=None, queryable_ratio=None, allow_rules_for_queryables=True, allow_conclusion_in_antecedents=True, allow_inconsistent_antecedents=True)
Bases: object
- __init__(language_size, rule_size, rule_antecedent_distribution, queryable_size=None, queryable_ratio=None, allow_rules_for_queryables=True, allow_conclusion_in_antecedents=True, allow_inconsistent_antecedents=True)
Parameters for randomly generating an ArgumentationSystem.
- Parameters
language_size (int) – Number of Literals (including negations).
rule_size (Optional[int]) – Number of Rules.
rule_antecedent_distribution (Dict[int, float]) – Number of Rules with a specific number of antecedents.
queryable_size (Optional[int]) – Number of Queryables.
queryable_ratio (Optional[float]) – Fraction of Queryables by the number of Literals.
allow_rules_for_queryables (bool) – Boolean indicating if there can be Rules for Queryables.
allow_conclusion_in_antecedents (bool) – Boolean indicating if a Rule can have its conclusion as an antecedent.
allow_inconsistent_antecedents (bool) – Boolean indicating if a Rule can have inconsistent antecedents.
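As a rough illustration of how these parameters fit together, the sketch below collects keyword arguments that mirror the documented constructor signature and derives a Queryable count from queryable_ratio. The helper resolve_queryable_size is hypothetical and not part of the repository. It assumes that exactly one of queryable_size and queryable_ratio is given and that the ratio is rounded down, neither of which is confirmed by the documentation. The values in rule_antecedent_distribution are read here as rule counts per antecedent count, following the parameter description.

```python
from typing import Optional


def resolve_queryable_size(language_size: int,
                           queryable_size: Optional[int] = None,
                           queryable_ratio: Optional[float] = None) -> int:
    """Hypothetical helper: derive the number of Queryables from either setting."""
    # Assumption: exactly one of the two optional settings is provided.
    if (queryable_size is None) == (queryable_ratio is None):
        raise ValueError("Give exactly one of queryable_size and queryable_ratio.")
    if queryable_size is not None:
        return queryable_size
    # queryable_ratio is the fraction of Literals that are Queryables.
    # Assumption: fractional results are rounded down.
    return int(queryable_ratio * language_size)


# Keyword arguments mirroring the documented constructor signature.
parameters = dict(
    language_size=20,
    rule_size=10,
    rule_antecedent_distribution={1: 6, 2: 4},  # 6 rules with 1 antecedent, 4 with 2
    queryable_ratio=0.5,
    allow_rules_for_queryables=True,
    allow_conclusion_in_antecedents=False,
    allow_inconsistent_antecedents=False,
)

number_of_queryables = resolve_queryable_size(
    parameters["language_size"],
    parameters.get("queryable_size"),
    parameters.get("queryable_ratio"),
)
print(number_of_queryables)
```

With the package importable, a dictionary like this could presumably be passed as RandomArgumentationSystemGeneratorParameters(**parameters).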
Layered argumentation system generator
- class LayeredArgumentationSystemGenerator(argumentation_system_generation_parameters)
Bases: modules.dataset_generator.argumentation_system_generator.argumentation_system_generator_interface.ArgumentationSystemGeneratorInterface
- generate()
Generate an ArgumentationSystem with a layered structure according to the LayeredArgumentationSystemGeneratorParameters.
- Return type
ArgumentationSystem
- Returns
The generated ArgumentationSystem.
- class LayeredArgumentationSystemGeneratorParameters(language_size, rule_size, rule_antecedent_distribution, literal_layer_distribution)
Bases: object
- __init__(language_size, rule_size, rule_antecedent_distribution, literal_layer_distribution)
Parameters for randomly generating an ArgumentationSystem with a layered structure.
- Parameters
language_size (int) – The number of Literals (including negations).
rule_size (int) – The number of Rules.
rule_antecedent_distribution (Dict[int, int]) – The number of Rules having a specific number of antecedents.
literal_layer_distribution (Dict[int, int]) – The number of Literals in a specific layer.
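The two distribution dictionaries are easy to get out of step with the totals, so a small consistency check may be useful before constructing the parameters. The sketch below is not part of the repository: check_layered_parameters is a hypothetical helper, and it assumes that the rule counts should add up to rule_size and the per-layer literal counts to language_size. Numbering layers from 0 is also an assumption.

```python
from typing import Dict, List


def check_layered_parameters(language_size: int,
                             rule_size: int,
                             rule_antecedent_distribution: Dict[int, int],
                             literal_layer_distribution: Dict[int, int]) -> List[str]:
    """Hypothetical check: return a list of inconsistencies (empty if none)."""
    problems = []
    total_rules = sum(rule_antecedent_distribution.values())
    if total_rules != rule_size:
        problems.append(f"rule counts sum to {total_rules}, expected {rule_size}")
    total_literals = sum(literal_layer_distribution.values())
    if total_literals != language_size:
        problems.append(f"literal counts sum to {total_literals}, expected {language_size}")
    return problems


# Keyword arguments mirroring the documented constructor signature.
parameters = dict(
    language_size=10,
    rule_size=5,
    rule_antecedent_distribution={1: 3, 2: 2},       # 3 rules with 1 antecedent, 2 with 2
    literal_layer_distribution={0: 6, 1: 3, 2: 1},   # 6 literals in layer 0, and so on
)

problems = check_layered_parameters(**parameters)
print(problems or "parameters are consistent")
```

If the check passes, the same dictionary could presumably be passed as LayeredArgumentationSystemGeneratorParameters(**parameters).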
Computing properties for an ArgumentationTheory or ArgumentationSystem
- compute_argumentation_theory_properties(argumentation_theory, verbose=False)
Compute some properties of the given ArgumentationTheory, such as the corresponding incomplete argumentation framework or the number of future ArgumentationTheories.
- Parameters
argumentation_theory (ArgumentationTheory) – ArgumentationTheory for which properties are needed.
verbose (bool) – Boolean indicating if information should be printed.
- Return type
ArgumentationTheoryProperties
- Returns
ArgumentationTheoryProperties of the ArgumentationTheory.
- enumerate_future_argumentation_theories(argumentation_theory, verbose=False)
Enumerate all future ArgumentationTheories of this ArgumentationTheory.
- Parameters
argumentation_theory (ArgumentationTheory) – ArgumentationTheory for which future ArgumentationTheories should be enumerated.
verbose (bool) – Boolean indicating if information should be printed.
- Return type
List[ArgumentationTheory]
- Returns
All future ArgumentationTheories of this ArgumentationTheory.
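To give an intuition for what such an enumeration involves, the sketch below lists the knowledge bases that can still be reached from a given one. It is an illustration under stated assumptions, not the repository's implementation. It assumes that a future ArgumentationTheory keeps the current knowledge base and adds any consistent set of further Queryables. Literals are encoded as plain strings with a "-" prefix for negation, and both negate and enumerate_future_knowledge_bases are hypothetical names.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List


def negate(literal: str) -> str:
    """Return the negation of a literal in the hypothetical "-" prefix encoding."""
    return literal[1:] if literal.startswith("-") else "-" + literal


def enumerate_future_knowledge_bases(queryables: Iterable[str],
                                     knowledge_base: Iterable[str]) -> List[FrozenSet[str]]:
    """List every consistent extension of knowledge_base with further queryables."""
    current = frozenset(knowledge_base)
    # Queryables that can still be added without contradicting the current knowledge base.
    candidates = sorted(q for q in queryables
                        if q not in current and negate(q) not in current)
    futures = []
    for size in range(len(candidates) + 1):
        for extra in combinations(candidates, size):
            # Skip additions that contain both a literal and its negation.
            if any(negate(q) in extra for q in extra):
                continue
            futures.append(current | frozenset(extra))
    return futures


queryables = ["a", "-a", "b", "-b"]
futures = enumerate_future_knowledge_bases(queryables, ["a"])
for future in futures:
    print(sorted(future))
```

The current knowledge base itself is included in the result (the case where nothing is added); whether the repository counts it as a future ArgumentationTheory is not stated in this documentation.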
- compute_argumentation_system_properties(argumentation_system)
Compute some properties of the given ArgumentationSystem, such as the number of literals or rule antecedents.
- Parameters
argumentation_system (ArgumentationSystem) – ArgumentationSystem for which properties are needed.
- Return type
ArgumentationSystemProperties
- Returns
ArgumentationSystemProperties of the ArgumentationSystem.
Generating a Dataset for a specific ArgumentationSystem
- class DatasetGenerator(argumentation_system, argumentation_system_custom_name=None)
Bases: object
- classmethod from_file(argumentation_system_file_name)
Generate a Dataset for an ArgumentationSystem that must still be read from a file.
- Parameters
argumentation_system_file_name (str) – Name of ArgumentationSystem for which a Dataset should be generated.
- Returns
Dataset for specified ArgumentationSystem.
- generate_dataset(custom_dataset_name=None, include_ground_truth=True, verbose=True)
Generate a Dataset, where all possible ArgumentationTheories for the given ArgumentationSystem are generated. Note: for ArgumentationSystems with many Queryables, this takes a lot of time.
- Parameters
custom_dataset_name (Optional[str]) – Optional, name of the Dataset. Otherwise a name based on the timestamp is chosen.
include_ground_truth (bool) – Boolean indicating if the ground truth should be computed. Note: this takes time!
verbose (bool) – Boolean indicating if information should be printed.
- Return type
Dataset
- Returns
The resulting Dataset.
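The warning about many Queryables can be made concrete with a small counting sketch. It assumes, which this documentation does not state explicitly, that Queryables come in pairs of a literal and its negation and that a knowledge base must be consistent. Each pair is then either absent, observed positively, or observed negatively, so the number of knowledge bases grows as 3 to the power of the number of pairs. The function name and the string encoding of literals are hypothetical.

```python
from itertools import product
from typing import List, Tuple


def enumerate_knowledge_bases(atoms: List[str]) -> List[Tuple[str, ...]]:
    """Enumerate every consistent knowledge base over the given atoms.

    Each atom is either absent (None), observed ("a") or observed negated ("-a").
    """
    knowledge_bases = []
    for choice in product((None, "", "-"), repeat=len(atoms)):
        knowledge_bases.append(
            tuple(sign + atom for sign, atom in zip(choice, atoms) if sign is not None)
        )
    return knowledge_bases


# Print how quickly the number of knowledge bases grows with the number of pairs.
for number_of_pairs in (1, 2, 5, 10):
    atoms = [f"q{i}" for i in range(number_of_pairs)]
    print(number_of_pairs, len(enumerate_knowledge_bases(atoms)))
```

Under these assumptions, 10 pairs (20 Queryables) already give 59049 knowledge bases, before any ground truth is computed for them.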
- generate_dataset_sample(custom_dataset_name=None, include_ground_truth=True, sample_size=1000, verbose=True)
Generate a Dataset containing a fixed number of DatasetItems (sample_size) for each possible number of items in the knowledge base. For example, if there are 4 Queryables in the ArgumentationSystem, a knowledge base can contain 0, 1, or 2 (= 4/2) items. With sample_size = 10, 10 ArgumentationTheories are generated for each knowledge base size, so the total number of DatasetItems is 30.
- Parameters
custom_dataset_name (Optional[str]) – Optional, name of the Dataset. Otherwise a name based on the timestamp is chosen.
include_ground_truth (bool) – Boolean indicating if the ground truth should be computed. Note: this takes time!
sample_size (int) – Number of DatasetItems for each number of items in the knowledge base.
verbose (bool) – Boolean indicating if information should be printed.
- Return type
Dataset
- Returns
The resulting Dataset.
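The arithmetic in the description above can be reproduced with a few lines. The sketch only restates the documented example; sampled_dataset_size is a hypothetical helper and not part of the repository.

```python
from typing import List, Tuple


def sampled_dataset_size(number_of_queryables: int, sample_size: int) -> Tuple[List[int], int]:
    """Return the possible knowledge base sizes and the total number of DatasetItems."""
    # A knowledge base holds at most half of the Queryables (4 Queryables -> at most 2 items).
    largest_knowledge_base = number_of_queryables // 2
    knowledge_base_sizes = list(range(largest_knowledge_base + 1))
    # sample_size items are generated for each knowledge base size.
    return knowledge_base_sizes, len(knowledge_base_sizes) * sample_size


sizes, total = sampled_dataset_size(number_of_queryables=4, sample_size=10)
print(sizes, total)  # [0, 1, 2] 30
```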
Dataset classes
- class Dataset(name, argumentation_system_name, dataset_items)
Bases: object
- __init__(name, argumentation_system_name, dataset_items)
A Dataset has a name, the name of its ArgumentationSystem and a list of DatasetItems.
- Parameters
name (str) – Name of the Dataset.
argumentation_system_name (str) – Name of the ArgumentationSystem on which the Dataset is based.
dataset_items (List[DatasetItem]) – Items in the Dataset.
- class DatasetItem(argumentation_system, argumentation_system_name, knowledge_base)
Bases: object
- __init__(argumentation_system, argumentation_system_name, knowledge_base)
A DatasetItem has an ArgumentationSystem, its name and a knowledge base.
- Parameters
argumentation_system (ArgumentationSystem) – The ArgumentationSystem on which the DatasetItem is based.
argumentation_system_name (str) – The name of the ArgumentationSystem.
knowledge_base (List[Queryable]) – The knowledge base (list of Queryables).
- classmethod from_str(dataset_item_str)
Read the DatasetItem from a string.
- Parameters
dataset_item_str (str) – String representation of the DatasetItem.
- Returns
DatasetItem represented by the input string.
- class AnnotatedDatasetItem(argumentation_system, argumentation_system_name, knowledge_base, topic_literal, gt_acceptability_label, gt_stability_label)
Bases: stability_label_algorithm.modules.dataset_generator.dataset_item.DatasetItem
An AnnotatedDatasetItem is a specific type of DatasetItem that also has a ground truth acceptability and stability label for a topic literal.
- __init__(argumentation_system, argumentation_system_name, knowledge_base, topic_literal, gt_acceptability_label, gt_stability_label)
Create an AnnotatedDatasetItem.
- Parameters
argumentation_system (ArgumentationSystem) – The ArgumentationSystem on which the DatasetItem is based.
argumentation_system_name (str) – The name of the ArgumentationSystem.
knowledge_base (List[Queryable]) – The knowledge base (list of Queryables).
topic_literal (Literal) – The Literal for which the ground truth is given.
gt_acceptability_label (StabilityLabel) – Ground truth acceptability label for the topic Literal.
gt_stability_label (StabilityLabel) – Ground truth stability label for the topic Literal.
- classmethod from_str(dataset_item_str)
Read the DatasetItem from a string.
- Parameters
dataset_item_str (str) – String representation of the DatasetItem.
- Returns
DatasetItem represented by the input string.