
A Formalization of the SylloBio-NLI Resource Generation Process

DATE POSTED: December 11, 2024
Table of Links
  1. Abstract and Introduction

  2. SylloBio-NLI

  3. Empirical Evaluation

  4. Related Work

  5. Conclusions

  6. Limitations and References


A. Formalization of the SylloBio-NLI Resource Generation Process

B. Formalization of Tasks 1 and 2

C. Dictionary of gene and pathway membership

D. Domain-specific pipeline for creating NL instances and E. Accessing LLMs

F. Experimental Details

G. Evaluation Metrics

H. Prompting LLMs - Zero-shot prompts

I. Prompting LLMs - Few-shot prompts

J. Results: Misaligned Instruction-Response

K. Results: Ambiguous Impact of Distractors on Reasoning

L. Results: Models Prioritize Contextual Knowledge Over Background Knowledge

M. Supplementary Figures and N. Supplementary Tables

A Formalization of the SylloBio-NLI Resource Generation Process

This appendix formalises the generation process of the syllogistic inference patterns.

We start by defining the main constructs (formal and linguistic artefacts, and functions) of the underlying framework:


  1. Syllogistic Scheme ($S$): A logical inference pattern consisting of premises and a conclusion, $S = \{P_1, P_2, \ldots, P_n, C\}$, where $P_i$ is premise $i$ and $C$ is the conclusion.


  2. Formal Argument Scheme ($\sigma$): Representation of a syllogistic scheme in first-order logic (FOL), $\sigma(S) = \{\phi_1, \phi_2, \ldots, \phi_n, \psi\}$, where $\phi_i$ corresponds to $P_i$ and $\psi$ corresponds to $C$.


  3. Natural Language Template ($\tau$): A natural language schema mapping each formula in $\sigma(S)$ to a sentence template, $\tau(\sigma(S)) = \{\tau_1, \tau_2, \ldots, \tau_n, \tau_C\}$, where $\tau_i$ is the sentence template for $\phi_i$ and $\tau_C$ is the sentence template for $\psi$.


  4. Ontology ($O$): A domain-specific knowledge base containing entities $E$ and predicates $\Pi$, $O = \{E, \Pi\}$, where $E = \{e_1, e_2, \ldots, e_k\}$ and $\Pi = \{\pi_1, \pi_2, \ldots, \pi_l\}$.


  5. Instantiation Function ($I$): A function that replaces the placeholders in $\tau$ with entities and predicates from $O$, $I : \tau(\sigma(S)) \times O \to NL$, where $NL$ is the set of natural language sentences.


  6. Expert Mapping Function ($\mu_{\text{Expert}}$): A function provided by a domain expert that maps placeholders to appropriate ontology terms, $\mu_{\text{Expert}} : \text{Placeholders} \to E \cup \Pi$.


  7. Knowledge Base ($KB$): A collection of instantiated syllogistic arguments, $KB = \{A_1, A_2, \ldots, A_m\}$, where $A_i = \{P'_1, P'_2, \ldots, P'_n, C'\}$ and $P'_i$, $C'$ are instantiated natural language sentences.
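As a concrete reading of constructs 2 to 7, the sketch below models them as plain Python data types. The encoding is hypothetical. The class and field names, the string representation of formulas and templates, and the placeholder identifiers (`GENE_X`, `PATHWAY_A`, `PATHWAY_B`) are assumptions made for illustration and are not part of the resource.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

@dataclass(frozen=True)
class FormalScheme:
    """sigma(S) = {phi_1, ..., phi_n, psi}, formulas kept as plain strings."""
    premises: Tuple[str, ...]
    conclusion: str

@dataclass(frozen=True)
class TemplateSet:
    """tau(sigma(S)): one sentence template per premise formula plus one for psi."""
    premises: Tuple[str, ...]
    conclusion: str

@dataclass
class Ontology:
    """O = {E, Pi}: entities and predicates of the target domain."""
    entities: Set[str]
    predicates: Set[str]

@dataclass(frozen=True)
class Argument:
    """A_i = {P'_1, ..., P'_n, C'}: instantiated natural language sentences."""
    premises: Tuple[str, ...]
    conclusion: str

ExpertMapping = Dict[str, str]   # mu_Expert: placeholder -> term in E ∪ Pi
KnowledgeBase = List[Argument]   # KB = {A_1, ..., A_m}

# Hypothetical example values (placeholder identifiers, not resource data).
gmp = FormalScheme(("forall x (A(x) -> B(x))", "A(a)"), "B(a)")
onto = Ontology(entities={"GENE_X", "PATHWAY_A", "PATHWAY_B"},
                predicates={"is a gene in"})
mu: ExpertMapping = {"A": "PATHWAY_A", "B": "PATHWAY_B", "a": "GENE_X"}

# mu_Expert must land inside E ∪ Pi.
assert set(mu.values()) <= onto.entities | onto.predicates
```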

A.1 Process Formalisation

The formalisation describes a systematic procedure for generating domain-specific syllogistic arguments by:

  1. Defining formal representations of syllogistic schemes in first-order logic.


  2. Generating natural language templates from these formal representations.


  3. Mapping placeholders to domain-specific entities and predicates using an ontology and expert knowledge.


  4. Instantiating the templates to produce logically valid and semantically sound arguments.


  5. Constructing a knowledge base for evaluating NLI models.

This ensures that the generated arguments are both logically valid and contextually relevant to the biomedical domain.

Input: A set of syllogistic schemes $S = \{S_1, S_2, \ldots, S_m\}$, an ontology $O = \{E, \Pi\}$, and an expert mapping function $\mu_{\text{Expert}}$.

Output: A knowledge base of instantiated arguments, $KB$.

Step 1: Formal Argument Scheme Selection. For each syllogistic scheme $S_i \in S$, define its formal argument scheme $\sigma(S_i)$ in first-order logic.
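Following the definition of $\sigma$ above, the output of this step for a scheme $S_i$ with $n_i$ premises can be written as shown below. Generalized modus ponens is given as one illustrative scheme. The unary predicates $A$, $B$ and the constant $a$ are symbols chosen here for the example and are not taken from the resource.

```latex
\sigma(S_i) = \{\phi^{(i)}_1, \phi^{(i)}_2, \ldots, \phi^{(i)}_{n_i}, \psi^{(i)}\}

% Illustrative instance: generalized modus ponens (n_i = 2)
\phi_1 = \forall x\,\big(A(x) \rightarrow B(x)\big), \qquad
\phi_2 = A(a), \qquad
\psi = B(a)
```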


Step 2: Natural Language Template Generation. Transform each formula in $\sigma(S_i)$ into a natural language template, yielding $\tau(\sigma(S_i))$.
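One possible way to realise this step is a small rule table that maps each formula shape to an English template with named slots. The following is a minimal sketch. The plain-text formula syntax, the two rules and the gene/pathway wording are assumptions made for illustration and are not the resource's actual templates.

```python
import re
from typing import List

# Hypothetical plain-text formula syntax: "forall x (A(x) -> B(x))" and "A(a)".
UNIVERSAL = re.compile(r"^forall x \((\w+)\(x\) -> (\w+)\(x\)\)$")
ATOM = re.compile(r"^(\w+)\((\w+)\)$")

def to_template(formula: str) -> str:
    """Map one FOL formula to a sentence template with {placeholder} slots."""
    m = UNIVERSAL.match(formula)
    if m:
        p, q = m.groups()
        return f"Every gene in {{{p}}} is also in {{{q}}}."
    m = ATOM.match(formula)
    if m:
        p, c = m.groups()
        return f"{{{c}}} is a gene in {{{p}}}."
    raise ValueError(f"no template rule for formula: {formula}")

def generate_templates(sigma: List[str]) -> List[str]:
    """Step 2: tau(sigma(S_i)), one template per formula, order preserved."""
    return [to_template(f) for f in sigma]

print(generate_templates(["forall x (A(x) -> B(x))", "A(a)", "B(a)"]))
# ['Every gene in {A} is also in {B}.', '{a} is a gene in {A}.', '{a} is a gene in {B}.']
```

Keeping the placeholders as `{name}` slots means the instantiation step can fill them with ordinary string formatting.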


Step 3: Ontology Mapping and Instantiation. Apply the expert mapping function $\mu_{\text{Expert}}$ to select appropriate entities and predicates from the ontology $O$.
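The mapping $\mu_{\text{Expert}}$ is supplied by a domain expert. Code can only check that each chosen term exists in $E \cup \Pi$. The sketch below treats the mapping as a dictionary and performs that check. The ontology contents and the function name `apply_expert_mapping` are hypothetical.

```python
from typing import Dict, Set

# Hypothetical ontology O = {E, Pi}; identifiers are placeholders, not resource data.
ENTITIES: Set[str] = {"GENE_X", "PATHWAY_A", "PATHWAY_B"}
PREDICATES: Set[str] = {"is a gene in"}

def apply_expert_mapping(mu_expert: Dict[str, str],
                         entities: Set[str],
                         predicates: Set[str]) -> Dict[str, str]:
    """Return mu_Expert after checking that every target term lies in E ∪ Pi."""
    allowed = entities | predicates
    outside = {ph: term for ph, term in mu_expert.items() if term not in allowed}
    if outside:
        raise ValueError(f"terms not in the ontology: {outside}")
    return dict(mu_expert)

# Expert-supplied choice for the placeholders A, B (pathways) and a (gene).
mapping = apply_expert_mapping({"A": "PATHWAY_A", "B": "PATHWAY_B", "a": "GENE_X"},
                               ENTITIES, PREDICATES)
```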


Then instantiate the templates with the mapped terms, $A_i = I(\tau(\sigma(S_i)), O)$.
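A minimal sketch of the instantiation function $I$ follows. It assumes that templates use Python `str.format` slots named after the placeholders and that the last template belongs to the conclusion. Both conventions are assumptions, and the identifiers are hypothetical. A placeholder with no mapped term surfaces as a `KeyError`, which flags an incomplete $\mu_{\text{Expert}}$.

```python
from typing import Dict, List, Tuple

def instantiate(templates: List[str], mapping: Dict[str, str]) -> Tuple[List[str], str]:
    """Fill every {placeholder} with its mapped ontology term.

    Returns (premises P'_1..P'_n, conclusion C'); the last template is
    assumed to be the conclusion template.
    """
    sentences = [t.format(**mapping) for t in templates]
    return sentences[:-1], sentences[-1]

premises, conclusion = instantiate(
    ["Every gene in {A} is also in {B}.", "{a} is a gene in {A}.", "{a} is a gene in {B}."],
    {"A": "PATHWAY_A", "B": "PATHWAY_B", "a": "GENE_X"},
)
print(conclusion)  # GENE_X is a gene in PATHWAY_B.
```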


The instantiation is subject to the following constraints:

• Logical Validity: The instantiated arguments must preserve the logical structure of $\sigma(S_i)$.

• Domain Soundness: The selected entities and predicates must be semantically coherent within the targeted subdomain.

These constraints can be further formalised as follows.

Logical Validity Constraint: The instantiated argument $A_i$ must be logically valid:

$$\{\phi'_1, \phi'_2, \ldots, \phi'_n\} \models \psi',$$

where $\phi'_j$ is the logical form of $P'_j$ and $\psi'$ is the logical form of $C'$.
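For ground (variable-free) instances, the entailment $\{\phi'_1, \ldots, \phi'_n\} \models \psi'$ can be checked semantically with a truth table: it holds exactly when no valuation makes every premise true and $\psi'$ false. The sketch below does this over a hypothetical tuple encoding of formulas. Quantified premises are assumed to be grounded on the argument's constants first. A positive answer then also implies the first-order entailment, because each universal premise entails its ground instances. A negative answer is not conclusive for first-order schemes in general.

```python
from itertools import product
from typing import Dict, List, Set

# Ground formulas as nested tuples: ("atom", name), ("not", f),
# ("and", f, g), ("or", f, g), ("implies", f, g).

def atoms(f: tuple) -> Set[str]:
    """Collect the propositional atom names occurring in a formula."""
    if f[0] == "atom":
        return {f[1]}
    return set().union(*(atoms(g) for g in f[1:]))

def holds(f: tuple, v: Dict[str, bool]) -> bool:
    """Evaluate a formula under the truth assignment v."""
    op, *args = f
    if op == "atom":
        return v[args[0]]
    if op == "not":
        return not holds(args[0], v)
    if op == "and":
        return holds(args[0], v) and holds(args[1], v)
    if op == "or":
        return holds(args[0], v) or holds(args[1], v)
    if op == "implies":
        return (not holds(args[0], v)) or holds(args[1], v)
    raise ValueError(f"unknown connective: {op}")

def entails(premises: List[tuple], conclusion: tuple) -> bool:
    """Truth-table test of {phi'_1, ..., phi'_n} |= psi' (no counter-model exists)."""
    names = sorted(set().union(*(atoms(f) for f in [*premises, conclusion])))
    for values in product([False, True], repeat=len(names)):
        v = dict(zip(names, values))
        if all(holds(p, v) for p in premises) and not holds(conclusion, v):
            return False  # counter-model: premises true, conclusion false
    return True

# Ground instance of generalized modus ponens for the hypothetical gene GENE_X.
A = ("atom", "A(GENE_X)")
B = ("atom", "B(GENE_X)")
print(entails([("implies", A, B), A], B))  # True (modus ponens)
print(entails([("implies", A, B), B], A))  # False (affirming the consequent)
```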

Domain Soundness Constraint: The entities and predicates used must be semantically valid within the domain:

$$\forall e \in E',\ \forall \pi \in \Pi' : \mathrm{DomainValid}(e, \pi) = \mathrm{True},$$

where $E' \subseteq E$ and $\Pi' \subseteq \Pi$ are the entities and predicates used in $A_i$.
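The source does not specify how $\mathrm{DomainValid}$ is decided. One simple assumption is a lookup of entity types against the (type, predicate) pairs a domain expert accepts. The sketch below uses that assumption and implements the quantifier literally, testing every pair in $E' \times \Pi'$. The type table and accepted pairs are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical entity typing and expert-accepted (type, predicate) pairs.
ENTITY_TYPE: Dict[str, str] = {"GENE_X": "gene", "PATHWAY_A": "pathway", "PATHWAY_B": "pathway"}
VALID_PAIRS: Set[Tuple[str, str]] = {
    ("gene", "is a gene in"),     # genes appear as subjects of the predicate
    ("pathway", "is a gene in"),  # pathways appear as its objects
}

def domain_valid(entity: str, predicate: str) -> bool:
    """Hypothetical DomainValid(e, pi): the entity's type admits the predicate."""
    etype = ENTITY_TYPE.get(entity)
    return etype is not None and (etype, predicate) in VALID_PAIRS

def domain_sound(used_entities: Set[str], used_predicates: Set[str]) -> bool:
    """forall e in E', pi in Pi': DomainValid(e, pi) = True."""
    return all(domain_valid(e, p) for e in used_entities for p in used_predicates)
```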

Verification of Logical Validity: Ensure that the instantiated premises logically entail the conclusion,

$$\{\phi'_1, \phi'_2, \ldots, \phi'_n\} \models \psi',$$

by deriving $\psi'$ from the premises with logical inference rules.
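A derivation-based counterpart to the entailment check repeatedly applies one inference rule, generalized modus ponens, until no new fact appears. It then tests whether the conclusion was derived. This minimal sketch covers only schemes built from unary ground facts and universally quantified implications, such as modus ponens and chains of it. Negation and disjunction would need further rules. The fact and rule encodings are assumptions.

```python
from typing import List, Set, Tuple

Fact = Tuple[str, str]  # (P, c) encodes the ground atom P(c)
Rule = Tuple[str, str]  # (P, Q) encodes forall x (P(x) -> Q(x))

def derives(facts: Set[Fact], rules: List[Rule], goal: Fact) -> bool:
    """Forward chaining with generalized modus ponens until a fixpoint is reached."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for p, q in rules:
            for pred, const in list(known):
                if pred == p and (q, const) not in known:
                    known.add((q, const))  # from forall x (P(x) -> Q(x)) and P(c), infer Q(c)
                    changed = True
    return goal in known

# Two chained universal premises plus one fact: the goal follows in two steps.
print(derives({("A", "GENE_X")}, [("A", "B"), ("B", "C")], ("C", "GENE_X")))  # True
```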

Verification of Domain Soundness: Confirm that:

• All entities and predicates are used correctly.

• There are no semantic contradictions.
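A minimal, purely syntactic version of the second check flags any ground atom that an argument asserts both positively and negatively. Contradictions that depend on background domain knowledge would need expert rules on top of this. The literal encoding is an assumption.

```python
from typing import Dict, Iterable, Set, Tuple

Literal = Tuple[str, str, bool]  # (predicate, constant, polarity); False means negated

def contradictions(literals: Iterable[Literal]) -> Set[Tuple[str, str]]:
    """Return every ground atom that is asserted both positively and negatively."""
    first_polarity: Dict[Tuple[str, str], bool] = {}
    clashes: Set[Tuple[str, str]] = set()
    for pred, const, polarity in literals:
        key = (pred, const)
        if key in first_polarity and first_polarity[key] != polarity:
            clashes.add(key)
        first_polarity.setdefault(key, polarity)
    return clashes

# Hypothetical literals: GENE_X is stated to be and not to be in PATHWAY_A.
print(contradictions([("in_PATHWAY_A", "GENE_X", True), ("in_PATHWAY_A", "GENE_X", False)]))
```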

Step 4: Knowledge Base Construction. Aggregate all instantiated arguments into the knowledge base $KB$.
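Read together with the definitions of $I$ and $KB$ above, this aggregation can be written as follows. As in the definition of $KB$, the sketch assumes one instantiated argument per scheme index.

```latex
A_i = I\big(\tau(\sigma(S_i)),\, O\big) = \{P'_1, P'_2, \ldots, P'_n, C'\}, \qquad i = 1, \ldots, m,

KB = \{A_1, A_2, \ldots, A_m\}
```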


This procedure is summarised in the following algorithmic outline:
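A minimal, hypothetical Python sketch of Steps 1 to 4 is given below for illustration. It is not the authors' algorithm. Steps 1 and 2 are supplied as data: formulas and templates per scheme. Several choices are assumptions: the tuple encoding of formulas, the per-term `domain_valid` predicate (a simplification of $\mathrm{DomainValid}(e, \pi)$), and discarding arguments that fail either check. The scheme, placeholder and ontology names are placeholders.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Set, Tuple

# Formulas are nested tuples: ("atom", name), ("not", f), ("and", f, g), ("implies", f, g).
# Atom names and templates contain {placeholders} that mu_Expert fills in.

@dataclass
class Scheme:
    name: str
    formulas: List[tuple]  # sigma(S_i): premises followed by the conclusion
    templates: List[str]   # tau(sigma(S_i)): one template per formula

def holds(f: tuple, v: Dict[str, bool]) -> bool:
    op, *args = f
    if op == "atom":
        return v[args[0]]
    if op == "not":
        return not holds(args[0], v)
    if op == "and":
        return holds(args[0], v) and holds(args[1], v)
    if op == "implies":
        return (not holds(args[0], v)) or holds(args[1], v)
    raise ValueError(f"unknown connective: {op}")

def atoms(f: tuple) -> Set[str]:
    if f[0] == "atom":
        return {f[1]}
    return set().union(*(atoms(g) for g in f[1:]))

def entails(premises: List[tuple], conclusion: tuple) -> bool:
    """Truth-table check: no valuation makes all premises true and psi' false."""
    names = sorted(set().union(*(atoms(f) for f in [*premises, conclusion])))
    for values in product([False, True], repeat=len(names)):
        v = dict(zip(names, values))
        if all(holds(p, v) for p in premises) and not holds(conclusion, v):
            return False
    return True

def substitute(f: tuple, mapping: Dict[str, str]) -> tuple:
    """Apply the placeholder mapping to every atom name of a formula."""
    if f[0] == "atom":
        return ("atom", f[1].format(**mapping))
    return (f[0], *(substitute(g, mapping) for g in f[1:]))

def build_kb(schemes: List[Scheme],
             mu_expert: Dict[str, Dict[str, str]],
             domain_valid: Callable[[str], bool]) -> List[Tuple[Tuple[str, ...], str]]:
    kb = []
    for s in schemes:
        mapping = mu_expert[s.name]                                   # Step 3: mu_Expert
        sentences = [t.format(**mapping) for t in s.templates]        # Step 3: I
        ground = [substitute(f, mapping) for f in s.formulas]
        valid = entails(ground[:-1], ground[-1])                      # logical validity
        sound = all(domain_valid(term) for term in mapping.values())  # domain soundness
        if valid and sound:                                           # assumption: drop failures
            kb.append((tuple(sentences[:-1]), sentences[-1]))         # Step 4
    return kb

# Usage with generalized modus ponens; the universal premise is shown already
# grounded on the constant {a}. All identifiers are hypothetical.
gmp = Scheme(
    name="generalized modus ponens",
    formulas=[("implies", ("atom", "{a} in {A}"), ("atom", "{a} in {B}")),
              ("atom", "{a} in {A}"),
              ("atom", "{a} in {B}")],
    templates=["Every gene in {A} is also in {B}.",
               "{a} is a gene in {A}.",
               "{a} is a gene in {B}."],
)
mu = {gmp.name: {"A": "PATHWAY_A", "B": "PATHWAY_B", "a": "GENE_X"}}
kb = build_kb([gmp], mu, lambda term: term in {"GENE_X", "PATHWAY_A", "PATHWAY_B"})
print(kb[0][1])  # GENE_X is a gene in PATHWAY_B.
```

Because validity is checked on the instantiated formulas rather than on the abstract scheme, an invalid pattern such as affirming the consequent is rejected even when its sentences read fluently.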



Authors:

(1) Magdalena Wysocka, National Biomarker Centre, CRUK-MI, Univ. of Manchester, United Kingdom;

(2) Danilo S. Carvalho, National Biomarker Centre, CRUK-MI, Univ. of Manchester, United Kingdom and Department of Computer Science, Univ. of Manchester, United Kingdom;

(3) Oskar Wysocki, National Biomarker Centre, CRUK-MI, Univ. of Manchester, United Kingdom;

(4) Marco Valentino, Idiap Research Institute, Switzerland;

(5) André Freitas, National Biomarker Centre, CRUK-MI, Univ. of Manchester, United Kingdom, Department of Computer Science, Univ. of Manchester, United Kingdom and Idiap Research Institute, Switzerland.


This paper is available on arXiv under the CC BY-NC-SA 4.0 license.

