
Towards Trustworthy AI Results using Evidence Structures: From Certificates to Argumentation Frameworks

Research output: Contribution to journal › Conference article › Peer-review

Abstract

High-stakes domains such as law, medicine, and scientific inference require AI systems to deliver results that are not merely explained but independently verifiable. We propose TAIR (Trustworthy AI Results), a framework that treats AI systems as evidence-carrying answer generators: given a user query Q, the system produces answers A together with evidence structures E. An evidence structure is a domain-appropriate artifact used to verify, via domain expertise and/or external tools, that A is valid or at least rationally defensible. Which evidence is appropriate depends not only on the domain but also on the user's role and evidential needs (e.g., lay user, domain expert, or auditor). By focusing on evidence rather than model internals, TAIR synthesizes ideas from certifying algorithms, proof-carrying code, provenance systems, and computational argumentation into a unified evidence-first architecture. The framework provides a three-phase iterative workflow pattern (generation, verification, gap detection) that externalizes trust to independent checks and uses their feedback to strengthen (Q, A, E) triples. Evidence structures range from algorithmic witnesses and proof certificates to argumentation frameworks and warrant chains. TAIR treats evidential standards as domain-dependent: mathematical domains require formal proofs or proof certificates, computational problems use algorithmic certificates, and legal and scientific domains employ structured argumentation. We illustrate TAIR with case studies spanning formal proof artifacts and defeasible legal argumentation, and outline a multi-agent meta-workflow for generating, verifying, and refining evidence-carrying AI results.
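To make the (Q, A, E) idea concrete, the following is a minimal sketch of the algorithmic-certificate case only. It is an illustration, not the paper's implementation. The names `Triple`, `generate`, `find_gaps`, and `refine` are hypothetical, and the choice of problem is an assumption made for the example: the query is "compute gcd(a, b)" for positive integers, the answer is the claimed gcd g, and the evidence is a pair of Bézout coefficients (x, y). The checker never trusts the generator. It accepts g only if g divides both inputs and a*x + b*y == g, which together imply that g is the greatest common divisor.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Triple:
    """A (Q, A, E) triple for the hypothetical gcd example."""
    query: Tuple[int, int]                # Q: positive integers (a, b)
    answer: int                           # A: claimed gcd(a, b)
    evidence: Optional[Tuple[int, int]]   # E: Bezout coefficients (x, y), if any


def generate(query: Tuple[int, int]) -> Triple:
    """Generation phase: extended Euclid yields the answer and its witness."""
    a, b = query
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return Triple(query, old_r, (old_x, old_y))


def find_gaps(t: Triple) -> List[str]:
    """Verification and gap detection: an independent check of A against E.

    Returns a list of unmet evidential requirements; empty means verified.
    """
    a, b = t.query
    g = t.answer
    if g <= 0:
        return ["answer must be a positive integer"]
    gaps = []
    if a % g != 0 or b % g != 0:
        gaps.append("answer is not a common divisor of the inputs")
    if t.evidence is None:
        gaps.append("missing Bezout witness")
    else:
        x, y = t.evidence
        if a * x + b * y != g:
            gaps.append("witness does not satisfy a*x + b*y == g")
    return gaps


def refine(t: Triple, max_rounds: int = 3) -> Tuple[Triple, List[str]]:
    """Iterate: verify, and on any gap regenerate the triple from the query."""
    for _ in range(max_rounds):
        gaps = find_gaps(t)
        if not gaps:
            return t, []
        t = generate(t.query)  # simplest possible feedback: start over
    return t, find_gaps(t)


# An answer that arrives without evidence is not accepted as-is.
unsupported = Triple(query=(240, 46), answer=2, evidence=None)
verified, remaining_gaps = refine(unsupported)
```

In this sketch trust rests entirely on `find_gaps`, which is a few lines of arithmetic and can be audited separately from the generator. Defeasible domains such as legal argumentation would need a different evidence type and a different checker; the loop structure is the only part intended to carry over.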