How We Generated XML Documents: Tree Creation and Node Generation

DATE POSTED: March 10, 2025
Table of Links

Abstract and 1 Introduction

2 Background

3 Approach and 3.1 Differential Testing for XML Processors

3.2 XPath Expression Generation

3.3 XML Generation

4 Evaluation

4.1 Effectiveness

4.2 Efficiency

4.3 Comparison to the State of the Art

4.4 Analysis of BaseX Historical Bug Reports

5 Related Work

6 Conclusion, Acknowledgments, and References

3.3 XML Generation

In this section, we outline how we generate XML documents (step 1), which we do not consider part of our core contribution.

Tree creation. We generate XML documents bottom-up. We first generate a number of node templates, from which we instantiate XML nodes that overlap in structure, as detailed below. We select one of these nodes as the root element and randomly assign each remaining node to a parent. As XML documents support recursive structure, we allow cyclic relationships. In Section 4, we provide details on how we configured the number of nodes in a document.
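The parent-assignment step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `build_random_tree` and `depth` are hypothetical, and the sketch assumes that each parent is drawn only from nodes already attached to the root, which guarantees a well-formed tree. The paper's handling of cyclic relationships is not reproduced here.

```python
import random


def build_random_tree(num_nodes, rng):
    """Pick a random root, then attach every other node to a random parent.

    Returns (root, parent), where parent maps each non-root node index
    to the index of its parent.
    Assumption (not from the paper): a parent is always chosen among nodes
    that are already attached, so the result cannot contain a cycle.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    nodes = list(range(num_nodes))
    root = rng.choice(nodes)
    attached = [root]
    remaining = [n for n in nodes if n != root]
    rng.shuffle(remaining)  # attach the remaining nodes in random order
    parent = {}
    for node in remaining:
        parent[node] = rng.choice(attached)
        attached.append(node)
    return root, parent


def depth(node, root, parent):
    """Count the edges between a node and the root by following parent links."""
    steps = 0
    while node != root:
        node = parent[node]
        steps += 1
    return steps
```

Calling `build_random_tree(10, random.Random(0))` yields a root index and a parent map with nine entries. Passing a seeded `random.Random` instance keeps the generated shape reproducible, which is convenient when a failing test case has to be regenerated.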

Node generation. We now describe how each element node is instantiated. By default, XML documents do not have to adhere to a specific schema, unlike, for example, relational DBMSs. Nevertheless, we want to generate element nodes that overlap in structure, to test for more interesting behaviors. To that end, we generate element nodes based on so-called node templates, which we generate randomly. A node template represents a type of node. For example, in Figure 1, Book is a node template whose tag name is Book, which has the attributes id and year, and whose text content is of the string data type. To instantiate a template, we fill in values for the attributes and the text content. We instantiate each node of the XML tree created above with a randomly assigned template. In the example of Figure 1, we generated three nodes using the Book template. We assign random values to element nodes and their attributes according to the associated data types, except for id, to which we assign a unique identifier that we use to unambiguously identify the processors’ outputs (see Section 3.1). For the node with id = 1, we assign the random integer value 2020 to year and the random string value "A fairy tale" as its text content. Similar strategies have also been applied to other schema-less systems, such as graph DBMSs [26, 29].
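Template instantiation can be sketched in Python as follows. This is an illustrative sketch rather than the authors' code: the names `NodeTemplate`, `random_value`, and `instantiate` are hypothetical, only two data types are modeled, and the value ranges and string lengths are arbitrary assumptions. The `id` attribute is handled separately from the random attributes so that every node receives a unique identifier.

```python
import itertools
import random
import string
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class NodeTemplate:
    """A type of node: a tag name, typed attributes, and a typed text content."""
    tag: str
    attributes: dict  # attribute name -> "integer" or "string" (id is implicit)
    text_type: str


def random_value(data_type, rng):
    """Return a random value of the given data type, serialized as a string."""
    if data_type == "integer":
        return str(rng.randint(0, 9999))  # range is an arbitrary assumption
    if data_type == "string":
        return "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
    raise ValueError(f"unsupported data type: {data_type}")


def instantiate(template, id_counter, rng):
    """Create one element from a template, filling in attributes and text."""
    element = ET.Element(template.tag)
    # id is never random: it must uniquely identify the node in query results.
    element.set("id", str(next(id_counter)))
    for name, data_type in template.attributes.items():
        element.set(name, random_value(data_type, rng))
    element.text = random_value(template.text_type, rng)
    return element


# Three nodes instantiated from one Book template, mirroring the example above.
rng = random.Random(42)
ids = itertools.count(1)
book = NodeTemplate(tag="Book", attributes={"year": "integer"}, text_type="string")
books = [instantiate(book, ids, rng) for _ in range(3)]
```

Each element in `books` shares the tag and attribute names of the template but carries its own values, and `ET.tostring(books[0], encoding="unicode")` serializes one of them for inspection. Sharing a single counter across all templates keeps identifiers unique document-wide.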

:::info This paper is available on arXiv under the CC BY 4.0 DEED license.

:::

:::info Authors:

(1) Shuxin Li, Southern University of Science and Technology, China; work done during an internship at the National University of Singapore ([email protected]);

(2) Manuel Rigger, National University of Singapore, Singapore ([email protected]).

:::
