2025-08-09 04:58:00
github.com
- Jens Dietrich, Behnaz Hassanshahi: DALEQ – Explainable Equivalence for Java Bytecode (preprint)
- Jens Dietrich, Tim White, Behnaz Hassanshahi, Paddy Krishnan: Levels of Binary Equivalence for the Comparison of Binaries from Alternative Builds. ICSME’25
- Jens Dietrich, Tim White, Valerio Terragni, Behnaz Hassanshahi: Towards Cross-Build Differential Testing. ICST’25
DALEQ has been developed and tested with Java 17.
DALEQ can be built with Maven as follows:
mvn clean package dependency:copy-dependencies
Then run DALEQ as follows (for windows: change the classpath separator accordingly):
java -DSOUFFLE= -cp "target/classes:target/dependency/*" io.github.bineq.daleq.cli.Main \
-j1,--jar1 the first jar file to compare (required)
-j2,--jar2 the second jar file to compare (required)
-o,--out the output folder where the report will be generated (required)
-s1,--src1 the first jar file with source code to compare (optional)
-s2,--src2 the second jar file with source code to compare (optional)
-a,--autoopen if set, the generated html report will be opened automatically (dont use for CLI integration)
The jar built can also be used directly as an executable superjar.
java -DSOUFFLE= -jar target/daleq-.jar \
-j1,--jar1 the first jar file to compare (required)
-j2,--jar2 the second jar file to compare (required)
-o,--out the output folder where the report will be generated (required)
-s1,--src1 the first jar file with source code to compare (optional)
-s2,--src2 the second jar file with source code to compare (optional)
-a,--autoopen if set the generated html report will be opened automatically (dont use for CLI integration)
Running the program will create a report report.html
in the specified output folder.
The program returns with the following exit codes:
0 - all classes and resources are equal, source codes (if analysed) are equivalent
1 - some classes are not equal but still equivalent, resources are equal, source-codes (if analysed) are equivalent
2 - some classes are not equivalent (wrt daleq) or some resources (e.g. metadata) are not equal or some sources (if analysed) are not equivalent
This can be used to integrate the tool into CI processes.
Daleq takes Java bytecode as input and produces a relational database.
This is done in two steps.
flowchart LR
bytecode["bytecode (.class)"] --> asm(("asm")) --> edb["EDB"] --> souffle(("souffle")) --> idb["IDB"]
Loading
A database representing the bytecode is extracted. An asm-based static analysis is used for this purpose.
The result is the EDB (extensional database), a folder with tsv files, each tsv file corresponding to a predicate.
Those facts represent the bytecode, including class name, bytecode version, super types and interfaces,
access flags for method, fields and classes, bytecode instructions in methods, etc.
The representation is low-level, however, there is one significant abstraction (provided by asm): constant pool references are
resolved.
Each fact has a unique generated id that is used to provide provenance.
The IDB computation applies rules to compute a second set of facts (IDB files) that normalises
the EBD to establish whether two bytecode compared are equivalent or not.
Rules that are applied for this purpose are defined in
src/main/resources/rules/
, advanced.souffle
is the default set.
The API to generate the IDB is provided by io.github.bineq.daleq.Souffle::createIDB
.
This requires souffle to be installed.
Any application that uses this API must provide the location of souffle via an argument
that is passed to the JVM:
The first slot (term) for each fact is an id.
For EDB facts, those ids are generated during the bytecode analysis / extracyion phase.
Example (from AALOAD.facts
):
F1428629 org/apache/hadoop/hdfs/HAUtil::getConfForOtherNodes(Lorg/apache/hadoop/conf/Configuration;)Ljava/util/List; 3200
When the IDB is created, each IDB fact also has this slot.
The values encode a derivation tree. Example:
This fact has been computed by applying rule R_AALOAD
to fact F1428629
.
I.e. those ids encode a derivation tree:
graph TD;
R_AALOAD --> F1428629;
Loading
A less trivial example is R_42[F1,R_43[F2,F3]]
, this encodes the following derivation tree:
graph TD;
R_42 --> F1;
R_42 --> R_43;
R_43 --> F2;
R_43 --> F3;
Loading
The project contains a parser to transform those strings into
trees (io.github.bineq.daleq.souffle.provenance.ProvenanceParser
).
The grammar of the encoded proofs is defined here:
src/main/antlr4/io/github/bineq/daleq/souffle/provenance/Proof.g4
.
When trying to establish equivalence between two (compiled) classes, IDBs cannot be compared directly for the following reasons:
- The provenance (id) terms — this reflects the process of normalisation that might be different for each of the classes compared
- the IDB contains additional facts for predicates to keep track of facts that have been removed (i.e. EDB facts that have no counterpart in the IDB), or moved.
- facts corresponding to bytecode instruction have an
instructioncounter
slot to define the order of instructions within a method. Those might change in the IDB as normalisation may move/remove/insert facts corresponding to instructions.
The purpose of projection is to remove the aforementioned parts (facts and slots) from the database.
At the moment normalisation is implemented in Java.
The classes io.github.bineq.daleq.idb.IDBReader
and io.github.bineq.daleq.idb.IDBPrinter
are used to read an IDB,
and convert it into a textual representation suitable for comparison and diffing (using a standard text fiff tool such as diff or kdiff3). This also supports projection.
flowchart LR
bytecode1["class1"] --> asm1(("asm")) --> edb1["EDB1"] --> souffle1(("souffle")) --> idb1["IDB1"] --> IDBReader1(("project")) --> file1 --> diff --> result
bytecode2["class2"] --> asm2(("asm")) --> edb2["EDB2"] --> souffle2(("souffle")) --> idb2["IDB2"] --> IDBReader2(("project")) --> file2 --> diff
Loading
This is an example of a generated report. Here, we have compared the jar file for the artifact
javax.transaction:jta:1.1 from Maven Central with the corresponding artifact rebuilt by Google’s assured open source.
Daleq employs various analysers to compare classes, meta-data and resources within the jars.
The results are displayed in a table, the rows correspond to files within the jar(s), and the columns correspond to the various analysers.
Analysers compare the content of files, specialised analysers are used to compare source code and bytecode. The actual daleq analyser results are displayed in the last column. Please have a look at our paper for more details.
Possible analysis result states are:
- PASS the comparison yields true
- FAIL the comparison yields false
- N/A the comparison cannot be applied (e.g., a source Code comparison cannot be used to compare bytecode)
- ERROR the comparison has resulted in an error
The markers next to each result are hyperlinks to generated pages with additional provenance.
For non-equivalence statements, those pages are usually diff reports rendered in HTML.
For daleq equivalence statements, advanced diff reports are generated based on the derivations recorded when datalog rules are applied.
This is an example:
usage: java -cp io.github.bineq.daleq.edb.FactExtractor
-cl the location of compiled classes, a jar file or folder
-f a folder where to create the extension database (input
.facts files), the folder will be created if it does not
exist
-s a file where to create the souffle program (.souffle
file) containing imports and input predicate declarations
-v whether to verify the datalog facts against the schema,
must be true or false, default is true
Note that tests requires souffle to be set up (see also IDB Computation section).
This has to be done locally at the moment, see also issue 11.
Tests that require souffle will not fail but will be skipped if souffle is not available.
Souffle occasionally fails.
When souffle is used, a Thread::sleep
instruction is used to pause DALEQ.
This reduces the number of errors (perhaps a souffle race condition?), but makes DALEQ slower.
Souffle issues will be reported by the appplication, and flagged as error .
Additional analysers can be easily added by following those steps:
- Write an analyser (say
MyAnalyser
) implementingio.github.bineq.daleq.cli.Analyser
- In your project, add a file
src/main/resources/META-INF/services/io.github.bineq.daleq.cli.Analyser
- Add a line with the fully qualified name of
MyAnalyser
to this file - Build a jar file
- When running DALEQ, add this jar file to the classpath
- Note that the order in which columns are displayed in reports is determined by the property
io.github.bineq.daleq.cli.Analyser::positionHint
, a values between 0 and 100.
To contribute to this project, please follow the steps below to set up your development environment.
First, create a virtual environment to isolate dependencies:
Activate the virtual environment and install the pre-commit
hooks.
source .venv/bin/activate
pip install -r requirements.txt
pre-commit install
Now, the hooks will automatically run on each commit to check for any issues in the code. You can manually run the pre-commit hooks on all files at any time with:
pre-commit run --all-files
Keep your files stored safely and securely with the SanDisk 2TB Extreme Portable SSD. With over 69,505 ratings and an impressive 4.6 out of 5 stars, this product has been purchased over 8K+ times in the past month. At only $129.99, this Amazon’s Choice product is a must-have for secure file storage.
Help keep private content private with the included password protection featuring 256-bit AES hardware encryption. Order now for just $129.99 on Amazon!
Help Power Techcratic’s Future – Scan To Support
If Techcratic’s content and insights have helped you, consider giving back by supporting the platform with crypto. Every contribution makes a difference, whether it’s for high-quality content, server maintenance, or future updates. Techcratic is constantly evolving, and your support helps drive that progress.
As a solo operator who wears all the hats, creating content, managing the tech, and running the site, your support allows me to stay focused on delivering valuable resources. Your support keeps everything running smoothly and enables me to continue creating the content you love. I’m deeply grateful for your support, it truly means the world to me! Thank you!
BITCOIN bc1qlszw7elx2qahjwvaryh0tkgg8y68enw30gpvge Scan the QR code with your crypto wallet app |
DOGECOIN D64GwvvYQxFXYyan3oQCrmWfidf6T3JpBA Scan the QR code with your crypto wallet app |
ETHEREUM 0xe9BC980DF3d985730dA827996B43E4A62CCBAA7a Scan the QR code with your crypto wallet app |
Please read the Privacy and Security Disclaimer on how Techcratic handles your support.
Disclaimer: As an Amazon Associate, Techcratic may earn from qualifying purchases.