SourceDiff is an advanced code analysis tool designed to measure and minimize the structural distance between codebases. It helps developers, educators, and code reviewers assess how similar codebases are to one another.
By leveraging static analysis, parse trees (PTs), and pattern recognition techniques, SourceDiff provides refactoring suggestions that align codebases more closely or highlight significant divergences. This makes it particularly useful for detecting:
- Plagiarism or collusion in academic environments
- Unoriginal code that may have been copy-pasted or generated by an AI agent
- Redundant code that may produce unused compilation artifacts
- Duplicate logic that can be factored out into shared methods/functions
- Academia
  - Detecting plagiarism through copy-pasting or AI-generated code
  - Evaluating code quality: sudden differences in code quality may hint at cheating
- Professional
  - Identifying AI-generated code in code reviews
  - Maintaining consistent programming styles across the codebase
SourceDiff uses the Tree Sitter API to parse source code into parse trees. The Tree Sitter project provides a large collection of officially supported, pre-compiled incremental parsers. SourceDiff also embeds the TinyCC project as the compilation backend for Tree Sitter parsers.
If your programming language does not have an officially recognised Tree Sitter parser, you can create your own. See the Tree Sitter documentation for instructions.
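
To make the idea of comparing parse trees more concrete, here is a minimal sketch of a structural similarity score. It is not SourceDiff's actual algorithm, which is not described here. As a stand-in for Tree Sitter parse trees it uses Python's built-in `ast` module, and the function names `node_types` and `structural_similarity` are hypothetical, chosen only for illustration. The score compares the sequences of node types in two trees, so renaming identifiers does not change it.

```python
import ast
from difflib import SequenceMatcher


def node_types(source):
    """Parse source and return its node type names in ast.walk order.

    Assumption: Python's ast module stands in for a Tree Sitter parse tree.
    Identifier names and literal values are ignored; only node kinds are kept.
    """
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]


def structural_similarity(a, b):
    """Return a ratio in [0, 1]; 1.0 means identical node type sequences."""
    matcher = SequenceMatcher(None, node_types(a), node_types(b), autojunk=False)
    return matcher.ratio()


original = "def area(w, h):\n    return w * h\n"
renamed = "def size(x, y):\n    return x * y\n"
different = "for i in range(3):\n    print(i)\n"

# Same structure, different identifiers.
print(structural_similarity(original, renamed))
# Different structure.
print(structural_similarity(original, different))
```

The renamed function scores as structurally identical to the original, while the unrelated loop scores lower. A real tool would likely need something more robust than a flat sequence comparison, for example a tree edit distance, but the flat version is enough to show why working on parse trees rather than raw text makes simple renaming ineffective as a disguise.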
To build SourceDiff from source, you need make installed on your system. Run the command:
~ > make
This will produce a build directory containing all the necessary binaries to run SourceDiff. You can then either:
- add the build directory to your PATH variable, if you intend to use this software regularly; or
- navigate to the build directory.
Afterwards, invoke the executable from the command line:
~ > ./sd