diff options
author | Miguel Molina <miguel@erizocosmi.co> | 2020-04-23 16:51:56 +0200 |
---|---|---|
committer | Miguel Molina <miguel@erizocosmi.co> | 2020-04-23 17:22:39 +0200 |
commit | 90696f07f3a64243c6f33963f5267ab98622db9f (patch) | |
tree | 29124a96a4a34927fc48038e2961c4461129c8db /utils/merkletrie | |
parent | bb3a1bfcc62224001d577b933355661259b80854 (diff) | |
download | go-git-90696f07f3a64243c6f33963f5267ab98622db9f.tar.gz |
plumbing: detect renames by hash and similar content in diff tree
This commit implements the rename detection algorithms used in the
JGit implementation.
Given a list of changes, additions and deletions are extracted and
matched in two ways:
- By exact hash content: all additions and deletions are grouped by
the content hash and paired with the best match based on the file
mode and file path. All the files that cannot be paired are kept
as regular deletions and additions.
- By similar content: a matrix of addition and deletion pairs with
all possible combinations is created and scored by how similar the
content is between both files as well as how similar the file path
is. The pairs with the best score and whose score is equal or
greater than a threshold are paired and turned into a rename.
All the files that cannot be paired are kept as regular deletions
and additions.
DiffTree and DiffTreeContext will not return the changes with renames
detected for compatibility reasons, although this will change in v6
so that detecting renames is the default behaviour.
A new function DiffTreeWithOptions has been added to configure the
parameters for the rename detection to control the score threshold,
the limit of renames and whether to use similar content detection in
the detection.
More information:
- https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/diff/RenameDetector.java
- https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/diff/SimilarityRenameDetector.java
- https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/diff/SimilarityIndex.java
Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
Diffstat (limited to 'utils/merkletrie')
-rw-r--r-- | utils/merkletrie/difftree.go | 12 |
1 files changed, 8 insertions, 4 deletions
diff --git a/utils/merkletrie/difftree.go b/utils/merkletrie/difftree.go index 90d9c95..bd084b2 100644 --- a/utils/merkletrie/difftree.go +++ b/utils/merkletrie/difftree.go @@ -23,7 +23,7 @@ package merkletrie // # Cases // -// When comparing noders in both trees you will found yourself in +// When comparing noders in both trees you will find yourself in // one of 169 possible cases, but if we ignore moves, we can // simplify a lot the search space into the following table: // @@ -256,17 +256,21 @@ import ( ) var ( + // ErrCanceled is returned whenever the operation is canceled. ErrCanceled = errors.New("operation canceled") ) // DiffTree calculates the list of changes between two merkletries. It // uses the provided hashEqual callback to compare noders. -func DiffTree(fromTree, toTree noder.Noder, - hashEqual noder.Equal) (Changes, error) { +func DiffTree( + fromTree, + toTree noder.Noder, + hashEqual noder.Equal, +) (Changes, error) { return DiffTreeContext(context.Background(), fromTree, toTree, hashEqual) } -// DiffTree calculates the list of changes between two merkletries. It +// DiffTreeContext calculates the list of changes between two merkletries. It // uses the provided hashEqual callback to compare noders. // Error will be returned if context expires // Provided context must be non nil |