| Commit message | Author | Age | Files | Lines |
|
|
|
| |
objects being put into cache. The hash (which is the cache key) returned by the object at the previous put-into-cache point was always zero, so subsequent loads of the same object were never resolved from the cache.
|
Previously, as part of building the index representation, the resolveObject
func would create an interim plumbing.MemoryObject, which would then be
saved into storage via storage.SetEncodedObject. This meant that objects
would be unnecessarily loaded into memory, only to then be saved to disk.
This change streamlines the process by:
- Introducing the LazyObjectWriter interface, which enables the write
operation to take place directly against the filesystem-based storage.
- Leveraging multi-writers to process the input data once, while targeting
multiple writers (e.g. hasher and storage).
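The multi-writer idea can be sketched with the standard library alone. This is a minimal illustration, not the code from this commit: writeOnce and hashAndStore are hypothetical names, a bytes.Buffer stands in for the filesystem-backed writer, and the digest is a plain SHA-1 of the raw bytes (a real git object ID also covers an object header).

```go
package main

import (
	"bytes"
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"io"
	"strings"
)

// writeOnce (hypothetical helper) streams src into dst while feeding the
// same bytes to a hasher, so the input is consumed exactly once and is
// never buffered in full by this function.
func writeOnce(dst io.Writer, src io.Reader) (string, int64, error) {
	h := sha1.New()
	n, err := io.Copy(io.MultiWriter(h, dst), src)
	if err != nil {
		return "", n, err
	}
	return hex.EncodeToString(h.Sum(nil)), n, nil
}

// hashAndStore (hypothetical helper) runs writeOnce against an in-memory
// buffer that stands in for a storage writer, and returns the digest and
// the bytes that reached the "storage".
func hashAndStore(data string) (sum, stored string, err error) {
	var storage bytes.Buffer
	sum, _, err = writeOnce(&storage, strings.NewReader(data))
	return sum, storage.String(), err
}

func main() {
	sum, stored, err := hashAndStore("hello")
	if err != nil {
		panic(err)
	}
	fmt.Println(sum, stored)
}
```

Because io.MultiWriter fans each chunk out to every target, neither the hasher nor the storage writer needs its own pass over the data.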
An additional change relates to the caching of object info children within
Parser.get. The cache is now skipped when a seekable filesystem is being
used.
The impact of the changes can be observed when using seekable filesystem
storages, especially when cloning large repositories.
The stats below were captured by adapting the BenchmarkPlainClone test
to clone https://github.com/torvalds/linux.git:
pkg: github.com/go-git/go-git/v5
cpu: Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz
              │   /tmp/old    │              /tmp/new               │
              │    sec/op     │    sec/op     vs base               │
PlainClone-16    41.68 ± 17%     48.04 ± 9%   +15.27% (p=0.015 n=6)

              │   /tmp/old    │              /tmp/new               │
              │     B/op      │     B/op      vs base               │
PlainClone-16   1127.8Mi ± 7%   256.7Mi ± 50% -77.23% (p=0.002 n=6)

              │   /tmp/old    │              /tmp/new               │
              │   allocs/op   │   allocs/op   vs base               │
PlainClone-16    3.125M ± 0%     3.800M ± 0%  +21.60% (p=0.002 n=6)
Notice that on average the memory consumption per operation is over 75%
smaller. The time per operation increased by 15%, which may actually be less
in long-running applications, due to the decreased GC pressure and
garbage collection costs.
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
|
|
|
|
| |
Signed-off-by: Arieh Schneier <15041913+AriehSchneier@users.noreply.github.com>
|
|
|
|
| |
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
|
completely (#330)
This PR adds code to prevent large objects from being read into memory
from packfiles or the filesystem.
Objects larger than 1 MB are no longer stored directly in the cache
or read completely into memory.
This PR differs from and improves on the previous, broken #323 by fixing
several bugs in the reader and transparently wrapping ReaderAt as a Reader.
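One standard-library way to present an io.ReaderAt as a sequential io.Reader is io.NewSectionReader. The sketch below only illustrates the general technique; it is not the wrapper added by this PR, and readerFromReaderAt and readRange are hypothetical names.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// readerFromReaderAt (hypothetical helper) exposes the byte range
// [off, off+n) of ra as a sequential io.Reader, so a consumer can stream
// it instead of loading the whole range into memory up front.
func readerFromReaderAt(ra io.ReaderAt, off, n int64) io.Reader {
	return io.NewSectionReader(ra, off, n)
}

// readRange (hypothetical helper) streams the selected range of data into
// a string. A strings.Reader stands in for a packfile opened as a ReaderAt.
func readRange(data string, off, n int64) (string, error) {
	r := readerFromReaderAt(strings.NewReader(data), off, n)
	var sb strings.Builder
	if _, err := io.Copy(&sb, r); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := readRange("header:payload", 7, 7)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```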
Signed-off-by: Andrew Thornton <art27@cantab.net>
|
|
|
|
|
| |
into memory completely (#303)" (#329)
This reverts commit 720c192831a890d0a36b4c6720b60411fa4a0159.
|
completely (#303)
This PR adds code to prevent large objects from being read into memory
from packfiles or the filesystem.
Objects larger than 1 MB are no longer stored directly in the cache
or read completely into memory.
Signed-off-by: Andrew Thornton <art27@cantab.net>
|
Like `git rev-parse <prefix>`, this enumerates the hashes of objects
with the given prefix and adds them to the list of candidates for
resolution.
This has an exhaustive slow path, which requires enumerating all objects
and filtering each one, but also a couple of fast paths for common
cases. There's room for future work to make this faster; TODOs have been
left for that.
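The exhaustive slow path described above amounts to a linear scan with a prefix filter. A minimal sketch, assuming hashes are lowercase hex strings held in memory; candidatesByPrefix is a hypothetical name, and the fast paths mentioned in the message are not shown.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// candidatesByPrefix (hypothetical helper) scans every known hash and keeps
// those that start with prefix. Hashes are assumed to be lowercase hex.
// The result is sorted so callers get a deterministic candidate list.
func candidatesByPrefix(all []string, prefix string) []string {
	prefix = strings.ToLower(prefix)
	var out []string
	for _, h := range all {
		if strings.HasPrefix(h, prefix) {
			out = append(out, h)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	all := []string{"a1ff00", "0badc0", "a1b2c3"}
	fmt.Println(candidatesByPrefix(all, "a1"))
}
```

A prefix that matches more than one hash yields several candidates, which is why the result is a list rather than a single value.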
Fixes #135.
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
The MaxOpenDescriptors option provides a middle ground solution between keeping
all packfiles open (as offered by the KeepDescriptors option) and keeping none
open.
Signed-off-by: Arran Walker <arran.walker@fiveturns.org>
|
|
|
|
|
|
|
|
|
|
| |
If the cache is shared between several repositories, getFromUnpacked can
erroneously return an object from another repository.
This decreases performance slightly, as there is an extra fs operation
when the object is in the cache, but it is correct when the cache is shared.
Signed-off-by: Javi Fontan <jfontan@gmail.com>
|
|
|
|
|
|
| |
commit histories (#963)
Signed-off-by: Filip Navara <navara@emclient.com>
|
|
|
|
| |
Signed-off-by: Javier Peletier <jm@epiclabs.io>
|
|
|
|
|
|
| |
Suggested by taruti.
Signed-off-by: Jeremy Stribling <strib@alum.mit.edu>
|
|
|
|
| |
Signed-off-by: Jeremy Stribling <strib@alum.mit.edu>
|
|
|
|
| |
Signed-off-by: Javi Fontan <jfontan@gmail.com>
|
|
|
|
|
|
|
|
| |
PackfileIter was not taking the KeepDescriptors option into account
and was always closing the file. This caused "file already closed"
errors when iterating packfiles with KeepDescriptors active.
Signed-off-by: Javi Fontan <jfontan@gmail.com>
|
|
|
|
| |
Signed-off-by: kuba-- <kuba@sourced.tech>
|
|
|
|
|
|
|
|
|
|
| |
This option keeps packfile file descriptors open after reading
objects from them. It improves performance, as packfiles do not have to
be reopened each time an object is needed.
Also adds Close to EncodedObjectStorer to close all the files manually.
Signed-off-by: Javi Fontan <jfontan@gmail.com>
|
|
|
|
| |
Signed-off-by: Javi Fontan <jfontan@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
After a clone, only resolved deltas were added to the cache. This caused
slowdowns in small repositories where most objects can be held in cache.
This change also makes packfiles reuse the delta cache from the store.
Previously a new delta cache was created each time a packfile object was
created, which slightly slowed down object access and affected memory
consumption when bases were added to the cache.
Signed-off-by: Javi Fontan <jfontan@gmail.com>
|
|
|
|
| |
Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
|
|
|
|
| |
Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
|
|
|
|
| |
Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
|
|
|
|
| |
Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
|
|
|
|
|
|
| |
Now dotgit.PackWriter uses the new packfile.Parser and index.
Signed-off-by: Javi Fontan <jfontan@gmail.com>
|
|
|
|
| |
Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
|
|
|
|
| |
Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
|
|
|
|
| |
Signed-off-by: Antonio Jesus Navarro Perez <antnavper@gmail.com>
|
|
|
|
|
|
|
|
| |
Previously some close errors were lost. This is especially problematic
in go-git, as a lot of work is done on close, such as generating indexes
and moving packfiles.
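A common Go pattern for not losing close errors is a deferred Close that writes into a named return value. The sketch below shows that general pattern only; it is not taken from this commit, and withClose, fakeCloser, errCloseFailed and demo are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// errCloseFailed is a hypothetical sentinel used by the fake closer below.
var errCloseFailed = errors.New("close failed")

// fakeCloser stands in for a writer whose Close does real work and may fail.
type fakeCloser struct{ fail bool }

func (f fakeCloser) Close() error {
	if f.fail {
		return errCloseFailed
	}
	return nil
}

// withClose (hypothetical helper) runs work and always closes c. If work
// succeeded but Close failed, the Close error is returned instead of
// being silently dropped.
func withClose(c io.Closer, work func() error) (err error) {
	defer func() {
		if cerr := c.Close(); cerr != nil && err == nil {
			err = cerr
		}
	}()
	return work()
}

// demo (hypothetical helper) runs a no-op job against a closer that
// optionally fails on Close.
func demo(closeFails bool) error {
	return withClose(fakeCloser{fail: closeFails}, func() error { return nil })
}

func main() {
	fmt.Println(demo(true), demo(false))
}
```

An error from work takes precedence here; whether to combine both errors instead is a design choice this sketch leaves open.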
Signed-off-by: Javi Fontan <jfontan@gmail.com>
|
|
|
|
|
|
|
|
| |
* do not store extra bool values in the seen map
* open packfile iterators lazily
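The first bullet matches a familiar Go idiom: a set keyed by map[K]struct{} stores no per-entry value. A minimal sketch of that idiom, with dedupe as a hypothetical name and strings standing in for object hashes:

```go
package main

import "fmt"

// dedupe (hypothetical helper) returns the input in order with repeats
// removed. The seen set uses struct{} values, which occupy no storage,
// instead of bool values that would never be read.
func dedupe(hashes []string) []string {
	seen := make(map[string]struct{})
	var out []string
	for _, h := range hashes {
		if _, ok := seen[h]; ok {
			continue
		}
		seen[h] = struct{}{}
		out = append(out, h)
	}
	return out
}

func main() {
	fmt.Println(dedupe([]string{"a", "b", "a", "c", "b"}))
}
```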
Signed-off-by: Denys Smirnov <denys@sourced.tech>
|
|
|
|
| |
Signed-off-by: Javi Fontan <jfontan@gmail.com>
|
The Decoder object can make use of an object cache to speed up processing.
Previously the only way to specify it was to manually change the struct
generated by NewDecoderForFile. This led to some instances being created
without it, which penalized performance.
Now the cache must be explicitly passed to the constructor function.
NewDecoder now creates objects with a cache using the default size.
A new helper function was added to create cache objects with the default
size, as this is now a common task:
cache.NewObjectLRUDefault()
Signed-off-by: Javi Fontan <jfontan@gmail.com>
|
|
|
|
| |
This change adds a new method Alternates() in DotGit to check for and
query alternate sources.
|
|
|
|
|
|
| |
Suggested by mcuadros.
Issue: #669
|
|
|
|
|
|
|
| |
This allows the user to check whether an object exists, without
reading all the object data from storage.
Issue: KBFS-2445
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
| |
- no length for map initialization
- don't check for boolean/error return
- don't format string
- use string method of bytes buffer instead of converting bytes to
string
- use `strings.Contains` instead of `strings.Index`
- use `bytes.Equal` instead of `bytes.Compare`
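Three of these cleanups are behavior-preserving swaps between standard-library calls. The sketch below places the old and new forms side by side; the function names are hypothetical and exist only for the comparison.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// containsOld and containsNew are equivalent; strings.Contains states the
// intent directly instead of comparing an index against -1.
func containsOld(s, sub string) bool { return strings.Index(s, sub) != -1 }
func containsNew(s, sub string) bool { return strings.Contains(s, sub) }

// equalOld and equalNew are equivalent; bytes.Equal avoids computing an
// ordering that the caller then discards.
func equalOld(a, b []byte) bool { return bytes.Compare(a, b) == 0 }
func equalNew(a, b []byte) bool { return bytes.Equal(a, b) }

// joinParts uses the buffer's String method rather than string(buf.Bytes()).
func joinParts(parts ...string) string {
	var buf bytes.Buffer
	for _, p := range parts {
		buf.WriteString(p)
	}
	return buf.String()
}

func main() {
	fmt.Println(containsNew("refs/heads/main", "heads"))
	fmt.Println(equalNew([]byte("pack"), []byte("pack")))
	fmt.Println(joinParts("refs/", "heads/", "main"))
}
```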
|
|
|
|
| |
Signed-off-by: Máximo Cuadros <mcuadros@gmail.com>
|
|\
| |
| | |
storage: reuse deltas from packfiles
|
| |
* plumbing: add DeltaObject interface for EncodedObjects that
are deltas and hold additional information about them, such
as the hash of the base object.
* plumbing/storer: add DeltaObjectStorer interface for object
storers that can return DeltaObject. Note that calls to
EncodedObject will never return instances of DeltaObject.
That requires explicit calls to DeltaObject.
* storage/filesystem: implement DeltaObjectStorer interface.
* plumbing/packfile: packfile encoder now supports reusing
deltas that are already computed (e.g. from an existing
packfile) if the storage implements DeltaObjectStorer.
Reusing deltas boosts performance of packfile generation
(e.g. on push).
|
|/ |
|
| |
|
|
|
|
|
| |
Reuse delta base object cache for packfile decoders
across multiple instances.
|
|
There was an internal type (i.e. storage/filesystem.idx) used as an
in-memory index for packfiles. It was not convenient to reuse in the
packfile package.
This commit creates a new representation (format/packfile.Index)
that can be converted to and from idxfile.Idxfile.
A packfile.Index now contains the functionality that was scattered
across storage/filesystem.idx and packfile.Decoder's internals.
storage/filesystem now reuses packfile.Index instances, which also
results in higher cache hit ratios when resolving deltas.
|
| |
|
| |
|