path: root/storage/filesystem/object.go
Commit message | Author | Age | Files | Lines
* plumbing: Optimise memory consumption for filesystem storage | Paulo Gomes | 2023-10-28 | 1 | -0/+13

  Previously, as part of building the index representation, the resolveObject func would create an interim plumbing.MemoryObject, which would then be saved into storage via storage.SetEncodedObject. This meant that objects would be unnecessarily loaded into memory, only to then be saved to disk.

  The change streamlines this process by:
  - Introducing the LazyObjectWriter interface, which enables the write operation to take place directly against the filesystem-based storage.
  - Leveraging multi-writers to process the input data once, while targeting multiple writers (e.g. hasher and storage).

  An additional change relates to the caching of object info children within Parser.get. The cache is now skipped when a seekable filesystem is being used.

  The impact of the changes can be observed when using seekable filesystem storages, especially when cloning large repositories. The stats below were captured by adapting the BenchmarkPlainClone test to clone https://github.com/torvalds/linux.git:

  pkg: github.com/go-git/go-git/v5
  cpu: Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz

                  │   /tmp/old    │              /tmp/new               │
                  │    sec/op     │    sec/op       vs base             │
  PlainClone-16      41.68 ± 17%     48.04 ±  9%   +15.27% (p=0.015 n=6)

                  │   /tmp/old    │              /tmp/new               │
                  │     B/op      │     B/op        vs base             │
  PlainClone-16     1127.8Mi ± 7%   256.7Mi ± 50%  -77.23% (p=0.002 n=6)

                  │   /tmp/old    │              /tmp/new               │
                  │   allocs/op   │   allocs/op     vs base             │
  PlainClone-16      3.125M ± 0%     3.800M ± 0%   +21.60% (p=0.002 n=6)

  Notice that on average the memory consumption per operation is over 75% smaller. The time per operation increased by 15%, which may actually be less in long-running applications, due to the decreased GC pressure and garbage collection costs.

  Signed-off-by: Paulo Gomes <pjbgf@linux.com>
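The multi-writer idea in the commit message above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the go-git implementation: `storeAndHash` and `storeString` are hypothetical names, a `bytes.Buffer` stands in for the filesystem-backed writer, and the digest is a plain SHA-1 of the bytes without any git object header.

```go
package main

import (
	"bytes"
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"io"
	"strings"
)

// storeAndHash (hypothetical helper) reads src exactly once and sends every
// chunk to both dst and a SHA-1 hasher through io.MultiWriter, so the payload
// never has to be buffered in memory as a whole.
func storeAndHash(dst io.Writer, src io.Reader) (string, error) {
	h := sha1.New()
	if _, err := io.Copy(io.MultiWriter(dst, h), src); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// storeString is a small wrapper for demonstration: the buffer plays the role
// of the storage writer (assumption: real code would write to a file).
func storeString(s string) (digest, stored string, err error) {
	var storage bytes.Buffer
	digest, err = storeAndHash(&storage, strings.NewReader(s))
	return digest, storage.String(), err
}

func main() {
	digest, stored, err := storeString("hello")
	if err != nil {
		panic(err)
	}
	fmt.Println(digest, stored)
}
```

The design point is that the hasher and the storage writer see the same byte stream in a single pass, which is what removes the need for an interim in-memory object.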
* storage: filesystem, Populate index before use. Fixes #148 | Arieh Schneier | 2023-05-04 | 1 | -0/+10

  Signed-off-by: Arieh Schneier <15041913+AriehSchneier@users.noreply.github.com>
* Use Sync.Pool pointers to optimise memory usage | Paulo Gomes | 2022-11-07 | 1 | -1/+13

  Signed-off-by: Paulo Gomes <pjbgf@linux.com>
* plumbing: format/packfile, prevent large objects from being read into memory completely (#330) | zeripath | 2021-06-30 | 1 | -6/+15

  This PR adds code to prevent large objects from being read into memory from packfiles or the filesystem. Objects greater than 1Mb are now no longer directly stored in the cache or read completely into memory.

  This PR differs from and improves on the previous, broken #323 by fixing several bugs in the reader and transparently wrapping ReaderAt as a Reader.

  Signed-off-by: Andrew Thornton <art27@cantab.net>
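A minimal sketch of the two ideas named in this message: a size threshold that decides whether an object is buffered, and wrapping an `io.ReaderAt` as an `io.Reader`. It uses only the standard library; `largeObjectThreshold`, `openObject`, and `inspect` are hypothetical names, and the 1 MiB constant is an assumption chosen to mirror the "1Mb" figure above rather than a value taken from go-git.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// largeObjectThreshold is an assumed cut-off for this sketch (1 MiB).
const largeObjectThreshold = 1 << 20

// openObject exposes size bytes at offset of src as an io.Reader.
// io.NewSectionReader is what turns the ReaderAt into a sequential Reader.
// Small objects are copied into memory; large ones are streamed on demand.
// The boolean result reports whether the streaming path was taken.
func openObject(src io.ReaderAt, offset, size int64) (io.Reader, bool, error) {
	section := io.NewSectionReader(src, offset, size)
	if size > largeObjectThreshold {
		return section, true, nil
	}
	buf := make([]byte, size)
	if _, err := io.ReadFull(section, buf); err != nil {
		return nil, false, err
	}
	return bytes.NewReader(buf), false, nil
}

// inspect builds a synthetic payload of the given size, opens it, and reports
// which path was used and how many bytes could be read back.
func inspect(size int64) (bool, int64, error) {
	payload := strings.NewReader(strings.Repeat("x", int(size)))
	r, streamed, err := openObject(payload, 0, size)
	if err != nil {
		return false, 0, err
	}
	n, err := io.Copy(io.Discard, r)
	return streamed, n, err
}

func main() {
	fmt.Println(inspect(16))
	fmt.Println(inspect(largeObjectThreshold + 1))
}
```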
* Revert "plumbing: format/packfile, prevent large objects from being read into memory completely (#303)" (#329) [v5.4.2] | zeripath | 2021-06-02 | 1 | -8/+1

  This reverts commit 720c192831a890d0a36b4c6720b60411fa4a0159.
* plumbing: format/packfile, prevent large objects from being read into memory completely (#303) [v5.4.0] | zeripath | 2021-05-12 | 1 | -1/+8

  This PR adds code to prevent large objects from being read into memory from packfiles or the filesystem. Objects greater than 1Mb are now no longer directly stored in the cache or read completely into memory.

  Signed-off-by: Andrew Thornton <art27@cantab.net>
* Support partial hashes in Repository.ResolveRevision. | David Symonds | 2020-07-16 | 1 | -0/+31

  Like `git rev-parse <prefix>`, this enumerates the hashes of objects with the given prefix and adds them to the list of candidates for resolution.

  This has an exhaustive slow path, which requires enumerating all objects and filtering each one, but also a couple of fast paths for common cases. There's room for future work to make this faster; TODOs have been left for that.

  Fixes #135.
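The exhaustive slow path described above (enumerate every object hash, keep those that start with the prefix) can be sketched as follows. This is an assumption-laden illustration: `resolvePrefix` and the two error values are hypothetical, hashes are plain hex strings, and treating more than one match as an error is a choice made for this sketch rather than something stated in the commit.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var (
	errNotFound  = errors.New("no object matches prefix")
	errAmbiguous = errors.New("prefix matches more than one object")
)

// resolvePrefix scans every known hash and keeps the ones that start with
// prefix. It returns the single match, or an error if there are none or
// several.
func resolvePrefix(all []string, prefix string) (string, error) {
	var matches []string
	for _, h := range all {
		if strings.HasPrefix(h, prefix) {
			matches = append(matches, h)
		}
	}
	switch len(matches) {
	case 0:
		return "", errNotFound
	case 1:
		return matches[0], nil
	default:
		return "", errAmbiguous
	}
}

func main() {
	hashes := []string{"abc123", "abd456", "f00ba4"}
	fmt.Println(resolvePrefix(hashes, "abc"))
	fmt.Println(resolvePrefix(hashes, "ab"))
}
```

The cost is linear in the number of objects, which is why the commit message mentions fast paths for common cases.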
* Close Reader & Writer of EncodedObject after use | Kyungmin Bae | 2020-05-24 | 1 | -0/+2
* *: migration from gopkg to go modules | Máximo Cuadros | 2020-03-10 | 1 | -10/+10
* filesystem: ObjectStorage, MaxOpenDescriptors option | Arran Walker | 2019-04-22 | 1 | -39/+106

  The MaxOpenDescriptors option provides a middle ground solution between keeping all packfiles open (as offered by the KeepDescriptors option) and keeping none open.

  Signed-off-by: Arran Walker <arran.walker@fiveturns.org>
* storage/filesystem: check file object before using cache | Javi Fontan | 2019-01-30 | 1 | -5/+4

  If the cache is shared between several repositories, getFromUnpacked can erroneously return an object from another repository.

  This decreases performance a little, as there is an extra fs operation when the object is in the cache, but it is correct when the cache is shared.

  Signed-off-by: Javi Fontan <jfontan@gmail.com>
* plumbing: format/packfile, performance optimizations for reading large commit histories (#963) | Filip Navara | 2018-11-28 | 1 | -23/+36

  Signed-off-by: Filip Navara <navara@emclient.com>
* storage/filesystem: Added reindex method to reindex packfiles | Javier Peletier | 2018-11-12 | 1 | -0/+5

  Signed-off-by: Javier Peletier <jm@epiclabs.io>
* filesystem: add a new test for EncodedObjectSize | Jeremy Stribling | 2018-10-12 | 1 | -3/+1

  Suggested by taruti.

  Signed-off-by: Jeremy Stribling <strib@alum.mit.edu>
* object: get object size without reading whole object | Jeremy Stribling | 2018-10-11 | 1 | -0/+75

  Signed-off-by: Jeremy Stribling <strib@alum.mit.edu>
* storage/filesystem: add more doc to NewPackfileIter | Javi Fontan | 2018-09-21 | 1 | -4/+7

  Signed-off-by: Javi Fontan <jfontan@gmail.com>
* storage/filesystem: keep packs open in PackfileIter | Javi Fontan | 2018-09-20 | 1 | -10/+23

  PackfileIter was not taking into account the option KeepDescriptors and was always closing the file. This caused "file already closed" errors when iterating packfiles with KeepDescriptors active.

  Signed-off-by: Javi Fontan <jfontan@gmail.com>
* Expose Storage cache. | kuba-- | 2018-09-07 | 1 | -19/+10

  Signed-off-by: kuba-- <kuba@sourced.tech>
* storage/dotgit: add KeepDescriptors option | Javi Fontan | 2018-09-04 | 1 | -1/+9

  This option keeps packfile file descriptors open after reading objects from them. It improves performance, as packfiles do not have to be reopened each time an object is needed.

  Also adds Close to EncodedObjectStorer to close all the files manually.

  Signed-off-by: Javi Fontan <jfontan@gmail.com>
* storage/filesystem: move Options to filesytem and dotgit | Javi Fontan | 2018-09-03 | 1 | -0/+12

  Signed-off-by: Javi Fontan <jfontan@gmail.com>
* plumbing, storage: add bases to the common cache | Javi Fontan | 2018-08-22 | 1 | -2/+16

  After clone, only resolved deltas were added to the cache. This caused slowdowns in small repositories where most objects can be held in cache.

  It also makes packfiles reuse the delta cache from the store. Previously, a new delta cache was created each time a packfile object was created. This also slowed down object access a bit and had an impact on memory consumption when bases are added to the cache.

  Signed-off-by: Javi Fontan <jfontan@gmail.com>
* plumbing: packfile, open and close packfile on FSObject reads | Miguel Molina | 2018-08-09 | 1 | -9/+6

  Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
* storage: filesystem, close Packfile after iterating objects | Miguel Molina | 2018-08-09 | 1 | -1/+10

  Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
* storage: filesystem, benchmark PackfileIter | Miguel Molina | 2018-08-09 | 1 | -4/+26

  Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
* *: use parser to populate non writable storages and bug fixes | Miguel Molina | 2018-08-07 | 1 | -47/+30

  Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
* plumbing, storage: integrate new index | Javi Fontan | 2018-07-26 | 1 | -17/+29

  Now dotgit.PackWriter uses the new packfile.Parser and index.

  Signed-off-by: Javi Fontan <jfontan@gmail.com>
* plumbing/format/idxfile: add new Index and MemoryIndex | Miguel Molina | 2018-07-19 | 1 | -1/+1

  Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
* storage: filesystem, make ObjectStorage constructor public | Miguel Molina | 2018-06-08 | 1 | -2/+3

  Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
* dotgit: Move package outside internal. | Antonio Jesus Navarro Perez | 2018-06-05 | 1 | -1/+1

  Signed-off-by: Antonio Jesus Navarro Perez <antnavper@gmail.com>
* *: Use CheckClose with named returns | Javi Fontan | 2018-03-27 | 1 | -4/+4

  Previously some close errors were lost. This is especially problematic in go-git, as lots of work is done here, like generating indexes and moving packfiles.

  Signed-off-by: Javi Fontan <jfontan@gmail.com>
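The pattern named in this commit title, a deferred close helper combined with a named error return, can be sketched as below. `checkClose` here is a local stand-in written for this example; it illustrates the idea rather than reproducing go-git's own helper, and `failingCloser` and `process` are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// checkClose closes c and, if closing fails while *err is still nil, stores
// the close error in *err. An earlier error is never overwritten.
func checkClose(c io.Closer, err *error) {
	if cerr := c.Close(); cerr != nil && *err == nil {
		*err = cerr
	}
}

// failingCloser simulates a resource whose Close always fails.
type failingCloser struct{}

func (failingCloser) Close() error { return errors.New("close failed") }

// process uses a named return value so that the deferred checkClose can
// change what the caller receives after the function body has finished.
func process(c io.Closer) (err error) {
	defer checkClose(c, &err)
	// ... work with c would go here ...
	return nil
}

func main() {
	fmt.Println(process(failingCloser{})) // prints "close failed"
}
```

With a plain `defer c.Close()` and an unnamed return, the same call would report success and the close error would be dropped silently.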
* storage/filesystem: optimize packfile iterator | Denys Smirnov | 2018-03-03 | 1 | -22/+61

    * do not store extra bool values in the seen map
    * open packfile iterators lazily

  Signed-off-by: Denys Smirnov <denys@sourced.tech>
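The first bullet refers to a common Go idiom: when a map is used only as a set, `struct{}` values carry no data, unlike `bool` values. A minimal sketch, assuming string keys and a hypothetical `dedupe` helper (the real code keys on object hashes):

```go
package main

import "fmt"

// dedupe returns the input without repeated entries, preserving first-seen
// order. The seen map is a set: struct{} is a zero-size value type, so only
// the keys occupy space.
func dedupe(hashes []string) []string {
	seen := make(map[string]struct{})
	var out []string
	for _, h := range hashes {
		if _, ok := seen[h]; ok {
			continue // already emitted
		}
		seen[h] = struct{}{}
		out = append(out, h)
	}
	return out
}

func main() {
	fmt.Println(dedupe([]string{"a1", "b2", "a1", "c3", "b2"}))
}
```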
* Make DeltaBaseCache private | Javi Fontan | 2017-12-20 | 1 | -6/+6

  Signed-off-by: Javi Fontan <jfontan@gmail.com>
* Enforce the use of cache in packfile decoder | Javi Fontan | 2017-12-20 | 1 | -5/+2

  The Decoder object can make use of an object cache to speed up processing. Previously the only way to specify it was to manually change the struct generated by NewDecodeForFile. This led to some instances being created without it, which penalized performance.

  Now the cache should be explicitly passed to the constructor function. NewDecoder now creates objects with a cache using the default size.

  A new helper function was added to create cache objects with the default size, as this becomes a common task now:

    cache.NewObjectLRUDefault()

  Signed-off-by: Javi Fontan <jfontan@gmail.com>
* storage: filesystem, add support for git alternates (#663) | Sunny | 2017-12-06 | 1 | -0/+21

  This change adds a new method Alternates() in DotGit to check and query alternate source.
* storage: some minor code cleanup | Jeremy Stribling | 2017-11-29 | 1 | -6/+3

  Suggested by mcuadros.

  Issue: #669
* plumbing: add `HasEncodedObject` method to Storer | Jeremy Stribling | 2017-11-29 | 1 | -0/+26

  This allows the user to check whether an object exists, without reading all the object data from storage.

  Issue: KBFS-2445
* Make object repacking more configurable | Taru Karttunen | 2017-11-29 | 1 | -2/+2
* Support for repacking objects | Taru Karttunen | 2017-11-29 | 1 | -0/+8
* First pass of prune design | Taru Karttunen | 2017-11-29 | 1 | -0/+24
* all: simplification | ferhat elmas | 2017-11-29 | 1 | -2/+2

    - no length for map initialization
    - don't check for boolean/error return
    - don't format string
    - use string method of bytes buffer instead of converting bytes to string
    - use `strings.Contains` instead of `strings.Index`
    - use `bytes.Equal` instead of `bytes.Compare`
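A few of the simplifications listed above, shown side by side with the longer forms they replace. The function names and inputs are hypothetical and exist only for this sketch; the standard-library calls are real.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// isRemoteRef: strings.Contains states the intent directly.
// Longer form: strings.Index(name, "refs/remotes/") != -1
func isRemoteRef(name string) bool {
	return strings.Contains(name, "refs/remotes/")
}

// sameContent: bytes.Equal replaces a three-way comparison used as a test.
// Longer form: bytes.Compare(a, b) == 0
func sameContent(a, b []byte) bool {
	return bytes.Equal(a, b)
}

// render: the buffer's own String method replaces a manual conversion.
// Longer form: string(buf.Bytes())
func render(parts ...string) string {
	var buf bytes.Buffer
	for _, p := range parts {
		buf.WriteString(p)
	}
	return buf.String()
}

func main() {
	fmt.Println(isRemoteRef("refs/remotes/origin/main"))
	fmt.Println(sameContent([]byte("pack"), []byte("pack")))
	fmt.Println(render("objects/", "pack"))
}
```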
* update to go-billy.v4 and go-git-fixtures.v3 | Máximo Cuadros | 2017-11-23 | 1 | -1/+1

  Signed-off-by: Máximo Cuadros <mcuadros@gmail.com>
* Merge pull request #515 from smola/reuse-packed-objects | Máximo Cuadros | 2017-07-27 | 1 | -7/+97
|\
| |   storage: reuse deltas from packfiles
| |
| * storage: reuse deltas from packfiles | Santiago M. Mola | 2017-07-27 | 1 | -7/+97
| |
| |     * plumbing: add DeltaObject interface for EncodedObjects that are deltas and hold additional information about them, such as the hash of the base object.
| |     * plumbing/storer: add DeltaObjectStorer interface for object storers that can return DeltaObject. Note that calls to EncodedObject will never return instances of DeltaObject. That requires explicit calls to DeltaObject.
| |     * storage/filesystem: implement DeltaObjectStorer interface.
| |     * plumbing/packfile: packfile encoder now supports reusing deltas that are already computed (e.g. from an existing packfile) if the storage implements DeltaObjectStorer. Reusing deltas boosts performance of packfile generation (e.g. on push).
| |
* | filesystem: reuse cache for packfile iterator | Santiago M. Mola | 2017-07-27 | 1 | -3/+4
|/
* plumbing/cache: change FIFO to LRU cache | Santiago M. Mola | 2017-07-27 | 1 | -1/+1
* storage/filesystem: reuse delta cache | Santiago M. Mola | 2017-07-27 | 1 | -1/+9

  Reuse delta base object cache for packfile decoders across multiple instances.
* packfile: create packfile.Index and reuse it | Santiago M. Mola | 2017-07-26 | 1 | -33/+18

  There was an internal type (i.e. storage/filesystem.idx) to use as in-memory index for packfiles. This was not convenient to reuse in the packfile.

  This commit creates a new representation (format/packfile.Index) that can be converted to and from idxfile.Idxfile. A packfile.Index now contains the functionality that was scattered on storage/filesystem.idx and packfile.Decoder's internals.

  storage/filesystem now reuses packfile.Index instances and this also results in higher cache hit ratios when resolving deltas.
* storage/filesystem: check all Close errors | Santiago M. Mola | 2017-07-19 | 1 | -9/+12
* *: upgrade to go-billy.v3, merge | Máximo Cuadros | 2017-06-18 | 1 | -1/+1
* Lazily load object index. | JP Sugarbroad | 2017-04-06 | 1 | -6/+22

  fixes #327
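Lazy loading of an index, as named in this commit title, is often written in Go with `sync.Once`. The sketch below shows that general technique under stated assumptions and is not taken from the commit: `lazyIndex`, its fields, and the `map[string]int64` index shape are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// lazyIndex defers building the hash-to-offset index until the first lookup.
type lazyIndex struct {
	once  sync.Once
	load  func() (map[string]int64, error) // supplied by the caller
	index map[string]int64
	err   error
}

// offset triggers the load on first use; later calls reuse the cached result
// (or the cached load error) without calling load again.
func (l *lazyIndex) offset(hash string) (int64, bool, error) {
	l.once.Do(func() { l.index, l.err = l.load() })
	if l.err != nil {
		return 0, false, l.err
	}
	off, ok := l.index[hash]
	return off, ok, nil
}

func main() {
	loads := 0
	idx := &lazyIndex{load: func() (map[string]int64, error) {
		loads++
		return map[string]int64{"abc123": 12}, nil
	}}
	fmt.Println(loads) // nothing has been loaded yet
	fmt.Println(idx.offset("abc123"))
	fmt.Println(idx.offset("missing"))
	fmt.Println(loads) // load ran a single time
}
```

Constructing the storage stays cheap this way, and callers that never look up an object never pay for reading the index.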