author | Paulo Gomes <pjbgf@linux.com> | 2023-07-20 21:19:28 +0100 |
---|---|---|
committer | Paulo Gomes <pjbgf@linux.com> | 2023-10-28 03:51:35 +0100 |
commit | 1c361adbc1f4b0e3d0743d11f187fb0b3ac4cb4d (patch) | |
tree | b7eecbf517a8e1a59047cb98d414bf00ba1f4653 /storage/filesystem/object.go | |
parent | 814abc098d033f77315d3bfb89ae5991aae10457 (diff) | |
download | go-git-1c361adbc1f4b0e3d0743d11f187fb0b3ac4cb4d.tar.gz |
plumbing: Optimise memory consumption for filesystem storage
Previously, as part of building the index representation, the resolveObject
func would create an interim plumbing.MemoryObject, which would then be
saved into storage via storage.SetEncodedObject. This meant that objects
were unnecessarily loaded into memory in full, only to then be saved to disk.
The changes streamline this process by:
- Introducing the LazyObjectWriter interface, which enables the write
operation to take place directly against the filesystem-based storage.
- Leveraging multi-writers to process the input data once, while targeting
multiple writers (e.g. hasher and storage).
An additional change relates to the caching of object info children within
Parser.get. The cache is now skipped when a seekable filesystem is being
used.
The impact of the changes can be observed when using seekable filesystem
storages, especially when cloning large repositories.
The stats below were captured by adapting the BenchmarkPlainClone test
to clone https://github.com/torvalds/linux.git:
pkg: github.com/go-git/go-git/v5
cpu: Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz
              │   /tmp/old    │              /tmp/new               │
              │    sec/op     │    sec/op      vs base              │
PlainClone-16    41.68 ± 17%     48.04 ±  9%   +15.27% (p=0.015 n=6)
              │   /tmp/old    │              /tmp/new               │
              │     B/op      │     B/op       vs base              │
PlainClone-16   1127.8Mi ± 7%   256.7Mi ± 50%  -77.23% (p=0.002 n=6)
              │   /tmp/old    │              /tmp/new               │
              │   allocs/op   │   allocs/op    vs base              │
PlainClone-16    3.125M ± 0%     3.800M ±  0%  +21.60% (p=0.002 n=6)
Notice that on average the memory consumption per operation is over 75%
smaller. The time per operation increased by 15%, which may actually be less
in long-running applications, due to the decreased GC pressure and the
resulting lower garbage collection costs.
Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Diffstat (limited to 'storage/filesystem/object.go')
-rw-r--r-- | storage/filesystem/object.go | 13 |
1 files changed, 13 insertions, 0 deletions
diff --git a/storage/filesystem/object.go b/storage/filesystem/object.go
index 846a7b8..e812fe9 100644
--- a/storage/filesystem/object.go
+++ b/storage/filesystem/object.go
@@ -146,6 +146,19 @@ func (s *ObjectStorage) SetEncodedObject(o plumbing.EncodedObject) (h plumbing.H
 	return o.Hash(), err
 }
 
+// LazyWriter returns a lazy ObjectWriter that is bound to a DotGit file.
+// It first write the header passing on the object type and size, so
+// that the object contents can be written later, without the need to
+// create a MemoryObject and buffering its entire contents into memory.
+func (s *ObjectStorage) LazyWriter() (w io.WriteCloser, wh func(typ plumbing.ObjectType, sz int64) error, err error) {
+	ow, err := s.dir.NewObject()
+	if err != nil {
+		return nil, nil, err
+	}
+
+	return ow, ow.WriteHeader, nil
+}
+
 // HasEncodedObject returns nil if the object exists, without actually
 // reading the object data from storage.
 func (s *ObjectStorage) HasEncodedObject(h plumbing.Hash) (err error) {