diff options
author | Alberto Cortés <alcortesm@gmail.com> | 2016-07-04 17:09:22 +0200 |
---|---|---|
committer | Máximo Cuadros <mcuadros@gmail.com> | 2016-07-04 17:09:22 +0200 |
commit | 5e73f01cb2e027a8f02801635b79d3a9bc866914 (patch) | |
tree | c0e7eb355c9b8633d99bab9295cb72b6c3a9c0e1 /formats/packfile/doc.go | |
parent | 808076af869550a200a3a544c9ee2fa22a8b6a85 (diff) | |
download | go-git-5e73f01cb2e027a8f02801635b79d3a9bc866914.tar.gz |
Adds support to open local repositories and to use file-based object storage (#55)v3.1.0
* remove some comments
* idx writer/reader
* Shut up ssh tests, they are annoying
* Add file scheme test to clients
* Add dummy file client
* Add test fot file client
* Make tests use fixture endpoint
* add parser for packed-refs format
* add parser for packed-refs format
* WIP adding dir.Refs() tests
* Add test for fixture refs
* refs parser for the refs directory
* Documentation
* Add Capabilities to file client
* tgz.Exatract now accpets a path instead of a Reader
* fix bug in idxfile fanout calculation
* remove dead code
* packfile documentation
* clean packfile parser code
* add core.Object.Content() and returns errors for core.ObjectStorage.Iter()
* add seekable storage
* add dir repos to NewRepository
* clean prints
* Add dir client documentation to README
* Organize the README
* README
* Clean tgz package
* Clean temp dirs after tgz tests
* Gometalinter on gitdir
* Clean pattern function
* metalinter tgz
* metalinter gitdir
* gitdir coverage and remove seekable packfile filedescriptor leak
* gitdir Idxfile tests and remove file descriptor leak
* gitdir Idxfile tests when no idx is found
* clean storage/seekable/internal/index and some formats/idxfile API issues
* clean storage/seekable
* clean formats/idx
* turn packfile/doc.go into packfile/doc.txt
* move formats/packfile/reader to decoder
* fix packfile decoder error names
* improve documentation
* comment packfile decoder errors
* comment public API (format/packfile)
* remve duplicated code in packfile decoder test
* move tracking_reader into an internal package and clean it
* use iota for packfile format
* rename packfile parse.go to packfile object_at.go
* clean packfile deltas
* fix delta header size bug
* improve delta documentation
* clean packfile deltas
* clean packfiles deltas
* clean repository.go
* Remove go 1.5 from Travis CI
Because go 1.5 does not suport internal packages.
* change local repo scheme to local://
* change "local://" to "file://" as the local scheme
* fix broken indentation
* shortens names of variables in short scopes
* more shortening of variable names
* more shortening of variable names
* Rename git dir client to "file", as the scheme used for it
* Fix file format ctor name, now that the package name has change
* Sortcut local repo constructor to not use remotes
The object storage is build directly in the repository ctor, instead
of creating a remote and waiting for the user to pull it.
* update README and fix some errors in it
* remove file scheme client
* Local respositories has now a new ctor
This is, they are no longer identified by the scheme of the URL, but are
created different from inception.
* remove unused URL field form Repository
* move all git dir logic to seekable sotrage ctor
* fix documentation
* Make formats/file/dir an internal package to storage/seekable
* change package storage/seekable to storage/fs
* clean storage/fs
* overall storage/fs clean
* more cleaning
* some metalinter fixes
* upgrade cshared to last changes
* remove dead code
* fix test error info
* remove file scheme check from clients
* fix test error message
* fix test error message
* fix error messages
* style changes
* fix comments everywhere
* style changes
* style changes
* scaffolding and tests for local packfiles without ifx files
* outsource index building from packfile to the packfile decoder
* refactor packfile header reading into a new function
* move code to generate index from packfile back to index package
* add header parsing
* fix documentation errata
* add undeltified and OFS delta support for index building from the packfile
* add tests for packfile with ref-deltas
* support for packfiles with ref-deltas and no idx
* refactor packfile format parser to reuse code
* refactor packfile format parser to reuse code
* refactor packfile format parser to reuse code
* refactor packfile format parser to reuse code
* refactor packfile format parser to reuse code
* WIP refactor packfile format parser to reuse code
* refactor packfile format parser to reuse code
* remove prints from tests
* remove prints from tests
* refactor packfile.core into packfile.parser
* rename packfile reader to something that shows it is a recaller
* rename cannot recall error
* rename packfile.Reader to packfile.ReadRecaller and document
* speed up test by using StreamReader instead of SeekableReader when possible
* clean packfile StreamReader
* stream_reader tests
* refactor packfile.StreamReader into packfile.StreamReadRecaller
* refactor packfile.SeekableReader into packfile.SeekableReadRecaller and document it
* generalize packfile.StreamReadRecaller test to all packfile.ReadRecaller implementations
* speed up storage/fs tests
* speed up tests in . by loading packfiles in memory
* speed up repository tests by using and smaller fixture
* restore doc.go files
* rename packfile.ReadRecaller implementations to shorter names
* update comments to type changes
* packfile.Parser test (WIP)
* packfile.Parser tests and add ForgetAll() to packfile.ReadRecaller
* add test for packfile.ReadRecaller.ForgetAll()
* clarify seekable being able to recallByOffset forgetted objects
* use better names for internal maps
* metalinter packfile package
* speed up some tests
* documentation fixes
* change storage.fs package name to storage.proxy to avoid confusion with new filesystem support
* New fs package and os transparent implementation
Now NewRepositoryFromFS receives a fs and a path and tests are
modified accordingly, but it is still not using for anything.
* add fs to gitdir and proxy.store
* reduce fs interface for easier implementation
* remove garbage dirs from tgz tests
* change file name gitdir/dir.go to gitdir/gitdir.go
* fs.OS tests
* metalinter utils/fs
* add NewRepositoryFromFS documentation to README
* Readability fixes to README
* move tgz to an external dependency
* move filesystem impl. example to example dir
* rename proxy/store.go to proxy/storage.go for coherence with memory/storage.go
* rename proxy package to seekable
Diffstat (limited to 'formats/packfile/doc.go')
-rw-r--r-- | formats/packfile/doc.go | 331 |
1 files changed, 167 insertions, 164 deletions
diff --git a/formats/packfile/doc.go b/formats/packfile/doc.go index cb3f542..c79c180 100644 --- a/formats/packfile/doc.go +++ b/formats/packfile/doc.go @@ -1,165 +1,168 @@ -package packfile +// Package packfile documentation: +/* + +GIT pack format +=============== + +== pack-*.pack files have the following format: + + - A header appears at the beginning and consists of the following: + + 4-byte signature: + The signature is: {'P', 'A', 'C', 'K'} + + 4-byte version number (network byte order): + GIT currently accepts version number 2 or 3 but + generates version 2 only. + + 4-byte number of objects contained in the pack (network byte order) + + Observation: we cannot have more than 4G versions ;-) and + more than 4G objects in a pack. + + - The header is followed by number of object entries, each of + which looks like this: + + (undeltified representation) + n-byte type and length (3-bit type, (n-1)*7+4-bit length) + compressed data + + (deltified representation) + n-byte type and length (3-bit type, (n-1)*7+4-bit length) + 20-byte base object name + compressed delta data + + Observation: length of each object is encoded in a variable + length format and is not constrained to 32-bit or anything. + + - The trailer records 20-byte SHA1 checksum of all of the above. + +== Original (version 1) pack-*.idx files have the following format: + + - The header consists of 256 4-byte network byte order + integers. N-th entry of this table records the number of + objects in the corresponding pack, the first byte of whose + object name is less than or equal to N. This is called the + 'first-level fan-out' table. + + - The header is followed by sorted 24-byte entries, one entry + per object in the pack. Each entry is: + + 4-byte network byte order integer, recording where the + object is stored in the packfile as the offset from the + beginning. + + 20-byte object name. + + - The file is concluded with a trailer: + + A copy of the 20-byte SHA1 checksum at the end of + corresponding packfile. -// GIT pack format -// =============== -// -// == pack-*.pack files have the following format: -// -// - A header appears at the beginning and consists of the following: -// -// 4-byte signature: -// The signature is: {'P', 'A', 'C', 'K'} -// -// 4-byte version number (network byte order): -// GIT currently accepts version number 2 or 3 but -// generates version 2 only. -// -// 4-byte number of objects contained in the pack (network byte order) -// -// Observation: we cannot have more than 4G versions ;-) and -// more than 4G objects in a pack. -// -// - The header is followed by number of object entries, each of -// which looks like this: -// -// (undeltified representation) -// n-byte type and length (3-bit type, (n-1)*7+4-bit length) -// compressed data -// -// (deltified representation) -// n-byte type and length (3-bit type, (n-1)*7+4-bit length) -// 20-byte base object name -// compressed delta data -// -// Observation: length of each object is encoded in a variable -// length format and is not constrained to 32-bit or anything. -// -// - The trailer records 20-byte SHA1 checksum of all of the above. -// -// == Original (version 1) pack-*.idx files have the following format: -// -// - The header consists of 256 4-byte network byte order -// integers. N-th entry of this table records the number of -// objects in the corresponding pack, the first byte of whose -// object name is less than or equal to N. This is called the -// 'first-level fan-out' table. -// -// - The header is followed by sorted 24-byte entries, one entry -// per object in the pack. Each entry is: -// -// 4-byte network byte order integer, recording where the -// object is stored in the packfile as the offset from the -// beginning. -// -// 20-byte object name. -// -// - The file is concluded with a trailer: -// -// A copy of the 20-byte SHA1 checksum at the end of -// corresponding packfile. -// -// 20-byte SHA1-checksum of all of the above. -// -// Pack Idx file: -// -// -- +--------------------------------+ -// fanout | fanout[0] = 2 (for example) |-. -// table +--------------------------------+ | -// | fanout[1] | | -// +--------------------------------+ | -// | fanout[2] | | -// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | -// | fanout[255] = total objects |---. -// -- +--------------------------------+ | | -// main | offset | | | -// index | object name 00XXXXXXXXXXXXXXXX | | | -// table +--------------------------------+ | | -// | offset | | | -// | object name 00XXXXXXXXXXXXXXXX | | | -// +--------------------------------+<+ | -// .-| offset | | -// | | object name 01XXXXXXXXXXXXXXXX | | -// | +--------------------------------+ | -// | | offset | | -// | | object name 01XXXXXXXXXXXXXXXX | | -// | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | -// | | offset | | -// | | object name FFXXXXXXXXXXXXXXXX | | -// --| +--------------------------------+<--+ -// trailer | | packfile checksum | -// | +--------------------------------+ -// | | idxfile checksum | -// | +--------------------------------+ -// .-------. -// | -// Pack file entry: <+ -// -// packed object header: -// 1-byte size extension bit (MSB) -// type (next 3 bit) -// size0 (lower 4-bit) -// n-byte sizeN (as long as MSB is set, each 7-bit) -// size0..sizeN form 4+7+7+..+7 bit integer, size0 -// is the least significant part, and sizeN is the -// most significant part. -// packed object data: -// If it is not DELTA, then deflated bytes (the size above -// is the size before compression). -// If it is REF_DELTA, then -// 20-byte base object name SHA1 (the size above is the -// size of the delta data that follows). -// delta data, deflated. -// If it is OFS_DELTA, then -// n-byte offset (see below) interpreted as a negative -// offset from the type-byte of the header of the -// ofs-delta entry (the size above is the size of -// the delta data that follows). -// delta data, deflated. -// -// offset encoding: -// n bytes with MSB set in all but the last one. -// The offset is then the number constructed by -// concatenating the lower 7 bit of each byte, and -// for n >= 2 adding 2^7 + 2^14 + ... + 2^(7*(n-1)) -// to the result. -// -// -// -// == Version 2 pack-*.idx files support packs larger than 4 GiB, and -// have some other reorganizations. They have the format: -// -// - A 4-byte magic number '\377tOc' which is an unreasonable -// fanout[0] value. -// -// - A 4-byte version number (= 2) -// -// - A 256-entry fan-out table just like v1. -// -// - A table of sorted 20-byte SHA1 object names. These are -// packed together without offset values to reduce the cache -// footprint of the binary search for a specific object name. -// -// - A table of 4-byte CRC32 values of the packed object data. -// This is new in v2 so compressed data can be copied directly -// from pack to pack during repacking without undetected -// data corruption. -// -// - A table of 4-byte offset values (in network byte order). -// These are usually 31-bit pack file offsets, but large -// offsets are encoded as an index into the next table with -// the msbit set. -// -// - A table of 8-byte offset entries (empty for pack files less -// than 2 GiB). Pack files are organized with heavily used -// objects toward the front, so most object references should -// not need to refer to this table. -// -// - The same trailer as a v1 pack file: -// -// A copy of the 20-byte SHA1 checksum at the end of -// corresponding packfile. -// -// 20-byte SHA1-checksum of all of the above. -// -// From: -// https://www.kernel.org/pub/software/scm/git/docs/v1.7.5/technical/pack-protocol.txt + 20-byte SHA1-checksum of all of the above. + +Pack Idx file: + + -- +--------------------------------+ +fanout | fanout[0] = 2 (for example) |-. +table +--------------------------------+ | + | fanout[1] | | + +--------------------------------+ | + | fanout[2] | | + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | + | fanout[255] = total objects |---. + -- +--------------------------------+ | | +main | offset | | | +index | object name 00XXXXXXXXXXXXXXXX | | | +table +--------------------------------+ | | + | offset | | | + | object name 00XXXXXXXXXXXXXXXX | | | + +--------------------------------+<+ | + .-| offset | | + | | object name 01XXXXXXXXXXXXXXXX | | + | +--------------------------------+ | + | | offset | | + | | object name 01XXXXXXXXXXXXXXXX | | + | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | + | | offset | | + | | object name FFXXXXXXXXXXXXXXXX | | + --| +--------------------------------+<--+ +trailer | | packfile checksum | + | +--------------------------------+ + | | idxfile checksum | + | +--------------------------------+ + .-------. + | +Pack file entry: <+ + + packed object header: + 1-byte size extension bit (MSB) + type (next 3 bit) + size0 (lower 4-bit) + n-byte sizeN (as long as MSB is set, each 7-bit) + size0..sizeN form 4+7+7+..+7 bit integer, size0 + is the least significant part, and sizeN is the + most significant part. + packed object data: + If it is not DELTA, then deflated bytes (the size above + is the size before compression). + If it is REF_DELTA, then + 20-byte base object name SHA1 (the size above is the + size of the delta data that follows). + delta data, deflated. + If it is OFS_DELTA, then + n-byte offset (see below) interpreted as a negative + offset from the type-byte of the header of the + ofs-delta entry (the size above is the size of + the delta data that follows). + delta data, deflated. + + offset encoding: + n bytes with MSB set in all but the last one. + The offset is then the number constructed by + concatenating the lower 7 bit of each byte, and + for n >= 2 adding 2^7 + 2^14 + ... + 2^(7*(n-1)) + to the result. + + + +== Version 2 pack-*.idx files support packs larger than 4 GiB, and + have some other reorganizations. They have the format: + + - A 4-byte magic number '\377tOc' which is an unreasonable + fanout[0] value. + + - A 4-byte version number (= 2) + + - A 256-entry fan-out table just like v1. + + - A table of sorted 20-byte SHA1 object names. These are + packed together without offset values to reduce the cache + footprint of the binary search for a specific object name. + + - A table of 4-byte CRC32 values of the packed object data. + This is new in v2 so compressed data can be copied directly + from pack to pack during repacking without undetected + data corruption. + + - A table of 4-byte offset values (in network byte order). + These are usually 31-bit pack file offsets, but large + offsets are encoded as an index into the next table with + the msbit set. + + - A table of 8-byte offset entries (empty for pack files less + than 2 GiB). Pack files are organized with heavily used + objects toward the front, so most object references should + not need to refer to this table. + + - The same trailer as a v1 pack file: + + A copy of the 20-byte SHA1 checksum at the end of + corresponding packfile. + + 20-byte SHA1-checksum of all of the above. + +From: +https://www.kernel.org/pub/software/scm/git/docs/v1.7.5/technical/pack-protocol.txt +*/ +package packfile |