Normally, the content of files in the annex is prevented from being modified. (Unless your repository is using direct mode.)
That's a good thing, because it might be the only copy, you wouldn't want to lose it in a fumblefingered mistake.
# echo oops > my_cool_big_file
bash: my_cool_big_file: Permission denied
In order to modify a file, it should first be unlocked.
# git annex unlock my_cool_big_file
unlock my_cool_big_file (copying...) ok
That replaces the symlink that normally points at its content with a copy of the content. You can then modify the file like any regular file. Because it is a regular file.
(If you decide you don't need to modify the file after all, or want to discard
modifications, just use git annex lock
.)
When you git commit
, git-annex's pre-commit hook will automatically
notice that you are committing an unlocked file, and add its new content
to the annex. The file will be replaced with a symlink to the new content,
and this symlink is what gets committed to git in the end.
# echo "now smaller, but even cooler" > my_cool_big_file
# git commit my_cool_big_file -m "changed an annexed file"
add my_cool_big_file ok
[master 64cda67] changed an annexed file
1 files changed, 1 insertions(+), 1 deletions(-)
There is one problem with using git commit
like this: Git wants to first
stage the entire contents of the file in its index. That can be slow for
big files (sorta why git-annex exists in the first place). So, the
automatic handling on commit is a nice safety feature, since it prevents
the file content being accidentally committed into git. But when working with
big files, it's faster to explicitly add them to the annex yourself
before committing.
# echo "now smaller, but even cooler yet" > my_cool_big_file
# git annex add my_cool_big_file
add my_cool_big_file ok
# git commit my_cool_big_file -m "changed an annexed file"
I think that git-annex's usefulness is not only because of the Git's index overhead: I like its idea because it will help track the copies in the "special remotes", which are not Git because
ATM unlock copies original file for modifications, so that original copy remains under annex and possibly to-be-edited copy created. But if I am "unlock"ing file I might simply not be interested in a previous copy and want to maintain only a single (possibly edited) new copy. What if there was a mode where the actual file is simply moved into "unlocked" location for editing, thus effectively dropping the actual content from git annex. That would allow to not inquire lengthy copying/wasting local space. If then I would need a previous copy I would just "get" it again.