Name
cvs-fast-export — fast-export history from a CVS repository or RCS collection.
Synopsis
cvs-fast-export
[-h] [-C] [-F] [-a] [-w fuzz] [-g] [-l] [-v] [-q] [-V] [-T] [-p] [-P]
[-i date] [-k expansion] [-A authormap] [-t threads]
[-R revmap] [--reposurgeon] [-e remote] [-s stripprefix]
DESCRIPTION
cvs-fast-export tries to group the per-file commits and tags in a RCS file
collection or CVS project repository into per-project changeset
commits with common metadata, in the style of Subversion and later
version-control systems.
This tool is best used in conjunction with reposurgeon(1). Plain
cvs-fast-export conversions contain various sorts of fossils that
reposurgeon is good for cleaning up. See the
DVCS Migration
HOWTO to learn about the sanity-checking and polishing steps
required for a really high-quality conversion, including reference
lifting and various sorts of artifact cleanup.
If arguments are supplied, the program assumes all ending with the
extension ",v" are master files and reads them in. If no arguments are
supplied, the program reads filenames from stdin, one per
line. Directories and files not ending in ",v" are skipped. (But see
the description of the -P option for how to change this behavior.)
Files from either Unix CVS or CVS-NT are handled. If a collection of
files has commitid fields, changesets will be constructed reliably
using those.
In the default mode, which generates a git-style fast-export stream to
standard output:
-
The prefix given using the -s option or, if the option is omitted, the
longest common prefix of the paths is discarded from each path.
-
Files in CVS Attic and RCS directories are treated as though the "Attic/"
or "RCS/" portion of the path were absent. This usually restores the
history of files that were deleted.
-
Permissions on all fileops related to a particular file will be
controlled by the permissions on the corresponding master. If the
executable bit on the master is on, all its fileops will have 100755
permissions; otherwise 100644.
-
A set of file operations is coalesced into a changeset if either (a) they
all share the same commitid, or (b) all have no commitid but
identical change comments, authors, and modification dates within
the window defined by the time-fuzz parameter. Unlike some other
exporters, no attempt is made to derive changesets from shared tags.
-
Commits are issued in time order unless the cvs-fast-export detects
that some parent is younger than its child (this is unlikely but
possible in cases of severe clock skew). In that case you will see a
warning on standard error and the emission order is guaranteed
topologically correct, but otherwise not specified (and is subject to
change in future versions of this program).
-
CVS tags become git lightweight tags when they can be unambiguously
associated with a changeset. If the same tag is attached to file
deltas that resolve to multiple changesets, it is reported as if
attached to the last of them.
-
The HEAD branch is renamed to master.
-
Other tag and branch names are sanitized to be legal for git;
the characters ~^\*? are removed.
-
Since .cvsignore files have a syntax upward-compatible with that of
.gitignore files, they’re renamed. In order to simulate the default
ignore behavior of CVS, those defaults are prepended to root
.cvsignore blobs renamed to .gitignore, and a root .gitignore
containing the defaults is generated if no such blobs exist.
See the later section on RCS/CVS LIMITATIONS for more information on
edge cases and conversion problems.
This program does not depend on any of the CVS metadata held outside
the individual content files (e.g. under CVSROOT).
The variable TMPDIR is honored and used when generating a temporary
directory in which to store file content during processing.
This program treats the file contents of the source CVS or RCS
repository, and their filenames. as uninterpreted byte sequences to be
passed through to the git conversion without re-encoding. In
particular, it makes no attempt to fix up line endings (Unix \n vs,
Windows \r\n vs. Macintosh \r), nor does it know about what repository
filenames might collide with special filenames on any given platform.
Optionally it may expand CVS $-keywords, but this is not recommended.
This program treats change comments as uninterpreted byte sequences to
be passed through to the git conversion without change or
re-encoding. If you need to re-encode (e.g, from Latin-1 to UTF-8) or
remap CVS version IDs to something useful, use cvs-fast-export
in conjunction with the transcode and references lift commands
of reposurgeon(1).
OPTIONS
-
-h
-
Display usage summary.
-
-w fuzz
-
Set the timestamp fuzz factor for identifying patch sets in seconds.
The default is 300 seconds. This option is irrelevant for changesets
with commitids.
-
-c
-
Don’t trust commit-IDs; match by ordinary metatada. Will be useful if
you have something like a CVS-NT repository in which per-file commits
were made in such a way that the cliques don’t have matching IDs.
-
-k
-
Specify RCS/CVS keyword expansion. You can specify any of the CVS
keyword expansion types: "kv" (keyword-value), "kvl"
(keyword-value-locker), "k" (keyword only), "v" (value only), "o" (no
expansion), or "b" (binary - no exansion, no line ending
conversion). CVS’s rules are: (1) if the master has -kb set in its
internal keyword field, do that, otherwise (2) if an expansion type
was set on the command line, do that, otherwise (3) if the file’s
internal keyword field is set, do that, otherwise use kv.
cvs-fast-export behaves slightly differently; the default is kb rather
than kkv, kvl is treated like kv, and ko is treated like kb (that
is, no end-of-line normalization is ever performed).
-
-g
-
generate a picture of the commit graph in the DOT markup language
used by the graphviz tools, rather than fast-exporting.
-
-l
-
Warnings normally go to standard error. This option, which takes a
filename, allows you to redirect them to a file. Convenient
with the -p option.
-
-a
-
Dump a list of author IDs found in the repository, rather than fast-exporting.
-
-C
-
Force canonical order (same as git-fast-export’s) in the emitted
stream. Blobs are emitted as late as possible before the commits that
require them. This option reduces throughput by about a factor of two. Repositories
in which the time order of commits is inconsistent with their
topological order will produce malformed fast-export streams in -C
mode; these will crash Git’s importer.
-
-F
-
Force fast order. Blobs are emitted first, then commits.
-
-A authormap
Apply an author-map file to the attribution lines. Each line must be
of the form
ferd = Ferd J. Foonly <foonly@foo.com> America/Chicago
and will be applied to map the Unix username ferd to the DVCS-style
user identity specified after the equals sign. The timezone field
(after > and whitespace) is optional and (if present) is used to set
the timezone offset to be attached to the date; acceptable formats for
the timezone field are anything that can be in the TZ environment
variable, including a [+-]hhmm offset. Whitespace around the equals
sign is stripped. Lines beginning with a # or not containing an
equals sign are silently ignored.
-
-R revmap
-
Write a revision map to the specified argument filename. Each line of
the revision map consists of three whitespace-separated fields: a
filename, an RCS revision number, and the mark of the commit to which
that filename-revision pair was assigned. Doesn’t work with -g.
-
-v
-
Show verbose progress messages mainly of interest to developers.
-
-q
-
Run quietly, suppressing warning messages about absence of commitids
and other minor problems for which the program can usually compensate but
which may indicate conversion problems. Meant to be used with
cvsconvert, which does its own correctness checking.
-
-T
-
Force deterministic dates for regression testing. Each patchset will
have a monotonic-increasing attributed date computed from its mark in
the output stream - the mark value times the commit time window times two.
-
--reposurgeon
-
Emit for each commit a list of the CVS file:revision pairs composing it as a
bzr-style commit property named "cvs-revisions". From version 2.12
onward, reposurgeon(1) can interpret these and use them as hints for
reference-lifting. Also, suppresses emission of "done" trailer.
-
--embed-id
-
Append to each commit comment identification of the CVS commits that
contributed to it.
-
-V
-
Emit the program version and exit.
-
-e remote
-
Exported branch names are prefixed with refs/remotes/remote instead of
refs/heads, making the import appear to come from the named remote.
-
-s stripprefix
-
Strip the given prefix instead of longest common prefix
-
-t threadcount
-
Running multithreaded increases the program’s memory footprint
proportionally to the number of threads, but means the conversion may
run in less total time because an I/O operation involving one master
file will not block compute-intensive processing of others. By
default, the program conservatively assumes it can use two threads per
processor available. You can use this option to set the number of threads;
the value 0 forces sequential processing with no threading.
-
-p
-
Enable progress reporting. This also dumps statistics (elapsed time
and size of maximum resident set) for several points in the conversion
run.
-
-P
-
Normally cvs-fast-export will skip any filename presented as an argument
or on stdin that does not end with the RCS/CVS extension ",v", and
will also ignore a pathname containing the string CVSROOT (this
avoids annoyances when running from or above a top-level CVS directory).
A strict reading of RCS allows masters without the ,v extension. This
option sets promiscuous mode, disabling both checks.
-
-i date
-
Enable incremental-dump mode. Only commits with a date after that
specified by the argument are emitted. Each branch root in the
incremental dump is decorated with git-stream magic which, when
interpreted in context of a live repository, will connect that branch
to any branch of the same name. The date is expected to be RFC3339
conformant (e.g. yy-mm-ddThh:mm:ssZ) or else an integer Unix time
in seconds.
If neither -F nor -C is specified, cvs-fast-export will choose a mode
based on the repository size - canonical order for small repositories,
fast for large ones. Tools that consume git-fast-import streams should not
care; this behavior is for backward compatibility.
EXAMPLE
A very typical invocation would look like this:
find . | cvs-fast-export >stream.fi
Your cvs-fast-export distribution should also supply cvssync(1), a
tool for fetching CVS masters from a remote repository. Using
them together will look something like this:
cvssync anonymous@cvs.savannah.gnu.org:/sources/groff groff
find groff | cvs-fast-export >groff.fi
Progress reporting can be reassuring if you expect a conversion
to run for some time. It will animate completion percentages
as the conversion proceeds and display timings when done.
The cvs-fast-export suite contains a wrapper script called
cvsconvert that is useful for running a conversion and automatically
checking its content against the CVS original.
RCS/CVS LIMITATIONS
Translating RCS/CVS repositories to the generic DVCS model expressed
by import streams is not merely difficult and messy, there are weird
RCS/CVS cases that cannot be correctly translated at all.
cvs-fast-export will try to warn you about these cases rather than
silently producing broken or incomplete translations, but there be
dragons. We recommend some precautions under SANITY CHECKING.
Timestamps from CVS histories are not very reliable - assume any
individual one can be off by up to 24 hours in a random direction,
and do not assume that timestamp order in the converted repository.
CVS made them on the client side rather than at the server; this
makes them subject to local clock skew, timezone, and DST issues.
CVS-NT and versions of GNU CVS after 1.12 (2004) added a changeset
commit-id to file metadata. Older sections of CVS history without
these are vulnerable to various problems caused by clock skew between
clients; this used to be relatively common for multiple reasons,
including less pervasive use of NTP clock synchronization. cvs-fast-export
will warn you ("commits before this date lack commitids") when it sees
such a section in your history. When it does, these caveats apply:
-
If timestamps of commits in the CVS repository were not stable
enough to be used for ordering commits, changes may be reported in the
wrong order.
-
If the timestamp order of different files crosses the revision order
within the commit-matching time window, the order of commits reported
may be wrong.
One more property affected by commitids is the stability of old
changesets under incremental dumping. Under a CVS implementation
issuing commitids, new CVS commits are guaranteed not to change
cvs-fast-export’s changeset derivation from a previous history;
thus, updating a target DVCS repository with incremental dumps
from a live CVS installation will work. Even if older portions
of the history do not have commitids, conversions will be stable.
This stability guarantee is lost if you are using a version of
CVS that does not issue commitids.
Also note that a CVS repository has to be completely reanalyzed
even for incremental dumps; thus, processing time and memory
requirements will rise with the total repository size even when
the requested reporting interval of the incremental dump is small.
These problems cannot be fixed in cvs-fast-export; they are inherent to CVS.
CVS-FAST-EXPORT REQUIREMENTS AND LIMITATIONS
Because the code is designed for dealing with large data sets, it has
been optimized for 64-bit machines and no particular effort has been
made to keep it 32-bit clean. Various counters may overflow if you
try using it to lift a large repository on a 32-bit machine.
ranches occurring in only a subset of the analyzed masters are not
correctly resolved; instead, an entirely disjoint history will be
created containing the branch revisions and all parents back to the
root.
CVS vendor branches are a source of trouble. Sufficiently strange
combinations of imports and local modifications will translate
badly, producing incorrect content on master and elsewhere.
Some other CVS exporters try, or have tried, to deduce changesets from
shared tags even when comment metadata doesn’t match perfectly. This
one does not; the designers judge that to trip over too many
pathological CVS tagging cases.
The program does try to do something useful cases in which a tag
occurs in a set of revisions that does not correspond to any gitspace
commit. In this case a tagged branch containing only one commit is
created, guaranteeing that you can check out a set of files containing
the CVS content for the tag. The commit comment is "Synthetic commit
for incomplete tag XXX", where XXX is the relevant tag. The root of
the branchlet is the gitspace commit where the latest CVS revision in
in the tagged set first occurs; this is the commit the tag would point
at if its incompleteness were ignored. The change in the branchlet
commit is also applied forward in the nearby mainline.
When running multithreaded, there is an edge case in which the
program’s behavior is nondeterministic. If the same tag looks like it
should be assigned to two different gitspace commits with the same
timestamp, which tag it actually lands on will be random.
cvs-fast-export is designed to do translation with all its
intermediate structures in memory, in one pass. This contrasts with
cvs2git(1), which uses multiple passes and journals intermediate
structures to disk. The tradeoffs are that cvs-fast-export is much
faster than cvs2git (by a ratio of over 100:1 on real repositories),
but will fail with an out-of-memory error on CVS repositories large
enough to overflow your physical memory. In practice, you are unlikely
to push this limit on a machine with 32GB of RAM and effectively
certain not to with 64GB. Attempts to do large conversions in only a
32-bit (4GB) address space are, on the other hand, unlikely to end
well.
The program’s transient storage requirements can be quite a bit
larger; it must slurp in each entire master file once in order to
do delta assembly and generate the version snapshots that will
become snapshots. Using the -t option multiplies the expected amount
of transient storage required by the number of threads; use with
care, as it is easy to push memory usage so high that swap overhead
overwhelms the gains from not constantly blocking on I/O.
In -C mode, the program also requires temporary disk space equivalent
to the sum of the sizes of all revisions in all files. This is not so
in -F mode.
On stock PC hardware in 2014, cvs-fast-export achieves processing
speeds upwards of 64K CVS commits per minute on real repositories.
Time performance is primarily I/O bound and can be improved by running
on an SSD.
SANITY CHECKING
After conversion, it is good practice to do the following verification
steps:
-
If you ran the conversion directly with cvs-fast-export rather than
using cvsconvert, use diff(1) with the -r option to compare a CVS head
checkout with a checkout of the converted repository. The only
differences you should see are those due to RCS keyword expansion,
.cvsignore lifting, and manifest mismatches due to CVS not tracking
file deaths quite correctly. If this is not true, you may have found a bug
in cvs-fast-export; please report it with a copy of the CVS repo.
-
Examine the translated repository with reposurgeon(1) looking (in
particular) for misplaced tags or branch joins. Often these can be
manually repaired with little effort. These flaws do not necessarily
imply bugs in cvs-fast-export; they may simply indicate previously
undetected malformations in the CVS history. However, reporting them may
help improve cvs-fast-export.
The above is an abbreviated version of part of
DVCS Migration
HOWTO; browse it for more.
RETURN VALUE
0 if all files were found and successfully converted, 1 otherwise.
ERROR MESSAGES
Most of the messages cvs-fast-export emits are self-explanatory. Here
are a few that aren’t. Where it says "check head", be sure to
sanity-check against the head revision.
-
null branch name, probably from a damaged Attic file
-
The code was unable to deduce a name for a branch and tried to
export a null pointer as a name. The branch is given the name
"null". It is likely this history will need repair.
-
fatal: internal error - duplicate key in red black tree
-
Multiple tags with identical names exist in one of your master
files. This is a sign of a corrupted revision history; you will
need to manually inspect the master and remove one of the duplicates.
-
child commit emitted before parent exists
-
Skew in client timestamps produced a situation in which time
order of parent and child commits is backwards (or, if you are
running multithreaded, timestamps are the same and happen to have
been processed in the wrong order). The -F option prevents such
pairs from being emitted in the wrong order, at the cost of
producing a commit and blob ordering very different from that
of git-fast-export..
-
tag could not be assigned to a commit
-
RCS/CVS tags are per-file, not per revision. If developers are not
careful in their use of tagging, it can be impossible to associate a
tag with any of the changesets that cvs-fast-export resolves. When
this happens, cvs-fast-export will issue this warning and the tag
named will be discarded.
-
discarding dead untagged branch
-
Analysis found a CVS branch with no tag consisting entirely of
dead revisions. These cannot have been visible in the archival
state of the CVS at conversion time; it is possible they may
have been visible as branch content at some point in the
repository’s past, but without an identifying tag that state
is impossible to reconstruct.
-
warning - unnamed branch
-
A CVS branch with a live revision lacks a head label. A label
with "-UNNAMED-BRANCH" suffixed to the name of the parent branch
will be generated.
-
warning - no master branch generated
-
cvs-fast-export could not identify the default (HEAD) branch and
therefore there is no "master" in the conversion; this will
seriously confuse git and probably other VCSes when they try to
import the output stream. You may be able to identify and rename
a master branch using reposurgeon(1).
-
warning - xxx newer than yyy
-
Early in analysis of a CVS master file, time sort order of its
deltas doesn’t match the topological order defined by the
revision numbers. The most likely cause of this is clock skew
between clients in very old CVS versions. The program will attempt
to correct for this by tweaking the revision date of the
out-of-order commit to be that of its parent, but this may not
prevent other time-skew errors later in analysis.
-
warning - skew_vulnerable in file xxx rev yyy set to zzz
-
This warning is emitted when verbose is on and only on commits
with no commit ID. It calls out commits that coause the date
before which coalescence is unreliable to be set forward.
-
tip commit older than imputed branch join
-
A similar problem to "newer than" being reported at a later
stage, when file branches are being knit into changeset branches.
One CVS branch in a collection about to be collated into a gitspace
branch has a tip commit older than the earliest commit that is a
a parent on some (other) tip in the collection. The adventious
branch is snipped off.
-
some parent commits are younger than children
-
May indicate that cvs-fast-export aggregated some changesets in
the wrong order; probably harmless, but check head.
-
warning - branch point later than branch
-
Late in the analysis, when connecting branches to their parents
in the changeset DAG, the commit date of the root commit of a
branch is earlier than the date of the parent it gets connected
to. Could be yet another clock-skew symptom, or might point to
an error in the program’s topological analysis. Examine commits
near the join with reposurgeon(1); the branch may need to be
reparented by hand.
-
more than one delta with number X.Y.Z
-
The CVS history contained duplicate file delta numbers. Should
never happen, and may indice a corrupted CVS archive if it does;
check head.
-
{revision|patch} with odd depth
-
Should never happen; only branch numbers are supposed to have odd
depth, not file delta or patch numbers. May indicate a corrupted
CVS archive; check head.
-
duplicate tag in CVS master, ignoring
-
A CVS master has multiple instances of the same tag pointing at
different file deltas. Probably a CVS operator error and relatively
harmless, but check that the tag’s referent in the conversion
makes sense.
-
tag or branch name was empty after sanitization
-
Fatal error: tag name was empty after all characters illegal for git
were removed. Probably indicates a corrupted RCS file.
-
revision number too long, increase CVS_MAX_DEPTH
-
Fatal error: internal buffers are too short to handle a CVS
revision in a repo. Increase this constant in cvs.h and rebuild.
Warning: this will increase memory usage and slow down the tests
a lot.
-
snapshot sequence number too large, widen serial_t
-
Fatal error: the number of file snapshots in the CVS repo
overruns an internal counter. Rebuild cvs-fast-export from
source with a wider serial_t patched into cvs.h. Warning: this
will significantly increase the working-set size
-
too many branches, widen branchcount_t
-
Fatal error: the number of branches descended from some single
commit overruns an internal counter. Rebuild cvs-fast-export from
source with a wider branchcount_t patched into cvs.h. Warning:
this will significantly increase the working-set size
-
corrupt delta in
-
The text of a delta is expected to be led with d (delete) and a
(append) lines describing line-oriented changes at that delta.
When you see this message, these are garbled.
-
edit script tried to delete beyond eof
-
Indicates a corrupted RCS file. An edit line count was wrong,
possibly due to an integer overflow in an old 32-bit version of RCS.
-
internal error - branch cycle
-
cvs-fast-export found a cycle while topologically sorting commits
by parent link. This should never happen and probably indicates
a serious internal error: please file a bug report.
-
internal error - lost tag
-
Late in analysis (after changeset coalescence) a tag lost its
commit reference. This should never happen and probably indicates
an internal error: please file a bug report.
SEE ALSO
rcs(1), cvs(1), cvssync(1), cvsconvert(1), reposurgeon(1), cvs2git(1).