Tools and methods for converting data formats on the Debian system are described.
Tools for standard formats are in very good shape, but support for proprietary formats is limited.
The following packages for text data conversion caught my eye.
Table 11.1. List of text data conversion tools
package | popcon | size | keyword | description |
---|---|---|---|---|
libc6 | V:928, I:998 | 10670 | charset | text encoding converter between locales by iconv(1) (fundamental) |
recode | V:5, I:36 | 608 | charset+eol | text encoding converter between locales (versatile, more aliases and features) |
konwert | V:2, I:59 | 122 | charset | text encoding converter between locales (fancy) |
nkf | V:1, I:11 | 346 | charset | character set translator for Japanese |
tcs | V:0, I:0 | 544 | charset | character set translator |
unaccent | V:0, I:0 | 76 | charset | replace accented letters by their unaccented equivalent |
tofrodos | V:3, I:36 | 50 | eol | text format converter between DOS and Unix: fromdos(1) and todos(1) |
macutils | V:0, I:1 | 320 | eol | text format converter between Macintosh and Unix: frommac(1) and tomac(1) |
Tip |
---|
iconv(1) is part of the package |
You can convert the encoding of a text file with iconv(1) as follows.
$ iconv -f encoding1 -t encoding2 input.txt >output.txt
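As a minimal sketch of this command, the following converts one accented character from ISO-8859-1 to UTF-8 and dumps the bytes before and after. The helper name `hexdump_bytes` and the sample byte are illustrative assumptions, not part of the original text.

```shell
#!/bin/sh
# Hypothetical helper: print stdin as a compact hex string.
hexdump_bytes() {
  od -An -tx1 | tr -d ' \n'
}

# "é" is the single byte 0xE9 (octal 351) in ISO-8859-1.
latin1_hex=$(printf '\351' | hexdump_bytes)

# After conversion it becomes the two-byte UTF-8 sequence 0xC3 0xA9.
utf8_hex=$(printf '\351' | iconv -f ISO-8859-1 -t UTF-8 | hexdump_bytes)

echo "ISO-8859-1: $latin1_hex"
echo "UTF-8:      $utf8_hex"
```

The same text grows from one byte to two, which is why a byte-level dump is a quick way to see which encoding a file really uses.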
Encoding values are matched case-insensitively, ignoring "-" and "_". You can list the supported encodings with the "iconv -l" command.
Table 11.2. List of encoding values and their usage
encoding value | usage |
---|---|
ASCII | American Standard Code for Information Interchange, 7 bit code without accented characters |
UTF-8 | current multilingual standard for modern operating systems |
ISO-8859-1 | old standard for western European languages, ASCII + accented characters |
ISO-8859-2 | old standard for eastern European languages, ASCII + accented characters |
ISO-8859-15 | old standard for western European languages, ISO-8859-1 with the euro sign |
CP850 | code page 850, Microsoft DOS characters with graphics for western European languages, ISO-8859-1 variant |
CP932 | code page 932, Microsoft Windows style Shift-JIS variant for Japanese |
CP936 | code page 936, Microsoft Windows style GB2312, GBK or GB18030 variant for Simplified Chinese |
CP949 | code page 949, Microsoft Windows style EUC-KR or Unified Hangul Code variant for Korean |
CP950 | code page 950, Microsoft Windows style Big5 variant for Traditional Chinese |
CP1251 | code page 1251, Microsoft Windows style encoding for the Cyrillic alphabet |
CP1252 | code page 1252, Microsoft Windows style ISO-8859-15 variant for western European languages |
KOI8-R | old Russian UNIX standard for the Cyrillic alphabet |
ISO-2022-JP | standard encoding for Japanese email which uses only 7 bit codes |
eucJP | old Japanese UNIX standard 8 bit code, completely different from Shift-JIS |
Shift-JIS | JIS X 0208 Appendix 1 standard for Japanese (see CP932) |
Note |
---|
Some encodings are only supported for data conversion and cannot be used as locale values (Section 8.3.1, “Basics of encoding”). |
For character sets which fit in a single byte, such as ASCII and the ISO-8859 character sets, the character encoding means almost the same thing as the character set.
For character sets with many characters, such as JIS X 0213 for Japanese or the Universal Character Set (UCS, Unicode, ISO-10646-1) for practically all languages, there are many encoding schemes to fit them into sequences of byte data.
EUC and ISO/IEC 2022 (also known as JIS X 0202) for Japanese
UTF-8, UTF-16/UCS-2 and UTF-32/UCS-4 for Unicode
For these, there is a clear distinction between the character set and the character encoding.
Some vendors use the term code page as a synonym for the character encoding table.
Note |
---|
Please note that most encoding systems share the same codes with ASCII for the 7 bit characters. But there are some exceptions. If you are converting old Japanese C programs and URL data from the encoding casually called shift-JIS to UTF-8, use " |
Tip |
---|
recode(1) may be used too and offers more than the combined functionality of iconv(1), fromdos(1), todos(1), frommac(1), and tomac(1). For more, see " |
You can check if a text file is encoded in UTF-8 with iconv(1) as follows.
$ iconv -f utf8 -t utf8 entrada.txt >/dev/null || echo "non-UTF-8 found"
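The same check can be wrapped in a function to test several files in a loop. This is only a sketch: the function name `is_utf8` and the two sample files are assumptions made for the demonstration.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file decodes cleanly as UTF-8.
is_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Sample files created only for this demonstration.
tmpdir=$(mktemp -d)
printf 'caf\303\251\n' >"$tmpdir/good.txt"   # "café" in UTF-8
printf 'caf\351\n'     >"$tmpdir/bad.txt"    # "café" in ISO-8859-1

for f in "$tmpdir"/*.txt; do
  if is_utf8 "$f"; then
    echo "$f: valid UTF-8"
  else
    echo "$f: non-UTF-8 found"
  fi
done
```

The check relies only on the exit status of iconv(1), so it reports whether a file is valid UTF-8 but cannot tell which other encoding an invalid file uses.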
Tip |
---|
Use the option " |
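Here is an example script that converts the encoding of file names in a single directory from those created under an older OS to modern UTF-8 ones.
#!/bin/sh
# Original encoding of the file names
ENCDN=iso-8859-1
for x in *; do
  # Convert the name; skip the file if the conversion fails
  y="$(printf '%s' "$x" | iconv -f "$ENCDN" -t utf-8)" || continue
  # Rename only when the name actually changes
  [ "$x" = "$y" ] || mv -- "$x" "$y"
done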
The "$ENCDN" variable holds the original encoding used for file names under the older OS, as in Table 11.2, “List of encoding values and their usage”.
For more complicated cases, please mount the filesystem (e.g. a disk partition) containing such file names with the proper encoding given as a mount(8) option (see Section 8.3.6, “Filename encoding”) and copy its entire contents to another filesystem mounted as UTF-8 with the "cp -a" command.
The text file format, specifically the end-of-line (EOL) code, depends on the platform.
Table 11.3. List of EOL styles for different platforms
platform | EOL code | control | decimal | hexadecimal |
---|---|---|---|---|
Debian (unix) | LF | ^J | 10 | 0A |
MSDOS and Windows | CR-LF | ^M^J | 13 10 | 0D 0A |
Macintosh | CR | ^M | 13 | 0D |
The EOL format conversion programs fromdos(1), todos(1), frommac(1), and tomac(1) are quite handy. recode(1) is also useful.
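To make the byte-level difference concrete, here is a rough sketch of the same conversions using only tr(1) and awk(1). The function names are hypothetical stand-ins and do not reproduce every behavior of the dedicated tools named above.

```shell
#!/bin/sh
# Hypothetical stand-ins for fromdos(1), todos(1) and frommac(1),
# written with tr(1) and awk(1) only.

# CR-LF -> LF: drop every carriage return (this also strips any stray CR).
crlf_to_lf() { tr -d '\r'; }

# LF -> CR-LF: re-emit each line with an explicit CR-LF terminator.
lf_to_crlf() { awk '{ printf "%s\r\n", $0 }'; }

# CR -> LF: old Macintosh files use CR alone as the line terminator.
cr_to_lf() { tr '\r' '\n'; }

# Dump the bytes so that the EOL codes (0d, 0a) are visible.
printf 'a\r\nb\r\n' | crlf_to_lf | od -An -tx1
printf 'a\nb\n'     | lf_to_crlf | od -An -tx1
printf 'a\rb\r'     | cr_to_lf   | od -An -tx1
```

Note that `crlf_to_lf` removes all CR bytes, not only those before LF, so it should not be used on files where a bare CR is meaningful.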
Note |
---|
Some data on the Debian system, such as the wiki pages of the package |
Note |
---|
Most editors (e.g. |
Tip |
---|
The use of " |
There are a few popular specialized programs to convert the tab codes.
Table 11.4. List of TAB conversion commands from bsdmainutils and coreutils packages
function | bsdmainutils | coreutils |
---|---|---|
expand tab to spaces | "col -x" | expand |
unexpand tab from spaces | "col -h" | unexpand |
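A short sketch of the two coreutils commands from the table follows. The sample strings and the 4-column tab stop are arbitrary choices for illustration.

```shell
#!/bin/sh
# Expand a TAB to spaces with tab stops every 4 columns.
expanded=$(printf 'a\tb\n' | expand -t 4)
echo "[$expanded]"

# Convert 8 leading spaces back to a single TAB
# (unexpand uses tab stops every 8 columns by default).
unexpanded_hex=$(printf '        x\n' | unexpand | od -An -tx1 | tr -d ' \n')
echo "$unexpanded_hex"
```

By default unexpand(1) only converts leading blanks; its "-a" option extends the conversion to blanks elsewhere on the line.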
indent(1) from the indent package completely reformats whitespaces in the C program.
Editor programs such as vim and emacs can be used for TAB conversion, too. For example with vim, you can expand TAB with ":set expandtab" and ":%retab" command sequence. You can revert this with ":set noexpandtab" and ":%retab!" command sequence.
Intelligent modern editors such as the vim program are quite smart and cope well with any encoding systems and any file formats. You should use these editors under the UTF-8 locale in the UTF-8 capable console for the best compatibility.
An old western European Unix text file, "u-file.txt", stored in the latin1 (iso-8859-1) encoding can be edited simply with vim by the following.
$ vim u-file.txt
This is possible since the auto detection mechanism of the file encoding in vim assumes the UTF-8 encoding first and, if it fails, assumes it to be latin1.
An old Polish Unix text file, "pu-file.txt", stored in the latin2 (iso-8859-2) encoding can be edited with vim by the following.
$ vim '+e ++enc=latin2 pu-file.txt'
An old Japanese Unix text file, "ju-file.txt", stored in the eucJP encoding can be edited with vim by the following.
$ vim '+e ++enc=eucJP ju-file.txt'
An old Japanese MS-Windows text file, "jw-file.txt", stored in the so called shift-JIS encoding (more precisely: CP932) can be edited with vim by the following.
$ vim '+e ++enc=CP932 ++ff=dos jw-file.txt'
When a file is opened with "++enc" and "++ff" options, ":w" in the Vim command line stores it in the original format and overwrites the original file. You can also specify the saving format and the file name in the Vim command line, e.g., ":w ++enc=utf8 new.txt".
Please refer to the mbyte.txt "multi-byte text support" in vim on-line help and Table 11.2, “List of encoding values and their usage” for locale values used with "++enc".
The emacs family of programs can perform the equivalent functions.
The following reads a web page into a text file. This is very useful when copying configurations off the Web or applying basic Unix text tools such as grep(1) on the web page.
$ w3m -dump http://www.remote-site.com/help-info.html >textfile
Similarly, you can extract plain text data from other formats using the following.
Table 11.5. List of tools to extract plain text data
package | popcon | size | keyword | function |
---|---|---|---|---|
w3m | V:275, I:835 | 2292 | html→text | HTML to text converter with the "w3m -dump" command |
html2text | V:28, I:85 | 229 | html→text | advanced HTML to text converter (ISO 8859-1) |
lynx | V:37, I:107 | 1901 | html→text | HTML to text converter with the "lynx -dump" command |
elinks | V:18, I:34 | 1587 | html→text | HTML to text converter with the "elinks -dump" command |
links | V:21, I:47 | 2135 | html→text | HTML to text converter with the "links -dump" command |
links2 | V:3, I:18 | 5403 | html→text | HTML to text converter with the "links2 -dump" command |
antiword | V:7, I:15 | 614 | MSWord→text,ps | convert MSWord files to plain text or ps |
catdoc | V:24, I:38 | 666 | MSWord→text,TeX | convert MSWord files to plain text or TeX |
pstotext | V:4, I:6 | 127 | ps/pdf→text | extract text from PostScript and PDF files |
unhtml | V:0, I:0 | 66 | html→text | remove the markup tags from an HTML file |
odt2txt | V:3, I:6 | 53 | odt→text | converter from OpenDocument Text to text |
You can highlight and format plain text data by the following.
Table 11.6. List of tools to highlight plain text data
package | popcon | size | keyword | description |
---|---|---|---|---|
vim-runtime | V:20, I:431 | 27567 | highlight | Vim MACRO to convert source code to HTML with ":source $VIMRUNTIME/syntax/html.vim" |
cxref | V:0, I:0 | 1157 | c→html | converter for the C program to latex and HTML (C language) |
src2tex | V:0, I:0 | 612 | highlight | convert many source codes to TeX (C language) |
source-highlight | V:1, I:7 | 2008 | highlight | convert many source codes to HTML, XHTML, LaTeX, Texinfo, ANSI color escape sequences and DocBook files with highlight (C++) |
highlight | V:1, I:16 | 943 | highlight | convert many source codes to HTML, XHTML, RTF, LaTeX, TeX or XSL-FO files with highlight (C++) |
grc | V:0, I:2 | 60 | text→color | generic colouriser for everything (Python) |
txt2html | V:0, I:4 | 296 | text→html | text to HTML converter (Perl) |
markdown | V:0, I:6 | 56 | text→html | markdown text document formatter to (X)HTML (Perl) |
asciidoc | V:1, I:14 | 2442 | text→any | AsciiDoc text document formatter to XML/HTML (Python) |
pandoc | V:3, I:23 | 69422 | text→any | general markup converter (Haskell) |
python-docutils | V:35, I:554 | 1653 | text→any | ReStructured Text document formatter to XML (Python) |
txt2tags | V:0, I:1 | 951 | text→any | document conversion from text to HTML, SGML, LaTeX, man page, MoinMoin, Magic Point and PageMaker (Python) |
udo | V:0, I:0 | 548 | text→any | universal document - text processing utility (C language) |
stx2any | V:0, I:0 | 264 | text→any | document converter from structured plain text to other formats (m4) |
rest2web | V:0, I:0 | 526 | text→html | document converter from ReStructured Text to html (Python) |
aft | V:0, I:0 | 235 | text→any | "free form" document preparation system (Perl) |
yodl | V:0, I:0 | 522 | text→any | pre-document language and tools to process it (C language) |
sdf | V:0, I:0 | 1445 | text→any | simple document parser (Perl) |
sisu | V:0, I:0 | 5338 | text→any | document structuring, publishing and search framework (Ruby) |
The Extensible Markup Language (XML) is a markup language for documents containing structured information.
See introductory information at XML.COM.
XML text looks somewhat like HTML. It enables us to manage multiple formats of output for a document. One easy XML system is the docbook-xsl package, which is used here.
Each XML file starts with standard XML declaration as the following.
<?xml version="1.0" encoding="UTF-8"?>
The basic syntax for one XML element is marked up as the following.
<name attribute="value">content</name>
XML element with empty content is marked up in the following short form.
<name attribute="value"/>
The "attribute="value"" in the above examples is optional.
The comment section in XML is marked up as the following.
<!-- comment -->
Other than adding markups, XML requires minor conversion to the content using predefined entities for following characters.
Table 11.7. List of predefined entities for XML
predefined entity | character to be converted into |
---|---|
&quot; | " : quote |
&apos; | ' : apostrophe |
&lt; | < : less-than |
&gt; | > : greater-than |
&amp; | & : ampersand |
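As an illustration of this conversion, the following sketch escapes plain text with sed(1) before it is placed inside an XML element. The function name `xml_escape` is hypothetical, and a real XML toolchain would normally do this for you.

```shell
#!/bin/sh
# Hypothetical helper: replace the five special characters with the
# predefined XML entities. "&" must be handled first so that the "&"
# introduced by the later substitutions is not escaped again.
xml_escape() {
  sed -e 's/&/\&amp;/g' \
      -e 's/</\&lt;/g' \
      -e 's/>/\&gt;/g' \
      -e 's/"/\&quot;/g' \
      -e "s/'/\\&apos;/g"
}

printf '%s\n' 'if a < b && c > "d"' | xml_escape
```

The order of the substitutions matters: escaping "&" last would turn an already generated "&lt;" into "&amp;lt;".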
Caution |
---|
" |
Note |
---|
When SGML style user defined entities, e.g. " |
Note |
---|
As long as the XML markup is done consistently with a certain set of tag names (either some data as content or attribute value), conversion to another XML is a trivial task using Extensible Stylesheet Language Transformations (XSLT). |
There are many tools available to process XML files such as the Extensible Stylesheet Language (XSL).
Basically, once you create a well-formed XML file, you can convert it to any format using Extensible Stylesheet Language Transformations (XSLT).
The Extensible Stylesheet Language for Formatting Objects (XSL-FO) is supposed to be the solution for formatting. The fop package is new to the Debian main archive due to its dependence on the Java programming language. So the LaTeX code is usually generated from XML using XSLT and the LaTeX system is used to create printable files such as DVI, PostScript, and PDF.
Table 11.8. List of XML tools
package | popcon | size | keyword | description |
---|---|---|---|---|
docbook-xml | I:533 | 2131 | xml | XML document type definition (DTD) for DocBook |
xsltproc | V:14, I:123 | 148 | xslt | XSLT command line processor (XML→ XML, HTML, plain text, etc.) |
docbook-xsl | V:15, I:233 | 14998 | xml/xslt | XSL stylesheets for processing DocBook XML to various output formats with XSLT |
xmlto | V:3, I:37 | 121 | xml/xslt | XML-to-any converter with XSLT |
dbtoepub | V:0, I:1 | 71 | xml/xslt | DocBook XML to .epub converter |
dblatex | V:5, I:25 | 4639 | xml/xslt | convert Docbook files to DVI, PostScript, PDF documents with XSLT |
fop | V:3, I:53 | 64 | xml/xsl-fo | convert Docbook XML files to PDF |
Since XML is a subset of Standard Generalized Markup Language (SGML), it can be processed by the extensive tools available for SGML, such as Document Style Semantics and Specification Language (DSSSL).
Table 11.9. List of DSSSL tools
package | popcon | size | keyword | description |
---|---|---|---|---|
openjade | V:3, I:34 | 921 | dsssl | ISO/IEC 10179:1996 standard DSSSL processor (latest) |
openjade1.3 | V:0, I:0 | 2199 | dsssl | ISO/IEC 10179:1996 standard DSSSL processor (1.3.x series) |
jade | V:0, I:12 | 825 | dsssl | James Clark's original DSSSL processor (1.2.x series) |
docbook-dsssl | V:2, I:39 | 2604 | xml/dsssl | DSSSL stylesheets for processing DocBook XML to various output formats with DSSSL |
docbook-utils | V:2, I:26 | 281 | xml/dsssl | utilities for DocBook files including conversion to other formats (HTML, RTF, PS, man, PDF) with docbook2* commands with DSSSL |
sgml2x | V:0, I:0 | 90 | SGML/dsssl | converter from SGML and XML using DSSSL stylesheets |
You can extract HTML or XML data from other formats using the following.
Table 11.10. List of XML data extraction tools
package | popcon | size | keyword | description |
---|---|---|---|---|
wv | V:6, I:9 | 713 | MSWord→any | document converter from Microsoft Word to HTML, LaTeX, etc. |
texi2html | V:0, I:11 | 1832 | texi→html | converter from Texinfo to HTML |
man2html | V:0, I:3 | 133 | manpage→html | converter from manpage to HTML (CGI support) |
tex4ht | V:1, I:24 | 36 | tex↔html | converter between (La)TeX and HTML |
unrtf | V:2, I:4 | 137 | rtf→html | document converter from RTF to HTML, etc |
info2www | V:3, I:4 | 156 | info→html | converter from GNU info to HTML (CGI support) |
ooo2dbk | V:0, I:1 | 217 | sxw→xml | converter from OpenOffice.org SXW documents to DocBook XML |
wp2x | V:0, I:0 | 215 | WordPerfect→any | WordPerfect 5.0 and 5.1 files to TeX, LaTeX, troff, GML and HTML |
doclifter | V:0, I:0 | 457 | troff→xml | converter from troff to DocBook XML |
For non-XML HTML files, you can convert them to XHTML which is an instance of well formed XML. XHTML can be processed by XML tools.
Table 11.11. List of XML pretty print tools
package | popcon | size | keyword | description |
---|---|---|---|---|
libxml2-utils | V:25, I:322 | 177 | xml↔html↔xhtml | command line XML tool with xmllint(1) (syntax check, reformat, lint, …) |
tidy | V:2, I:17 | 83 | xml↔html↔xhtml | HTML syntax checker and reformatter |
Once proper XML is generated, you can use XSLT technology to extract data based on the mark-up context etc.
The Unix troff program originally developed by AT&T can be used for simple typesetting. It is usually used to create manpages.
TeX created by Donald Knuth is a very powerful typesetting tool and is the de facto standard. LaTeX originally written by Leslie Lamport enables a high-level access to the power of TeX.
Traditionally, roff is the main Unix text processing system. See roff(7), groff(7), groff(1), grotty(1), troff(1), groff_mdoc(7), groff_man(7), groff_ms(7), groff_me(7), groff_mm(7), and "info groff".
You can read or print a good tutorial and reference on the "-me" macro in "/usr/share/doc/groff/" by installing the groff package.
Tip |
---|
" |
Tip |
---|
To remove "^H" and "_" from a text file generated by |
The TeX Live software distribution offers a complete TeX system. The texlive metapackage provides a decent selection of the TeX Live packages which should suffice for the most common tasks.
There are many references available for TeX and LaTeX.
tex(1)
latex(1)
texdoc(1)
texdoctk(1)
"The TeXbook", by Donald E. Knuth, (Addison-Wesley)
"LaTeX - A Document Preparation System", by Leslie Lamport, (Addison-Wesley)
"The LaTeX Companion", by Goossens, Mittelbach, Samarin, (Addison-Wesley)
This is the most powerful typesetting environment. Many SGML processors use this as their back end text processor. LyX provided by the lyx package and GNU TeXmacs provided by the texmacs package offer a nice WYSIWYG editing environment for LaTeX while many use Emacs and Vim as the choice for the source editor.
There are many online resources available.
The TeX Live Guide - TeX Live 2007 ("/usr/share/doc/texlive-doc-base/english/texlive-en/live.html") (texlive-doc-base package)
When documents become bigger, sometimes TeX may cause errors. You must increase pool size in "/etc/texmf/texmf.cnf" (or more appropriately edit "/etc/texmf/texmf.d/95NonPath" and run update-texmf(8)) to fix this.
Note |
---|
The TeX source of "The TeXbook" is available at http://tug.ctan.org/tex-archive/systems/knuth/dist/tex/texbook.tex. This file contains most of the required macros. I heard that you can process this document with tex(1) after commenting lines 7 to 10 and adding " |
You can print a manual page in PostScript nicely by one of the following commands.
$ man -Tps some_manpage | lpr
$ man -Tps some_manpage | mpage -2 | lpr
The second example prints 2 pages on one sheet.
Although writing a manual page (manpage) in the plain troff format is possible, there are a few helper packages to create it.
Table 11.13. List of packages to help creating the manpage
package | popcon | size | keyword | description |
---|---|---|---|---|
docbook-to-man | V:1, I:17 | 179 | SGML→manpage | converter from DocBook SGML into roff man macros |
help2man | V:0, I:9 | 454 | text→manpage | automatic manpage generator from --help |
info2man | V:0, I:0 | 134 | info→manpage | converter from GNU info to POD or man pages |
txt2man | V:0, I:1 | 65 | text→manpage | convert flat ASCII text to man page format |
Printable data is expressed in the PostScript format on the Debian system. Common Unix Printing System (CUPS) uses Ghostscript as its rasterizer backend program for non-PostScript printers.
The core of printable data manipulation is the Ghostscript PostScript (PS) interpreter which generates a raster image.
The latest upstream Ghostscript from Artifex was re-licensed from AFPL to GPL and merged all the latest ESP version changes, such as the CUPS related ones, at the 8.60 release as a unified release.
Table 11.14. List of Ghostscript PostScript interpreters
package | popcon | size | description |
---|---|---|---|
ghostscript | V:160, I:691 | 224 | The GPL Ghostscript PostScript/PDF interpreter |
ghostscript-x | V:32, I:77 | 210 | GPL Ghostscript PostScript/PDF interpreter - X display support |
libpoppler64 | V:19, I:53 | 3214 | PDF rendering library forked from the xpdf PDF viewer |
libpoppler-glib8 | V:239, I:526 | 435 | PDF rendering library (GLib-based shared library) |
poppler-data | V:103, I:669 | 12123 | CMaps for PDF rendering library (for CJK support: Adobe-*) |
Tip |
---|
" |
You can merge two PostScript (PS) or Portable Document Format (PDF) files using gs(1) of Ghostscript.
$ gs -q -dNOPAUSE -dBATCH -sDEVICE=pswrite -sOutputFile=bla.ps -f foo1.ps foo2.ps
$ gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=bla.pdf -f foo1.pdf foo2.pdf
Note |
---|
The PDF, which is a widely used cross-platform printable data format, is essentially the compressed PS format with a few additional features and extensions. |
Tip |
---|
For command line, psmerge(1) and other commands from the |
The following packages for the printable data utilities caught my eye.
Table 11.15. List of printable data utilities
package | popcon | size | keyword | description |
---|---|---|---|---|
poppler-utils | V:52, I:492 | 526 | pdf→ps,text,… | PDF utilities: pdftops, pdfinfo, pdfimages, pdftotext, pdffonts |
psutils | V:12, I:221 | 219 | ps→ps | PostScript document conversion tools |
poster | V:0, I:8 | 49 | ps→ps | create large posters out of PostScript pages |
enscript | V:3, I:28 | 2111 | text→ps, html, rtf | convert ASCII text to PostScript, HTML, RTF or Pretty-Print |
a2ps | V:2, I:31 | 3624 | text→ps | 'Anything to PostScript' converter and pretty-printer |
pdftk | V:9, I:56 | 2959 | pdf→pdf | PDF document conversion tool: pdftk |
mpage | V:0, I:5 | 141 | text,ps→ps | print multiple pages per sheet |
html2ps | V:0, I:6 | 320 | html→ps | converter from HTML to PostScript |
gnuhtml2latex | V:0, I:1 | 53 | html→latex | converter from html to latex |
latex2rtf | V:0, I:7 | 438 | latex→rtf | convert documents from LaTeX to RTF which can be read by MS Word |
ps2eps | V:8, I:114 | 94 | ps→eps | converter from PostScript to EPS (Encapsulated PostScript) |
e2ps | V:0, I:0 | 112 | text→ps | Text to PostScript converter with Japanese encoding support |
impose+ | V:0, I:1 | 180 | ps→ps | PostScript utilities |
trueprint | V:0, I:0 | 138 | text→ps | pretty print many source codes (C, C++, Java, Pascal, Perl, Pike, Sh, and Verilog) to PostScript. (C language) |
pdf2svg | V:0, I:5 | 50 | ps→svg | converter from PDF to Scalable vector graphics format |
pdftoipe | V:0, I:0 | 63 | ps→ipe | converter from PDF to IPE's XML format |
Both the lp(1) and lpr(1) commands offered by the Common Unix Printing System (CUPS) provide options for customized printing of the printable data.
You can print 3 copies of a file collated using one of the following commands.
$ lp -n 3 -o Collate=True filename
$ lpr -#3 -o Collate=True filename
You can further customize printer operation by using printer options such as "-o number-up=2", "-o page-set=even", "-o page-set=odd", "-o scaling=200", "-o natural-scaling=200", etc., documented at Command-Line Printing and Options.
The following packages for mail data conversion caught my eye.
Table 11.16. List of packages to help mail data conversion
package | popcon | size | keyword | description |
---|---|---|---|---|
sharutils | V:9, I:123 | 1352 | | shar(1), unshar(1), uuencode(1), uudecode(1) |
mpack | V:2, I:26 | 91 | MIME | encoding and decoding of MIME messages: mpack(1) and munpack(1) |
tnef | V:7, I:11 | 98 | ms-tnef | unpacking MIME attachments of type "application/ms-tnef" which is a Microsoft only format |
uudeview | V:0, I:6 | 97 | | encoder and decoder for the following formats: uuencode, xxencode, BASE64, quoted printable, and BinHex |
readpst | I:1 | 21 | PST | convert Microsoft Outlook PST files to mbox format |
Tip |
---|
The Internet Message Access Protocol version 4 (IMAP4) server (see Section 6.7, “POP3/IMAP4 server”) may be used to move mails out from proprietary mail systems if the mail client software can be configured to use an IMAP4 server too. |
Mail (SMTP) data should be limited to a series of 7 bit data. So binary data and 8 bit text data are encoded into 7 bit format with the Multipurpose Internet Mail Extensions (MIME) and the selection of the charset (see Section 8.3.1, “Basics of encoding”).
The standard mail storage format is mbox formatted according to RFC2822 (updated RFC822). See mbox(5) (provided by the mutt package).
For European languages, "Content-Transfer-Encoding: quoted-printable" with the ISO-8859-1 charset is usually used for mail since there are not many 8 bit characters. If European text is encoded in UTF-8, "Content-Transfer-Encoding: quoted-printable" is likely to be used since it is mostly 7 bit data.
For Japanese, "Content-Type: text/plain; charset=ISO-2022-JP" has traditionally been used for mail to keep text in 7 bits. But older Microsoft systems may send mail data in Shift-JIS without proper declaration. If Japanese text is encoded in UTF-8, Base64 is likely to be used since it contains much 8 bit data. The situation of other Asian languages is similar.
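To make the 7 bit constraint concrete, here is a small sketch that encodes UTF-8 text with base64(1) from coreutils and counts the 8 bit bytes before and after. The choice of tool, the sample word and the helper name `count_8bit` are assumptions for illustration; a mail client performs this encoding internally.

```shell
#!/bin/sh
# "café" in UTF-8: the last character is the two 8 bit bytes 0xC3 0xA9.
encoded=$(printf 'caf\303\251' | base64)
echo "$encoded"

# Hypothetical helper: count bytes outside the 7 bit range.
count_8bit() { LC_ALL=C tr -d '\000-\177' | wc -c | tr -d ' '; }
raw_8bit=$(printf 'caf\303\251' | count_8bit)
enc_8bit=$(printf '%s' "$encoded" | count_8bit)
echo "8 bit bytes: raw=$raw_8bit encoded=$enc_8bit"

# Decoding restores the original bytes.
decoded_hex=$(printf '%s\n' "$encoded" | base64 -d | od -An -tx1 | tr -d ' \n')
echo "$decoded_hex"
```

Base64 turns every 3 input bytes into 4 ASCII characters, so it costs about a third more space regardless of content, whereas quoted-printable only expands the 8 bit bytes. That is why the latter suits mostly-ASCII European text.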
Note |
---|
If your non-Unix mail data is accessible by non-Debian client software which can talk to the IMAP4 server, you may be able to move it out by running your own IMAP4 server (see Section 6.7, “POP3/IMAP4 server”). |
Note |
---|
If you use other mail storage formats, moving them to mbox format is a good first step. A versatile client program such as mutt(1) may be handy for this. |
You can split mailbox contents to each message using procmail(1) and formail(1).
Each mail message can be unpacked using munpack(1) from the mpack package (or other specialized tools) to obtain the MIME encoded contents.
The following packages for graphic data conversion, editing, and organization caught my eye.
Table 11.17. List of graphic data tools
package | popcon | size | keyword | description |
---|---|---|---|---|
gimp | V:97, I:509 | 16255 | image(bitmap) | GNU Image Manipulation Program |
imagemagick | V:154, I:544 | 191 | image(bitmap) | image manipulation programs |
graphicsmagick | V:7, I:14 | 4820 | image(bitmap) | image manipulation programs (fork of imagemagick) |
xsane | V:24, I:193 | 913 | image(bitmap) | GTK+-based X11 frontend for SANE (Scanner Access Now Easy) |
netpbm | V:32, I:547 | 4230 | image(bitmap) | graphics conversion tools |
icoutils | V:8, I:72 | 192 | png↔ico(bitmap) | convert MS Windows icons and cursors to and from PNG formats (favicon.ico) |
scribus | V:14, I:28 | 19136 | ps/pdf/SVG/… | Scribus DTP editor |
libreoffice-draw | V:344, I:479 | 8995 | image(vector) | LibreOffice office suite - drawing |
inkscape | V:145, I:360 | 102751 | image(vector) | SVG (Scalable Vector Graphics) editor |
dia-gnome | V:6, I:11 | 20 | image(vector) | diagram editor (GNOME) |
dia | V:25, I:41 | 3880 | image(vector) | diagram editor (Gtk) |
xfig | V:13, I:19 | 1783 | image(vector) | Facility for Interactive Generation of figures under X11 |
pstoedit | V:15, I:358 | 667 | ps/pdf→image(vector) | PostScript and PDF files to editable vector graphics converter (SVG) |
libwmf-bin | V:14, I:365 | 104 | Windows/image(vector) | Windows metafile (vector graphic data) conversion tools |
fig2sxd | V:0, I:0 | 142 | fig→sxd(vector) | convert XFig files to OpenOffice.org Draw format |
unpaper | V:2, I:15 | 447 | image→image | post-processing tool for scanned pages for OCR |
tesseract-ocr | V:4, I:27 | 558 | image→text | free OCR software based on the HP's commercial OCR engine |
tesseract-ocr-eng | I:28 | 37486 | image→text | OCR engine data: tesseract-ocr language files for English text |
gocr | V:2, I:25 | 494 | image→text | free OCR software |
ocrad | V:1, I:7 | 310 | image→text | free OCR software |
eog | V:101, I:337 | 10581 | image(Exif) | Eye of GNOME graphics viewer program |
gthumb | V:15, I:27 | 3238 | image(Exif) | image viewer and browser (GNOME) |
geeqie | V:17, I:25 | 1535 | image(Exif) | image viewer using GTK+ |
shotwell | V:17, I:140 | 5754 | image(Exif) | digital photo organizer (GNOME) |
gtkam | V:0, I:7 | 965 | image(Exif) | application for retrieving media from digital cameras (GTK+) |
gphoto2 | V:1, I:14 | 969 | image(Exif) | The gphoto2 digital camera command-line client |
gwenview | V:33, I:104 | 4508 | image(Exif) | image viewer (KDE) |
kamera | V:4, I:103 | 230 | image(Exif) | digital camera support for KDE applications |
digikam | V:3, I:17 | 1760 | image(Exif) | digital photo management application for KDE |
exiv2 | V:5, I:77 | 242 | image(Exif) | EXIF/IPTC metadata manipulation tool |
exiftran | V:2, I:26 | 67 | image(Exif) | transform digital camera jpeg images |
jhead | V:1, I:13 | 105 | image(Exif) | manipulate the non-image part of Exif compliant JPEG (digital camera photo) files |
exif | V:1, I:10 | 370 | image(Exif) | command-line utility to show EXIF information in JPEG files |
exiftags | V:0, I:3 | 205 | image(Exif) | utility to read Exif tags from a digital camera JPEG file |
exifprobe | V:0, I:3 | 482 | image(Exif) | read metadata from digital pictures |
dcraw | V:3, I:25 | 358 | image(Raw)→ppm | decode raw digital camera images |
findimagedupes | V:0, I:1 | 79 | image→fingerprint | find visually similar or duplicate images |
ale | V:0, I:0 | 766 | image→image | merge images to increase fidelity or create mosaics |
imageindex | V:0, I:0 | 144 | image(Exif)→html | generate static HTML galleries from images |
outguess | V:0, I:0 | 217 | jpeg,png | universal Steganographic tool |
librecad | V:12, I:18 | 7762 | DXF | CAD data editor (KDE) |
blender | V:4, I:29 | 101399 | blend, TIFF, VRML, … | 3D content editor for animation etc |
mm3d | V:0, I:0 | 4668 | ms3d, obj, dxf, … | OpenGL based 3D model editor |
open-font-design-toolkit | I:0 | 28 | ttf, ps, … | metapackage for open font design |
fontforge | V:1, I:10 | 91 | ttf, ps, … | font editor for PS, TrueType and OpenType fonts |
xgridfit | V:0, I:0 | 898 | ttf | program for gridfitting and hinting TrueType fonts |
Tip |
---|
Search for more image tools using regex " |
Although GUI programs such as gimp(1) are very powerful, command line tools such as imagemagick(1) are quite useful for automating image manipulation via scripts.
The de facto image file format of the digital camera is the Exchangeable Image File Format (EXIF) which is the JPEG image file format with additional metadata tags. It can hold information such as date, time, and camera settings.
The Lempel-Ziv-Welch (LZW) lossless data compression patent has expired. Graphics Interchange Format (GIF) utilities which use the LZW compression method are now freely available on the Debian system.
Tip |
---|
Any digital camera or scanner with removable recording media works with Linux through USB storage readers since it follows the Design rule for Camera Filesystem and uses the FAT filesystem. See Section 10.1.7, “Removable storage device”. |
There are many other programs for converting data. The following packages caught my eye using regex "~Guse::converting" in aptitude(8) (see Section 2.2.6, “Search method options with aptitude”).
Table 11.18. List of miscellaneous data conversion tools
package | popcon | size | keyword | description |
---|---|---|---|---|
alien | V:5, I:67 | 166 | rpm/tgz→deb | converter for the foreign package into the Debian package |
freepwing | V:0, I:0 | 568 | EB→EPWING | converter from "Electric Book" (popular in Japan) to a single JIS X 4081 format (a subset of the EPWING V1) |
calibre | V:7, I:33 | 49261 | any→EPUB | e-book converter and library management |
You can also extract data from RPM format with the following.
$ rpm2cpio file.src.rpm | cpio --extract