Class ExtractReuters


  • public class ExtractReuters
    extends Object
    Splits the Reuters SGML documents into simple text files containing: Title, Date, Dateline, Body.
    • Constructor Detail

      • ExtractReuters

        public ExtractReuters(File reutersDir,
                              File outputDir)
    • Method Detail

      • extract

        public void extract()
      • extractFile

        protected void extractFile(File sgmFile)
        Override this method to change what is extracted.
        Parameters:
        sgmFile - the Reuters SGML file to extract
      • main

        public static void main(String[] args)
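
The split between extract() and the overridable extractFile(File) can be illustrated with a minimal sketch. This is not the library's implementation: ReutersExtractorSketch, its field helper, the output file naming, and the choice of character encodings are all hypothetical, and the tag names (REUTERS, TITLE, DATE, DATELINE, BODY) are assumed from the four fields listed in the class description.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in, NOT the real ExtractReuters class.
class ReutersExtractorSketch {
    // Assumption: each article is wrapped in <REUTERS ...> ... </REUTERS>.
    private static final Pattern DOC =
        Pattern.compile("<REUTERS.*?>(.*?)</REUTERS>", Pattern.DOTALL);

    private final File reutersDir;
    private final File outputDir;

    ReutersExtractorSketch(File reutersDir, File outputDir) {
        this.reutersDir = reutersDir;
        this.outputDir = outputDir;
    }

    // Visits every *.sgm file in the input directory and delegates to the hook.
    public void extract() throws IOException {
        Files.createDirectories(outputDir.toPath());
        File[] sgmFiles = reutersDir.listFiles((dir, name) -> name.endsWith(".sgm"));
        if (sgmFiles == null) {
            throw new IOException("Cannot list " + reutersDir);
        }
        for (File sgmFile : sgmFiles) {
            extractFile(sgmFile);
        }
    }

    // Hook: a subclass may override this to change what is extracted.
    protected void extractFile(File sgmFile) throws IOException {
        String content = new String(
            Files.readAllBytes(sgmFile.toPath()), StandardCharsets.ISO_8859_1);
        Matcher docs = DOC.matcher(content);
        int docNumber = 0;
        while (docs.find()) {
            String doc = docs.group(1);
            // One field per line, in the order Title, Date, Dateline, Body.
            String text = String.join("\n",
                field(doc, "TITLE"), field(doc, "DATE"),
                field(doc, "DATELINE"), field(doc, "BODY"));
            // Assumed naming scheme: <input file name>-<index>.txt
            File out = new File(outputDir, sgmFile.getName() + "-" + docNumber++ + ".txt");
            Files.write(out.toPath(), text.getBytes(StandardCharsets.UTF_8));
        }
    }

    // Returns the trimmed text between <tag> and </tag>, or "" if the tag is absent.
    static String field(String doc, String tag) {
        Matcher m = Pattern.compile("<" + tag + ">(.*?)</" + tag + ">", Pattern.DOTALL)
            .matcher(doc);
        return m.find() ? m.group(1).trim() : "";
    }
}
```

In this sketch, a subclass that wanted only titles would override extractFile(File) and leave extract() untouched, which mirrors the intent of the protected method documented above.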