Package org.biojava.nbio.survival.data
Class WorkSheet
- java.lang.Object
-
- org.biojava.nbio.survival.data.WorkSheet
-
public class WorkSheet extends java.lang.Object
Need to handle very large spreadsheets of expression data, so keep the memory footprint low.
- Author:
- Scooter Willis
-
-
Constructor Summary
Constructor Description
WorkSheet()
WorkSheet(java.lang.String[][] values)
WorkSheet(java.util.Collection<java.lang.String> rows, java.util.Collection<java.lang.String> columns)
WorkSheet(CompactCharSequence[][] values)
-
Method Summary
Modifier and Type Method Description
void
addCell(java.lang.String row, java.lang.String col, java.lang.String value)
Add data to a cell
void
addColumn(java.lang.String column, java.lang.String defaultValue)
void
addColumns(java.util.ArrayList<java.lang.String> columns, java.lang.String defaultValue)
Add columns to worksheet and set default value
void
addRow(java.lang.String row, java.lang.String defaultValue)
void
addRows(java.util.ArrayList<java.lang.String> rows, java.lang.String defaultValue)
Add rows to the worksheet and fill in default value
void
appendWorkSheetColumns(WorkSheet worksheet)
Add columns from a second worksheet to be joined by common row.
void
appendWorkSheetRows(WorkSheet worksheet)
Add rows from a second worksheet to be joined by common column.
void
applyColumnFilter(java.lang.String column, ChangeValue changeValue)
Apply a filter to a column to change values, for example from numeric to nominal based on some range
void
changeColumnHeader(java.lang.String col, java.lang.String newCol)
void
changeColumnHeader(ChangeValue changeValue)
void
changeColumnsHeaders(java.util.LinkedHashMap<java.lang.String,java.lang.String> newColumnValues)
Rename each column named by a map key to that key's value
void
changeRowHeader(java.lang.String row, java.lang.String newRow)
void
changeRowHeader(ChangeValue changeValue)
void
clear()
See if we can free up memory
java.util.ArrayList<java.lang.String>
getAllColumns()
Get the list of column names including those that may be hidden
java.util.ArrayList<java.lang.String>
getAllRows()
Get all rows including those that may be hidden
java.lang.String
getCell(java.lang.String row, java.lang.String col)
Get cell value
java.lang.Double
getCellDouble(java.lang.String row, java.lang.String col)
java.lang.Integer
getColumnIndex(java.lang.String column)
java.util.LinkedHashMap<java.lang.String,HeaderInfo>
getColumnLookup()
java.util.ArrayList<java.lang.String>
getColumns()
Get the list of column names.
static WorkSheet
getCopyWorkSheet(WorkSheet copyWorkSheet)
Create a copy of a worksheet.
static WorkSheet
getCopyWorkSheetSelectedRows(WorkSheet copyWorkSheet, java.util.ArrayList<java.lang.String> rows)
Create a copy of a worksheet.
java.util.ArrayList<java.lang.String>
getDataColumns()
java.util.ArrayList<java.lang.String>
getDataRows()
Get the list of row names
java.util.ArrayList<java.lang.String>
getDiscreteColumnValues(java.lang.String column)
Get back a list of unique values in the column
java.util.ArrayList<java.lang.String>
getDiscreteRowValues(java.lang.String row)
Get back a list of unique values in the row
java.lang.String
getIndexColumnName()
WorkSheet
getLogScale(double base)
Get the log scale of this worksheet, where a zero value will be set to 0.1 because log(0) is undefined
WorkSheet
getLogScale(double base, double zeroValue)
Get the log scale of this worksheet
java.util.ArrayList<java.lang.String>
getMetaDataColumns()
java.util.LinkedHashMap<java.lang.String,java.lang.String>
getMetaDataColumnsHashMap()
java.util.ArrayList<java.lang.String>
getMetaDataRows()
java.util.LinkedHashMap<java.lang.String,java.lang.String>
getMetaDataRowsHashMap()
java.util.ArrayList<java.lang.String>
getRandomDataColumns(int number)
java.util.ArrayList<java.lang.String>
getRandomDataColumns(int number, java.util.ArrayList<java.lang.String> columns)
java.lang.String
getRowHeader()
java.lang.Integer
getRowIndex(java.lang.String row)
java.util.LinkedHashMap<java.lang.String,HeaderInfo>
getRowLookup()
java.util.ArrayList<java.lang.String>
getRows()
Get the list of row names.
void
hideColumn(java.lang.String column, boolean hide)
void
hideEmptyColumns()
void
hideEmptyRows()
void
hideMetaDataColumns(boolean value)
void
hideMetaDataRows(boolean value)
void
hideRow(java.lang.String row, boolean hide)
boolean
isMetaDataColumn(java.lang.String column)
boolean
isMetaDataRow(java.lang.String row)
boolean
isValidColumn(java.lang.String col)
boolean
isValidRow(java.lang.String row)
void
markMetaDataColumn(java.lang.String column)
void
markMetaDataColumns(java.util.ArrayList<java.lang.String> metaDataColumns)
Marks columns as containing meta data
void
markMetaDataRow(java.lang.String row)
void
randomlyDivideSave(double percentage, java.lang.String fileName1, java.lang.String fileName2)
Split a worksheet randomly.
static WorkSheet
readCSV(java.io.File f, char delimiter)
static WorkSheet
readCSV(java.io.InputStream is, char delimiter)
Read a CSV/tab-delimited file where you pass in the delimiter
static WorkSheet
readCSV(java.lang.String fileName, char delimiter)
Read a CSV/tab-delimited file where you pass in the delimiter
void
replaceColumnValues(java.lang.String column, java.util.HashMap<java.lang.String,java.lang.String> values)
Change values in a column, e.g. recode 0 as one label and 1 as another
void
save(java.io.OutputStream outputStream, char delimitter, boolean quoteit)
void
saveCSV(java.lang.String fileName)
Save the worksheet as a CSV file
void
saveTXT(java.lang.String fileName)
void
setCacheDoubleValues(boolean value)
void
setIndexColumnName(java.lang.String indexColumnName)
void
setMetaDataColumns(java.util.ArrayList<java.lang.String> metaDataColumns)
Clears existing meta data columns and sets new ones
void
setMetaDataColumnsAfterColumn()
void
setMetaDataColumnsAfterColumn(java.lang.String column)
void
setMetaDataRows(java.util.ArrayList<java.lang.String> metaDataRows)
void
setMetaDataRowsAfterRow()
void
setMetaDataRowsAfterRow(java.lang.String row)
void
setRowHeader(java.lang.String value)
void
shuffleColumnsAndThenRows(java.util.ArrayList<java.lang.String> columns, java.util.ArrayList<java.lang.String> rows)
Randomly shuffle the columns and rows.
void
shuffleColumnValues(java.util.ArrayList<java.lang.String> columns)
Shuffle column values to allow for randomized testing.
void
shuffleRowValues(java.util.ArrayList<java.lang.String> rows)
Shuffle row values to allow for randomized testing.
WorkSheet
swapRowAndColumns()
Swap the rows and columns, returning a new worksheet
java.lang.String
toString()
static WorkSheet
unionWorkSheetsRowJoin(java.lang.String w1FileName, java.lang.String w2FileName, char delimitter, boolean secondSheetMetaData)
Combine two worksheets, joining on rows.
static WorkSheet
unionWorkSheetsRowJoin(WorkSheet w1, WorkSheet w2, boolean secondSheetMetaData)
Combine two worksheets, joining on rows.
-
-
-
Constructor Detail
-
WorkSheet
public WorkSheet()
-
WorkSheet
public WorkSheet(java.util.Collection<java.lang.String> rows, java.util.Collection<java.lang.String> columns) throws java.lang.Exception
- Parameters:
rows
-columns
-- Throws:
java.lang.Exception
-
WorkSheet
public WorkSheet(java.lang.String[][] values)
- Parameters:
values
-
-
WorkSheet
public WorkSheet(CompactCharSequence[][] values)
- Parameters:
values
-
-
-
Method Detail
-
clear
public void clear()
See if we can free up memory
-
toString
public java.lang.String toString()
- Overrides:
toString in class java.lang.Object
-
randomlyDivideSave
public void randomlyDivideSave(double percentage, java.lang.String fileName1, java.lang.String fileName2) throws java.lang.Exception
Split a worksheet randomly. Used for creating a discovery/validation data set. The first file receives the given percentage of the worksheet and the second file the remainder.
- Parameters:
percentage
-fileName1
-fileName2
-- Throws:
java.lang.Exception
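As a rough illustration of the split described above, the following sketch shuffles a list of row names and cuts it into a discovery part and a validation part. It is not the library's code: the class `RandomSplitSketch`, the `seed` argument, and the reading of `percentage` as a fraction between 0 and 1 are assumptions made for this example, and it returns two lists instead of writing two files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of WorkSheet.
final class RandomSplitSketch {

    /**
     * Shuffles a copy of the row names and cuts it in two.
     * ASSUMPTION: fraction lies in [0, 1]; the scale used by the real method is not documented here.
     */
    static List<List<String>> split(List<String> rows, double fraction, long seed) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be between 0 and 1");
        }
        List<String> shuffled = new ArrayList<>(rows);
        Collections.shuffle(shuffled, new Random(seed)); // seeded so the split is repeatable
        int cut = (int) Math.round(shuffled.size() * fraction);
        List<List<String>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, cut)));               // first part: the given fraction
        parts.add(new ArrayList<>(shuffled.subList(cut, shuffled.size()))); // second part: the remainder
        return parts;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8", "s9", "s10");
        List<List<String>> parts = split(rows, 0.7, 42L);
        System.out.println("discovery:  " + parts.get(0));
        System.out.println("validation: " + parts.get(1));
    }
}
```

Every row lands in exactly one of the two parts, which is the property a discovery/validation split needs.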
-
getCopyWorkSheetSelectedRows
public static WorkSheet getCopyWorkSheetSelectedRows(WorkSheet copyWorkSheet, java.util.ArrayList<java.lang.String> rows) throws java.lang.Exception
Create a copy of a worksheet. Useful when shuffling columns or rows for testing, as a way to duplicate the original worksheet.
- Parameters:
copyWorkSheet
-rows
-- Returns:
- Throws:
java.lang.Exception
-
getCopyWorkSheet
public static WorkSheet getCopyWorkSheet(WorkSheet copyWorkSheet) throws java.lang.Exception
Create a copy of a worksheet. Useful when shuffling columns or rows for testing, as a way to duplicate the original worksheet.
- Parameters:
copyWorkSheet
-- Returns:
- Throws:
java.lang.Exception
-
getMetaDataColumns
public java.util.ArrayList<java.lang.String> getMetaDataColumns()
- Returns:
-
getMetaDataRows
public java.util.ArrayList<java.lang.String> getMetaDataRows()
- Returns:
-
getDataColumns
public java.util.ArrayList<java.lang.String> getDataColumns()
- Returns:
-
shuffleColumnsAndThenRows
public void shuffleColumnsAndThenRows(java.util.ArrayList<java.lang.String> columns, java.util.ArrayList<java.lang.String> rows) throws java.lang.Exception
Randomly shuffle the columns and rows. The shuffled cells should be constrained to the same data type; otherwise the result probably doesn't make sense.
- Parameters:
columns
-rows
-- Throws:
java.lang.Exception
-
shuffleColumnValues
public void shuffleColumnValues(java.util.ArrayList<java.lang.String> columns) throws java.lang.Exception
Shuffle column values to allow for randomized testing. The columns in the list will be shuffled together.
- Parameters:
columns
-- Throws:
java.lang.Exception
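The idea behind shuffling for randomized testing can be sketched as follows: the row names stay fixed while the values of one column are permuted across them, which breaks any real association between that column and the rest of the sheet. This is an illustration only. `ShuffleColumnSketch` and its `seed` argument are hypothetical, and the sketch permutes a single column; how the real method treats several columns "together" is not specified on this page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical helper, not part of WorkSheet.
final class ShuffleColumnSketch {

    /** Takes one column as rowName -> value and returns a copy with the values permuted. */
    static Map<String, String> shuffleValues(Map<String, String> column, long seed) {
        List<String> values = new ArrayList<>(column.values());
        Collections.shuffle(values, new Random(seed));
        Map<String, String> out = new LinkedHashMap<>();
        int i = 0;
        for (String row : column.keySet()) {
            out.put(row, values.get(i++)); // same row order, permuted values
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> status = new LinkedHashMap<>();
        status.put("p1", "0");
        status.put("p2", "1");
        status.put("p3", "1");
        System.out.println(shuffleValues(status, 7L));
    }
}
```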
-
shuffleRowValues
public void shuffleRowValues(java.util.ArrayList<java.lang.String> rows) throws java.lang.Exception
Shuffle row values to allow for randomized testing. The rows in the list will be shuffled together.
- Parameters:
rows
-- Throws:
java.lang.Exception
-
hideMetaDataColumns
public void hideMetaDataColumns(boolean value)
- Parameters:
value
-
-
hideMetaDataRows
public void hideMetaDataRows(boolean value)
- Parameters:
value
-
-
setMetaDataRowsAfterRow
public void setMetaDataRowsAfterRow()
-
setMetaDataColumnsAfterColumn
public void setMetaDataColumnsAfterColumn()
-
setMetaDataRowsAfterRow
public void setMetaDataRowsAfterRow(java.lang.String row)
- Parameters:
row
-
-
setMetaDataColumnsAfterColumn
public void setMetaDataColumnsAfterColumn(java.lang.String column)
- Parameters:
column
-
-
setMetaDataColumns
public void setMetaDataColumns(java.util.ArrayList<java.lang.String> metaDataColumns)
Clears existing meta data columns and sets new ones
- Parameters:
metaDataColumns
-
-
markMetaDataColumns
public void markMetaDataColumns(java.util.ArrayList<java.lang.String> metaDataColumns)
Marks columns as containing meta data
- Parameters:
metaDataColumns
-
-
markMetaDataColumn
public void markMetaDataColumn(java.lang.String column)
- Parameters:
column
-
-
isMetaDataColumn
public boolean isMetaDataColumn(java.lang.String column)
- Parameters:
column
-- Returns:
-
isMetaDataRow
public boolean isMetaDataRow(java.lang.String row)
- Parameters:
row
-- Returns:
-
markMetaDataRow
public void markMetaDataRow(java.lang.String row)
- Parameters:
row
-
-
setMetaDataRows
public void setMetaDataRows(java.util.ArrayList<java.lang.String> metaDataRows)
- Parameters:
metaDataRows
-
-
hideEmptyRows
public void hideEmptyRows() throws java.lang.Exception
- Throws:
java.lang.Exception
-
hideEmptyColumns
public void hideEmptyColumns() throws java.lang.Exception
- Throws:
java.lang.Exception
-
hideRow
public void hideRow(java.lang.String row, boolean hide)
- Parameters:
row
-hide
-
-
hideColumn
public void hideColumn(java.lang.String column, boolean hide)
- Parameters:
column
-hide
-
-
replaceColumnValues
public void replaceColumnValues(java.lang.String column, java.util.HashMap<java.lang.String,java.lang.String> values) throws java.lang.Exception
Change values in a column, e.g. recode 0 as one label and 1 as another.
- Parameters:
column
-values
-- Throws:
java.lang.Exception
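A minimal sketch of this kind of recoding, using a plain list in place of a worksheet column. The class `ReplaceValuesSketch` and the labels `ALIVE`/`DEAD` are invented for illustration, and leaving unmapped values unchanged is an assumption; the page does not say what the real method does with them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of WorkSheet.
final class ReplaceValuesSketch {

    /** Returns a recoded copy of the column. ASSUMPTION: values missing from the map are kept as-is. */
    static List<String> replace(List<String> column, Map<String, String> values) {
        List<String> out = new ArrayList<>(column.size());
        for (String cell : column) {
            out.add(values.getOrDefault(cell, cell));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> codes = new HashMap<>();
        codes.put("0", "ALIVE"); // illustrative labels only
        codes.put("1", "DEAD");
        System.out.println(replace(Arrays.asList("0", "1", "1", "NA"), codes));
    }
}
```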
-
applyColumnFilter
public void applyColumnFilter(java.lang.String column, ChangeValue changeValue) throws java.lang.Exception
Apply a filter to a column to change values, for example from numeric to nominal based on some range.
- Parameters:
column
-changeValue
-- Throws:
java.lang.Exception
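The numeric-to-nominal conversion mentioned above might look like the sketch below, which bins each numeric cell as `LOW` or `HIGH` around a cutoff. The shape of `ChangeValue` is not shown on this page, so the sketch defines its own one-method interface, `CellChange`; that interface, the class `ColumnFilterSketch`, the labels, and the pass-through of non-numeric cells are all assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of WorkSheet.
final class ColumnFilterSketch {

    /** Hypothetical stand-in for a value-changing callback; not the library's ChangeValue. */
    interface CellChange {
        String change(String value);
    }

    /** Builds a filter that bins numeric strings around a cutoff. */
    static CellChange threshold(double cutoff) {
        return value -> {
            try {
                return Double.parseDouble(value) < cutoff ? "LOW" : "HIGH";
            } catch (NumberFormatException e) {
                return value; // ASSUMPTION: cells that are not numbers are left untouched
            }
        };
    }

    /** Applies the filter to every cell of a column and returns the new column. */
    static List<String> apply(List<String> column, CellChange change) {
        List<String> out = new ArrayList<>(column.size());
        for (String cell : column) {
            out.add(change.change(cell));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(apply(Arrays.asList("1.5", "7.2", "NA"), threshold(5.0)));
    }
}
```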
-
addColumn
public void addColumn(java.lang.String column, java.lang.String defaultValue)
- Parameters:
column
-defaultValue
-
-
addColumns
public void addColumns(java.util.ArrayList<java.lang.String> columns, java.lang.String defaultValue)
Add columns to worksheet and set default value
- Parameters:
columns
-defaultValue
-
-
addRow
public void addRow(java.lang.String row, java.lang.String defaultValue)
- Parameters:
row
-defaultValue
-
-
addRows
public void addRows(java.util.ArrayList<java.lang.String> rows, java.lang.String defaultValue)
Add rows to the worksheet and fill in default value
- Parameters:
rows
-defaultValue
-
-
addCell
public void addCell(java.lang.String row, java.lang.String col, java.lang.String value) throws java.lang.Exception
Add data to a cell
- Parameters:
row
-col
-value
-- Throws:
java.lang.Exception
-
isValidRow
public boolean isValidRow(java.lang.String row)
- Parameters:
row
-- Returns:
-
isValidColumn
public boolean isValidColumn(java.lang.String col)
- Parameters:
col
-- Returns:
-
setCacheDoubleValues
public void setCacheDoubleValues(boolean value)
- Parameters:
value
-
-
getCellDouble
public java.lang.Double getCellDouble(java.lang.String row, java.lang.String col) throws java.lang.Exception
- Parameters:
row
-col
-- Returns:
- Throws:
java.lang.Exception
-
getCell
public java.lang.String getCell(java.lang.String row, java.lang.String col) throws java.lang.Exception
Get cell value
- Parameters:
row
-col
-- Returns:
- Throws:
java.lang.Exception
-
changeRowHeader
public void changeRowHeader(ChangeValue changeValue)
- Parameters:
changeValue
-
-
changeColumnHeader
public void changeColumnHeader(ChangeValue changeValue)
- Parameters:
changeValue
-
-
changeRowHeader
public void changeRowHeader(java.lang.String row, java.lang.String newRow) throws java.lang.Exception
- Parameters:
row
-newRow
-- Throws:
java.lang.Exception
-
changeColumnsHeaders
public void changeColumnsHeaders(java.util.LinkedHashMap<java.lang.String,java.lang.String> newColumnValues) throws java.lang.Exception
Rename columns: each column named by a key in the map is renamed to that key's value.
- Parameters:
newColumnValues
-- Throws:
java.lang.Exception
-
changeColumnHeader
public void changeColumnHeader(java.lang.String col, java.lang.String newCol) throws java.lang.Exception
- Parameters:
col
-newCol
-- Throws:
java.lang.Exception
-
getColumnIndex
public java.lang.Integer getColumnIndex(java.lang.String column) throws java.lang.Exception
- Parameters:
column
-- Returns:
- Throws:
java.lang.Exception
-
getRowIndex
public java.lang.Integer getRowIndex(java.lang.String row) throws java.lang.Exception
- Parameters:
row
-- Returns:
- Throws:
java.lang.Exception
-
getRandomDataColumns
public java.util.ArrayList<java.lang.String> getRandomDataColumns(int number)
- Parameters:
number
-- Returns:
-
getRandomDataColumns
public java.util.ArrayList<java.lang.String> getRandomDataColumns(int number, java.util.ArrayList<java.lang.String> columns)
- Parameters:
number
-columns
-- Returns:
-
getAllColumns
public java.util.ArrayList<java.lang.String> getAllColumns()
Get the list of column names including those that may be hidden
- Returns:
-
getColumns
public java.util.ArrayList<java.lang.String> getColumns()
Get the list of column names. Does not include hidden columns.
- Returns:
-
getDiscreteColumnValues
public java.util.ArrayList<java.lang.String> getDiscreteColumnValues(java.lang.String column) throws java.lang.Exception
Get back a list of unique values in the column
- Parameters:
column
-- Returns:
- Throws:
java.lang.Exception
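Collecting the unique values of a column can be sketched with a `LinkedHashSet`, which drops duplicates while keeping first-seen order. `DiscreteValuesSketch` is a hypothetical helper, and the ordering is an assumption; this page does not state the order in which the real method returns values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper, not part of WorkSheet.
final class DiscreteValuesSketch {

    /** Returns each distinct cell once. ASSUMPTION: in the order first encountered. */
    static List<String> discrete(List<String> cells) {
        return new ArrayList<>(new LinkedHashSet<>(cells));
    }

    public static void main(String[] args) {
        System.out.println(discrete(Arrays.asList("A", "B", "A", "C", "B")));
    }
}
```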
-
getDiscreteRowValues
public java.util.ArrayList<java.lang.String> getDiscreteRowValues(java.lang.String row) throws java.lang.Exception
Get back a list of unique values in the row
- Parameters:
row
-- Returns:
- Throws:
java.lang.Exception
-
getAllRows
public java.util.ArrayList<java.lang.String> getAllRows()
Get all rows including those that may be hidden
- Returns:
-
getRows
public java.util.ArrayList<java.lang.String> getRows()
Get the list of row names. Excludes hidden rows.
- Returns:
-
getDataRows
public java.util.ArrayList<java.lang.String> getDataRows()
Get the list of row names
- Returns:
-
getLogScale
public WorkSheet getLogScale(double base) throws java.lang.Exception
Get the log scale of this worksheet, where a zero value will be set to 0.1 because log(0) is undefined.
- Parameters:
base
-- Returns:
- Throws:
java.lang.Exception
-
getLogScale
public WorkSheet getLogScale(double base, double zeroValue) throws java.lang.Exception
Get the log scale of this worksheet
- Parameters:
base
-- Returns:
- Throws:
java.lang.Exception
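The log transform behind the two getLogScale overloads can be sketched with the change-of-base identity, log_b(x) = ln(x) / ln(b). The sketch below is not the library's implementation: `LogScaleSketch` is a hypothetical class working on a plain `double[][]`, and it assumes that a zero cell is replaced by `zeroValue` directly in the output instead of being transformed.

```java
// Hypothetical helper, not part of WorkSheet.
final class LogScaleSketch {

    /** Log of value in the given base. ASSUMPTION: a zero input yields zeroValue itself. */
    static double logScale(double value, double base, double zeroValue) {
        if (value == 0.0) {
            return zeroValue; // log(0) is undefined, so substitute
        }
        return Math.log(value) / Math.log(base); // change of base
    }

    /** Applies the transform to every cell and returns a new matrix. */
    static double[][] logScale(double[][] data, double base, double zeroValue) {
        double[][] out = new double[data.length][];
        for (int r = 0; r < data.length; r++) {
            out[r] = new double[data[r].length];
            for (int c = 0; c < data[r].length; c++) {
                out[r][c] = logScale(data[r][c], base, zeroValue);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] scaled = logScale(new double[][] {{0.0, 8.0}, {2.0, 1.0}}, 2.0, 0.1);
        System.out.println(java.util.Arrays.deepToString(scaled));
    }
}
```

Like the documented methods, the sketch returns a new matrix and leaves its input untouched.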
-
swapRowAndColumns
public WorkSheet swapRowAndColumns() throws java.lang.Exception
Swap the rows and columns, returning a new worksheet
- Returns:
- Throws:
java.lang.Exception
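Swapping rows and columns is a matrix transpose. The following sketch shows the operation on a plain `String[][]`. `TransposeSketch` is a hypothetical class, the input is assumed to be rectangular, and row and column headers are left out for brevity.

```java
import java.util.Arrays;

// Hypothetical helper, not part of WorkSheet.
final class TransposeSketch {

    /** Returns a new array in which cell [r][c] of the input becomes cell [c][r]. */
    static String[][] transpose(String[][] cells) {
        int rows = cells.length;
        int cols = rows == 0 ? 0 : cells[0].length; // ASSUMPTION: every row has the same length
        String[][] out = new String[cols][rows];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                out[c][r] = cells[r][c];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] swapped = transpose(new String[][] {{"a", "b", "c"}, {"d", "e", "f"}});
        System.out.println(Arrays.deepToString(swapped));
    }
}
```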
-
unionWorkSheetsRowJoin
public static WorkSheet unionWorkSheetsRowJoin(java.lang.String w1FileName, java.lang.String w2FileName, char delimitter, boolean secondSheetMetaData) throws java.lang.Exception
Combine two worksheets, joining on rows. Rows that are found in one but not the other are removed. If the second sheet is meta data, then a meta data column will be added between the two joined columns.
- Parameters:
w1FileName
-w2FileName
-delimitter
-secondSheetMetaData
-- Returns:
- Throws:
java.lang.Exception
-
unionWorkSheetsRowJoin
public static WorkSheet unionWorkSheetsRowJoin(WorkSheet w1, WorkSheet w2, boolean secondSheetMetaData) throws java.lang.Exception
Combine two worksheets, joining on rows. Rows that are found in one but not the other are removed. If the second sheet is meta data, then a meta data column will be added between the two joined columns.
- Parameters:
w1
-w2
-secondSheetMetaData
-- Returns:
- Throws:
java.lang.Exception
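The join rule stated above (keep only rows present in both sheets) is an inner join on row name. The sketch below illustrates it with each sheet modeled as a map from row name to that row's cells. `RowJoinSketch` is hypothetical, and the meta data handling controlled by `secondSheetMetaData` is deliberately left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of WorkSheet.
final class RowJoinSketch {

    /** Each sheet maps a row name to that row's cells, in column order. */
    static Map<String, List<String>> rowJoin(Map<String, List<String>> w1,
                                             Map<String, List<String>> w2) {
        Map<String, List<String>> joined = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : w1.entrySet()) {
            List<String> other = w2.get(e.getKey());
            if (other == null) {
                continue; // row missing from the second sheet: removed
            }
            List<String> cells = new ArrayList<>(e.getValue());
            cells.addAll(other); // columns of w1 followed by columns of w2
            joined.put(e.getKey(), cells);
        }
        return joined; // rows only in w2 were never visited, so they are removed too
    }

    public static void main(String[] args) {
        Map<String, List<String>> expression = new LinkedHashMap<>();
        expression.put("p1", Arrays.asList("1.0"));
        expression.put("p2", Arrays.asList("2.0"));
        Map<String, List<String>> clinical = new LinkedHashMap<>();
        clinical.put("p2", Arrays.asList("M"));
        clinical.put("p3", Arrays.asList("F"));
        System.out.println(rowJoin(expression, clinical));
    }
}
```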
-
readCSV
public static WorkSheet readCSV(java.lang.String fileName, char delimiter) throws java.lang.Exception
Read a CSV/tab-delimited file where you pass in the delimiter
- Parameters:
fileName
-delimiter
-- Returns:
- Throws:
java.lang.Exception
-
readCSV
public static WorkSheet readCSV(java.io.File f, char delimiter) throws java.lang.Exception
- Throws:
java.lang.Exception
-
readCSV
public static WorkSheet readCSV(java.io.InputStream is, char delimiter) throws java.lang.Exception
Read a CSV/tab-delimited file where you pass in the delimiter
- Parameters:
is
-delimiter
-- Returns:
- Throws:
java.lang.Exception
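To make the role of the `delimiter` argument concrete, here is a small sketch that parses delimited text into a row-by-column table. It is not how the readCSV overloads are implemented. `DelimitedParseSketch` is hypothetical, the layout (first line holds column headers, first field of each line holds the row name) is an assumption, and quoted fields are not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of WorkSheet.
final class DelimitedParseSketch {

    /** Returns rowName -> (columnName -> cell). Missing trailing cells become "". */
    static Map<String, Map<String, String>> parse(String text, char delimiter) {
        // Quote the delimiter so characters such as '|' or '.' are not read as regex syntax.
        String regex = Pattern.quote(String.valueOf(delimiter));
        Map<String, Map<String, String>> table = new LinkedHashMap<>();
        String[] lines = text.split("\r?\n");
        String[] header = lines[0].split(regex, -1); // -1 keeps trailing empty fields
        for (int n = 1; n < lines.length; n++) {
            if (lines[n].isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = lines[n].split(regex, -1);
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 1; i < header.length; i++) {
                row.put(header[i], i < fields.length ? fields[i] : "");
            }
            table.put(fields[0], row); // ASSUMPTION: first field is the row name
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(parse("id\tage\tstatus\np1\t61\t1\np2\t47\t0\n", '\t'));
    }
}
```

Passing `','` or `'\t'` switches between comma-separated and tab-separated input without any other change.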
-
saveCSV
public void saveCSV(java.lang.String fileName) throws java.lang.Exception
Save the worksheet as a CSV file
- Parameters:
fileName
-- Throws:
java.lang.Exception
-
saveTXT
public void saveTXT(java.lang.String fileName) throws java.lang.Exception
- Parameters:
fileName
-- Throws:
java.lang.Exception
-
setRowHeader
public void setRowHeader(java.lang.String value)
- Parameters:
value
-
-
appendWorkSheetColumns
public void appendWorkSheetColumns(WorkSheet worksheet) throws java.lang.Exception
Add columns from a second worksheet to be joined by common row. If the appended worksheet doesn't contain a row that is in the master worksheet, then the default value of "" is used. Rows in the appended worksheet not found in the master worksheet are not added.
- Parameters:
worksheet
-- Throws:
java.lang.Exception
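The append rule described above behaves like a left join: every master row is kept, matching rows gain the appended cells, and non-matching rows are padded with "". The sketch below illustrates that rule on maps from row name to cells. `AppendColumnsSketch` and the explicit `appendedWidth` argument are inventions for this example, not part of the class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of WorkSheet.
final class AppendColumnsSketch {

    /**
     * master and appended map a row name to that row's cells.
     * appendedWidth is the number of columns in the appended sheet (hypothetical parameter).
     */
    static Map<String, List<String>> appendColumns(Map<String, List<String>> master,
                                                   Map<String, List<String>> appended,
                                                   int appendedWidth) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : master.entrySet()) {
            List<String> cells = new ArrayList<>(e.getValue());
            List<String> extra = appended.get(e.getKey());
            if (extra != null) {
                cells.addAll(extra);
            } else {
                cells.addAll(Collections.nCopies(appendedWidth, "")); // default value of ""
            }
            out.put(e.getKey(), cells);
        }
        return out; // rows found only in the appended sheet are not added
    }

    public static void main(String[] args) {
        Map<String, List<String>> master = new LinkedHashMap<>();
        master.put("p1", Arrays.asList("1.0"));
        master.put("p2", Arrays.asList("2.0"));
        Map<String, List<String>> extra = new LinkedHashMap<>();
        extra.put("p2", Arrays.asList("M"));
        extra.put("p3", Arrays.asList("F"));
        System.out.println(appendColumns(master, extra, 1));
    }
}
```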
-
appendWorkSheetRows
public void appendWorkSheetRows(WorkSheet worksheet) throws java.lang.Exception
Add rows from a second worksheet to be joined by common column. If the appended worksheet doesn't contain a column that is in the master worksheet, then the default value of "" is used. Columns in the appended worksheet not found in the master worksheet are not added.
- Parameters:
worksheet
-- Throws:
java.lang.Exception
-
save
public void save(java.io.OutputStream outputStream, char delimitter, boolean quoteit) throws java.lang.Exception
- Parameters:
outputStream
-delimitter
-quoteit
-- Throws:
java.lang.Exception
-
getIndexColumnName
public java.lang.String getIndexColumnName()
- Returns:
- the indexColumnName
-
setIndexColumnName
public void setIndexColumnName(java.lang.String indexColumnName)
- Parameters:
indexColumnName
- the indexColumnName to set
-
getColumnLookup
public java.util.LinkedHashMap<java.lang.String,HeaderInfo> getColumnLookup()
- Returns:
- the columnLookup
-
getRowLookup
public java.util.LinkedHashMap<java.lang.String,HeaderInfo> getRowLookup()
- Returns:
- the rowLookup
-
getMetaDataColumnsHashMap
public java.util.LinkedHashMap<java.lang.String,java.lang.String> getMetaDataColumnsHashMap()
- Returns:
- the metaDataColumnsHashMap
-
getMetaDataRowsHashMap
public java.util.LinkedHashMap<java.lang.String,java.lang.String> getMetaDataRowsHashMap()
- Returns:
- the metaDataRowsHashMap
-
getRowHeader
public java.lang.String getRowHeader()
- Returns:
- the rowHeader
-
-