-
-
Notifications
You must be signed in to change notification settings - Fork 40
processLargeFile
Reads an XLSX spreadsheet file using streaming and processes each row using a specified User Defined Function without returning any data. Highly memory efficient and suitable for large files.
NOTE: Only XLSX (XML format) spreadsheet files can be processed using this method. It cannot read binary XLS files.
By default the method returns an object to which further optional calls can be chained before finally calling execute() to process the file.
processLargeFile( filepath ).withRowProcessor( function )[...optional configuration calls].execute();
Chainable? No.
To read a large XLSX file into a query or CSV, see readLargeFile()
To specify how each row should be processed, write a function which accepts as its arguments:
- the current row values as an array
- the current row number
- the array of columns/headers if present
processor = function( array row, numeric rowNumber, array columns ){
//insert row into a DB, or add to an array, or filter based on the row's values or its row number, etc.
}
spreadsheet.processLargeFile( filepath )
.withRowProcessor( processor )
.execute();
If your spreadsheet workbook contains more than one sheet you can specify which one to process by name or number. By default the first sheet will be used.
spreadsheet.processLargeFile( filepath )
.withRowProcessor( processor )
.withSheetName( "Sheet2" )
.execute();
spreadsheet.processLargeFile( filepath )
.withRowProcessor( processor )
.withSheetNumber( 2 )
.execute();
If your sheet has a header row which shouldn't be processed then simply add withFirstRowIsHeader():
spreadsheet.processLargeFile( filepath )
.withRowProcessor( processor )
.withFirstRowIsHeader()
.execute();
If you want to skip some rows before starting processing, use withSkipFirstRows( N ):
spreadsheet.processLargeFile( filepath )
.withRowProcessor( processor )
.withSkipFirstRows( 5 )
.execute();
By default values are processed "raw", i.e without any applied formatting. If you want to use visible/formatted values add withUseVisibleValues(). This can be useful with dates or numbers, where you might for example want to avoid scientific notation.
spreadsheet.processLargeFile( filepath )
.withRowProcessor( processor )
.withUseVisibleValues()
.execute();
If your spreadsheet file is encrypted, use withPassword():
spreadsheet.processLargeFile( filepath )
.withRowProcessor( processor )
.withPassword( "topsecret" )
.execute();
Your spreadsheet will be read using the Excel Streaming Reader which can be tuned using 2 optional parameters: bufferSize (number of bytes to use) and rowCacheSize (number of rows to process at a time). The defaults are 1024 bytes and 10 rows but you can adjust these by passing in a struct as follows:
options = {
bufferSize: 2048,
rowCacheSize: 20
};
spreadsheet.processLargeFile( filepath )
.withRowProcessor( processor )
.withStreamingOptions( options )
.execute();