What makes SciData XLerate so unique?
"The most boring, but most useful software we have ever written"
SciData XLerate’s domain-specific language (DSL) and transformation engine (XLXF) grew out of years of experience working with customers on their scientific data parsing challenges.
The DSL leverages “visual idioms” commonly used in spreadsheets, such as formatting metadata as name/value pairs in adjacent cells or formatting tables as rectangular sections with headers in the first rows and data in the rest, to define the source file structure. Field and Table "Descriptors", as they're called in the DSL, primed with a few hints about where to look ("find a table titled 'Table 2-C' in 'Materials'"), tell XLXF to locate the elements and prepare them for use with "Extractors".
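To make the two visual idioms concrete, here is a minimal Python sketch of how a name/value pair and a titled table could be located in a grid of cells. It is not XLXF code and does not reflect the DSL's actual syntax; the functions find_field and find_table and the sample sheet are hypothetical, and the sketch assumes the header row sits directly under the table title and that a blank row ends the table.

```python
from typing import Optional

Grid = list[list[Optional[str]]]

def find_field(grid: Grid, label: str) -> Optional[str]:
    """Return the cell immediately to the right of a cell matching `label`."""
    wanted = label.strip().lower()
    for row in grid:
        for i, cell in enumerate(row[:-1]):
            # Ignore case, surrounding whitespace, and a trailing colon.
            if cell is not None and cell.strip().rstrip(":").lower() == wanted:
                return row[i + 1]
    return None

def find_table(grid: Grid, title: str) -> list[dict]:
    """Return the rows beneath a title cell as dicts keyed by column header."""
    for r, row in enumerate(grid):
        if title in row and r + 1 < len(grid):
            # Assumption: the header row directly follows the title row.
            cols = [(i, h) for i, h in enumerate(grid[r + 1]) if h]
            records = []
            for data_row in grid[r + 2:]:
                values = {h: (data_row[i] if i < len(data_row) else None)
                          for i, h in cols}
                if not any(values.values()):
                    break  # a blank row marks the end of the table
                records.append(values)
            return records
    return []

# Hypothetical sheet: one metadata field, then a titled table.
sheet: Grid = [
    ["Operator:", "J. Doe", None],
    [None, None, None],
    ["Table 2-C", None, None],
    ["Sample", "Mass (mg)", None],
    ["A1", "12.5", None],
    ["A2", "13.1", None],
    [None, None, None],
    ["Notes", "none", None],
]

operator = find_field(sheet, "Operator")
table = find_table(sheet, "Table 2-C")
```

Both lookups key on what the cells say rather than on fixed coordinates, which is the property the Descriptors are described as relying on.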
"Extractors" use natural terms to extract data - Fields by their name and Tables by their column headers - and support robust query interfaces that enable complex extraction tasks. Data items can go through additional transformations, defined using a rich set of utilities that includes such exciting tools as date/time normalization and term remapping.
In addition to Fields and Tables, a range of Descriptors and Extractors are supported, covering both structured and semi-structured elements in spreadsheets.
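As an illustration of extracting by column header and then applying transformations, the Python sketch below normalizes dates and remaps terms on values pulled from a table. It is only a sketch under stated assumptions, not the XLXF utility set: extract_column, normalize_date, remap_term, the accepted date formats, and the term map are all hypothetical.

```python
from datetime import datetime
from typing import Callable

# Assumed input formats, tried in order; real data may need others.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")

# Hypothetical remapping of free-text terms to preferred terms.
TERM_MAP = {"mg": "milligram", "milligrams": "milligram", "ug": "microgram"}

def normalize_date(text: str) -> str:
    """Parse a date in any assumed format and return it as ISO 8601."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")

def remap_term(text: str) -> str:
    """Map a term to its preferred form; unknown terms pass through trimmed."""
    cleaned = text.strip()
    return TERM_MAP.get(cleaned.lower(), cleaned)

def extract_column(records: list[dict], header: str,
                   transform: Callable[[str], str] = lambda v: v) -> list[str]:
    """Pull one column by its header name and transform every value."""
    return [transform(rec[header]) for rec in records]

# Hypothetical table rows, already keyed by column header.
rows = [
    {"Collected": "05/03/2021", "Unit": " MG "},
    {"Collected": "2021-03-07", "Unit": "ug"},
]

dates = extract_column(rows, "Collected", normalize_date)
units = extract_column(rows, "Unit", remap_term)
```

Because the column is requested by header name, the caller never needs to know which position it occupies in the sheet.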
A key feature for production systems is XLXF's robustness to underlying changes in the spreadsheet format. New fields added to cover pages or columns added to tables are automatically discovered. Reordered fields or columns have no impact on the ability of XLXF to parse the elements. Common typos are tolerated by the robust matching engine, which users can further augment with controlled vocabularies.
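One way to picture typo-tolerant, order-independent matching against a controlled vocabulary is the Python sketch below, which uses the standard library's difflib. This is a swapped-in technique chosen for illustration, not a description of XLXF's matching engine; match_header, read_rows, the 0.8 similarity cutoff, and the vocabulary are hypothetical.

```python
import difflib
from typing import Optional

def match_header(observed: str, vocabulary: list[str],
                 cutoff: float = 0.8) -> Optional[str]:
    """Return the vocabulary term closest to `observed`, or None if none is close."""
    canonical = {term.lower(): term for term in vocabulary}
    hits = difflib.get_close_matches(
        observed.strip().lower(), list(canonical), n=1, cutoff=cutoff
    )
    return canonical[hits[0]] if hits else None

def read_rows(header_row: list[str], data_rows: list[list[str]],
              vocabulary: list[str]) -> list[dict]:
    """Key each value by its matched vocabulary term, whatever the column order."""
    mapping = {i: match_header(h, vocabulary) for i, h in enumerate(header_row)}
    return [
        {mapping[i]: value for i, value in enumerate(row) if mapping.get(i)}
        for row in data_rows
    ]

# Hypothetical controlled vocabulary and a sheet with a typo,
# reordered columns, and one column the vocabulary does not cover.
vocab = ["Sample ID", "Concentration"]
header = ["Concentraton", "Volume", "sample id"]
data = [["1.2", "50", "S-01"]]

parsed = read_rows(header, data, vocab)
```

In this sketch the unrecognized "Volume" column is simply skipped; a real system would need a policy for surfacing newly discovered columns rather than dropping them.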
And it doesn't hallucinate.