Tab Characters in Compares

The case of the missing characters

Last week a customer reported that we had a bug in our comparison view. Their Custom Functions were showing changes between their library drafts and their study build drafts. Here's an example screenshot. Removed characters are shown in red and characters added are shown in green.

Changes

It looked like some spaces were being removed and replaced by a single space.

Looking at the Custom Function in the library it was apparent that it contained a Tab Character. A tab character doesn't have a defined width, it can be represented as one or more spaces. In our Custom Function editor it showed as 4 spaces:

Tabs in Custom Function

But how had it got there and why was it present in the library but not in the Study Draft?

Tabs in Architect?

An investigation of Rave Architect showed that it was possible to get Tab characters (also known as CHR(9) or \t) into a Custom Function in Architect if (and only if) you paste the code from some other editor. Pressing the TAB key in the Rave Custom Function editor just moves you out of the code editor area so you have to do this with cut and paste.

Tabs in Rave Architect

This tab character has no real effect in Rave Architect, it's just whitespace like a space character but a tab character isn't a space so the TrialGrid compare was showing a difference.

Now we knew that it was possible for a tab character to come from Rave Architect we had an idea why it was in the library draft. But why was it missing in the Study Draft?

ALS files have accents

Architect Loader Spreadsheets (ALS) files are stored in Microsoft Excel 2003-2004 XML format. This is not the same as the .xslx format which is new the Excel default format. The 2003-2004 format stores the sheets and rows in a single XML document and the way the contents are formatted gives you a clue as to what created it. An file exported from Rave Architect looks like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
<?xml version="1.0" encoding="UTF-8"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:ss="...">
<Worksheet ss:Name="Sheet1">
<Table>
<Row>
<Cell>
<Data ss:Type="String">Cell Data 1</Data>
</Cell>
</Row>
</Table>
</Worksheet>
</Workbook> 

while a file saved by Excel looks like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
<?xml version="1.0" encoding="UTF-8"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:ss="...">
 <Worksheet ss:Name="Sheet1">
  <Table>
   <Row>
    <Cell>
     <Data ss:Type="String">Cell Data 1</Data>
    </Cell>
   </Row>
  </Table>
 </Worksheet>
</Workbook> 

Despite the distinctive formatting differences, these two examples store the exact same data.

Importantly, the Draft where the Tabs had been switched for spaces had the Excel style formatting so we knew it had been modified in Excel before being uploaded to TrialGrid.

Conclusion: Excel eats Tab Characters

Once the finger was pointed at Excel it didn't take long to discover that Excel swaps tab characters for single spaces on save. This was the cause of the characters switching from Tabs to Spaces.

As a result of this finding, we have switched TrialGrid to replace tabs with spaces in Custom Functions. This makes all our comparisons consistent regardless of whether the Draft was loaded direct from Rave or was modified in Excel first.