Initial Bulk Data Import

For data that has already been collected in LibreOffice Calc / Microsoft Excel, a bulk data import macro is available. The following procedure can be used with an empty database or with one that already contains records. The import procedure ensures that existing records in the database are not overwritten.

Backup, Backup, Backup!

Before proceeding, back up the database and make sure that the restore process works. Manually removing bulk imported data is often not fun! For details, see the instructions provided on the backup page.

Text Formatting

The following formatting of text in individual cells is preserved during the data transfer:

Thoughts on Non-ASCII Characters

Non-ASCII characters are used in many languages, while most of the PHP string functions used in the backend only support single-byte ASCII characters. To compensate, strings in DRDB are always HTML-encoded, and special characters (e.g. German 'Umlaute' such as 'ä, ü, ß', French accented characters such as 'é, â', etc.) are stored as HTML escape sequences. Example: 'ö' is stored as '&ouml;'.

Note: This means that the number of characters in a string variable is higher than the number of visible characters if special characters are used. Also, formatting such as bold, italics, etc. is embedded in strings as HTML tags, which further increases the string length.
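To illustrate, here is a minimal PHP sketch of this kind of encoding, assuming the standard htmlentities() function (the exact call used in the DRDB backend may differ):

<?php
// Encode a special character as an HTML escape sequence.
// Assumption: DRDB uses htmlentities() or an equivalent function.
$encoded = htmlentities('ö', ENT_QUOTES, 'UTF-8');
echo $encoded . "\n";         // prints: &ouml;
echo strlen($encoded) . "\n"; // 6 bytes stored for 1 visible character
?>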

Empty Lines

Checks Performed Before Data Is Imported

For the bulk data transfer, the following requirements must be fulfilled by the data in the Calc/Excel document:

All checks are performed before the bulk import starts. If any of these conditions is not met, the import process is aborted and no data at all is written to the database. It's an all-or-nothing approach. Example: if even one of the record IDs given in the import data already exists, the import process is aborted and NO records will be imported.
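As an illustration of the all-or-nothing approach, here is a minimal PHP sketch of such a pre-import check (function and variable names are hypothetical, not DRDB's actual code):

<?php
// Hypothetical sketch: check ALL record IDs before anything is written.
function validate_record_ids(array $rows, array $existing_ids): array
{
    $errors = [];
    foreach ($rows as $line => $row) {
        $id = $row["0"]; // column "0" of each row holds the record ID
        if (in_array($id, $existing_ids)) {
            $errors[] = "Row $line: record ID $id already exists";
        }
    }
    return $errors;
}

// Example: one of the two IDs already exists in the database.
$rows = ["0" => ["0" => "1002000100"], "1" => ["0" => "1002000200"]];
$existing_ids = ["1002000200"];

$errors = validate_record_ids($rows, $existing_ids);
if (count($errors) > 0) {
    // Abort before any database write: NO records are imported.
    exit("Import aborted:\n" . implode("\n", $errors) . "\n");
}
?>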

Bulk Export from Calc / Excel with a LibreOffice Macro


'****************************************************************
'
' ADAPT the following constants before starting the JSON export
' procedure.
'
'****************************************************************

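'Record ID assigned to the first exported row. Judging from the
'demo data, the following rows get IDs derived from this value
'(e.g. 1002000100, 1002000200, ...).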
Const START_ID = 1002000100

'Which sheet to export, the numbering starts at 1
Const SHEET = 1

'Last column to be exported. Numbering starts at 1 (as shown in Calc)
Const LAST_COLUMN = 9

'First and last row to be exported. Row numbers as shown in Calc! 
'(i.e. first line would be 1 and NOT 0!)
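'Note: the first exported row should contain the field names, as
'the import check on the server compares them with the database
'fields (see 'Success or Failure Indication' below).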
Const FIRST_ROW = 1
Const LAST_ROW = 32

Const OUTPUT_FILENAME = "~/Desktop/json-export-data.txt"

Here's what the beginning of the JSON file for the demo database looks like:


{
"0": {
"0": "1002000100",
"1": "<p><b>Doc ID</b></p>", 
"2": "<p><b>Title</b></p>", 
"3": "<p><b>Title Notes</b></p>", 
"4": "<p><b>Author</b></p>", 
"5": "<p><b>Cross Reference</b></p>", 
"6": "<p><b>Format - Paramétrage</b></p>", 
"7": "<p><b>Notes on Content</b></p>", 
"8": "<p><b>Relates to</b></p>", 
"9": "<p><b>Number of Systems</b></p>"
},
"1": {
"0": "1002000200",
"1": "<p><b></b></p>", 

[...]

Note: The HTML formatting (<p>, <b>, etc.) is present because the first row of the Calc sheet is formatted in bold. The HTML tags are automatically stripped during the bulk import process on the server side.
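As an illustration, the following minimal PHP sketch shows what such a tag-stripping step could look like on the server side, assuming standard PHP functions (this is not necessarily DRDB's actual implementation):

<?php
// Decode the uploaded JSON into a nested associative array.
$json = '{"0": {"0": "1002000100", "1": "<p><b>Doc ID</b></p>"}}';
$rows = json_decode($json, true);

// Strip the HTML tags from the header row to get the plain field
// names, e.g. '<p><b>Doc ID</b></p>' becomes 'Doc ID'.
$field_names = array_map('strip_tags', $rows["0"]);
print_r($field_names);
?>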

Importing the Bulk Data to the DRDB Database

Now open the import page of the database by clicking on the 'Administration' icon on the main page of the database and then on the 'Bulk Import' icon. Select the JSON file with the 'Browse' button and then press the 'Upload' button to start the import.

Note: Depending on the amount of data, the import can take quite some time; 400 records with a total JSON file size of 1 MB can take around 5 minutes. To see how the system processes the data, open a shell window on the server and run 'tail -f' on the log file.

Screenshot

Success or Failure Indication

Before data is imported, DRDB validates that the number of columns matches the number of fields in the database, that the field names match, and that the record numbers in the JSON data do not yet exist in the database. If any of these conditions is not fulfilled, no data is imported and an error message is shown.

The following screenshot shows how DRDB reports that some records in the JSON data already exist in the database:

Screenshot

The following screenshot shows how DRDB reports that two field names in the database do not match the corresponding field names in the JSON file:

Screenshot

The following screenshot shows how DRDB reports a successful data import:

Screenshot

Large Bulk Data Import

While the web interface allows importing bulk data, the maximum amount of data that can be processed is currently limited by the web server settings. A Docker installation with an nginx reverse proxy front end is limited to around 2 MB of JSON-formatted data. Also, the HTTP connection may time out before the import has finished. When a timeout error is shown, check on the server whether the import is still running in the background. This is definitely an area that needs to be improved in a future software enhancement.

In the meantime, the following 'hack' can help to import a large amount of data: in the server-side import code, read the bulk data from a file that was previously copied to the server instead of from the POST variable:

        // Get the bulk data from the POST variable and convert from JSON
        // format to an associative array

        //$bulk_data_JSON = $_POST['bulk'];
        $bulk_data_JSON = file_get_contents('/bulk-data-to-import.txt');
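Alternatively, if the size limit is enforced by the web server or PHP configuration rather than by the application itself, raising the corresponding limits may help. The following is a sketch, assuming a standard nginx + PHP setup (the values are examples, not DRDB defaults):

# nginx configuration: allow larger request bodies (the default is 1 MB)
client_max_body_size 20m;

; php.ini: allow larger file uploads and POST bodies
upload_max_filesize = 20M
post_max_size = 20M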