ELAN → Flex → ELAN “round trip” workflow

John Mansfield, University of Melbourne, 29 May 2015

This guide is based on ELAN 4.9, and SIL Fieldworks Language Explorer 8 (“Flex”).


Main steps:

  1. Transcribe recording in ELAN

  2. Export to .flextext format

  3. Import into Flex

  4. Interlinearise in Flex

  5. Export in .flextext format

  6. Merge interlinear analysis with original .flextext file

  7. Import .flextext back into ELAN

  8. Run a tier-renaming script on the .eaf file

  9. ELAN file is now ready for further transcription, contains morphological analysis, and can go on another round trip if further interlinearisation is required.


1. Transcribe recording in ELAN

Transcribe as per normal using MPI ELAN. You can use whatever tiers you want, though some of their names may change during the process. Note, however, that the whole process assumes an utterance > word > morph hierarchical structure, i.e. the tier hierarchy should look something like this (for a simple transcript):

Or like this, for a transcript with interlinearisation:
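If you want to confirm that an .eaf file has the assumed utterance > word > morph nesting before exporting, a short script can read the tier parent links. This is a hedged Python sketch, not part of the workflow's own tools; it assumes the standard EAF layout in which each TIER element has a TIER_ID and an optional PARENT_REF, and the tier names in SAMPLE_EAF are invented for the example.

```python
import xml.etree.ElementTree as ET

# Invented tier names, for illustration only.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="utterance" LINGUISTIC_TYPE_REF="transcription"/>
  <TIER TIER_ID="words" PARENT_REF="utterance" LINGUISTIC_TYPE_REF="word"/>
  <TIER TIER_ID="morphs" PARENT_REF="words" LINGUISTIC_TYPE_REF="morph"/>
</ANNOTATION_DOCUMENT>"""


def tier_depths(eaf_xml):
    """Return each tier's depth: 0 for top-level tiers, 1 for their children, etc.

    eaf_xml is the .eaf content as a string or bytes.
    """
    root = ET.fromstring(eaf_xml)
    parents = {t.get("TIER_ID"): t.get("PARENT_REF") for t in root.iter("TIER")}
    depths = {}
    for tier_id in parents:
        depth, current, seen = 0, parents[tier_id], {tier_id}
        # Walk up the PARENT_REF chain until reaching a top-level tier.
        while current is not None:
            if current in seen:
                raise ValueError(f"cycle in tier hierarchy at {current!r}")
            seen.add(current)
            depth += 1
            current = parents.get(current)
        depths[tier_id] = depth
    return depths


if __name__ == "__main__":
    # Print each tier indented under its parent.
    for tier, depth in tier_depths(SAMPLE_EAF).items():
        print("  " * depth + tier)
```

A transcript that follows the assumed structure should show its morph tier two levels below the utterance tier. For a real file you could pass Path("session.eaf").read_bytes(), since ElementTree's fromstring accepts bytes as well as strings.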

2. Export to .flextext format

Use the ELAN menu option File > Export As > FLEx File. This will prompt you through four steps:

Steps 1 and 2 never seem to require deviating from the default settings; if a problem turns up here, this guide will need to be updated accordingly. Step 3 does require some configuration: you have to map your ELAN tiers to the Flex data types. The basic tier types map as follows:

transcription → txt

translation → gls

comments → comment

word/morph parsing → txt

word/morph glossing → gls

The dialogue lets you map each ELAN tier type as a generalisation (use this if you have multiple transcription tiers, multiple translation tiers, etc.), or map individual ELAN tiers. Both options seem to work fine. Here’s the mapping for a simple transcription file, with no interlinearisation or specialty tiers:

You also need to specify for each tier whether it is in the language being transcribed (e.g. Murrinhpatha (mwf)) or in English (en). These language codes are not already in the ELAN system, so you need to add them at the bottom of the dialogue box with “Add custom value (language)”.


Here’s the mapping I’ve used for exporting a rather complex transcript that already has interlinearisation. The ELAN tier names I’ve been using here are designed to match the Flex data structure in an intuitive way:

You might have to add custom data types for any specialty tiers you’ve used. ELAN will let you add any data type you like, but these type names should follow a consistent naming standard for the corpus you’re building, otherwise they won’t gel with other tools down the track.
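The tier-to-Flex mapping described in this step can be made concrete with a small lookup. This hedged Python sketch assumes that a mapping set for an individual tier takes precedence over the mapping for its tier type; that precedence rule and the linguistic type names used as keys are assumptions for illustration, not a description of how ELAN's export dialogue works internally.

```python
# Hypothetical linguistic type names; substitute the tier types defined in your own .eaf.
TYPE_MAP = {
    "transcription": "txt",
    "translation": "gls",
    "comments": "comment",
    "word-parsing": "txt",
    "morph-parsing": "txt",
    "word-glossing": "gls",
    "morph-glossing": "gls",
}


def flex_item_type(tier_id, linguistic_type, tier_overrides=None):
    """Return the Flex item type for one ELAN tier.

    Assumption: a mapping set for an individual tier takes precedence over
    the mapping set for its tier type.
    """
    overrides = tier_overrides or {}
    if tier_id in overrides:
        return overrides[tier_id]
    if linguistic_type in TYPE_MAP:
        return TYPE_MAP[linguistic_type]
    raise KeyError(f"no Flex type for tier {tier_id!r} (type {linguistic_type!r})")


if __name__ == "__main__":
    print(flex_item_type("utterance", "transcription"))  # txt
    # A tier that was given the wrong type, corrected by an individual mapping.
    print(flex_item_type("notes", "transcription", {"notes": "comment"}))  # comment
```

Raising an error for an unmapped type, rather than silently skipping the tier, mirrors the advice above: every specialty tier should be given an explicit, consistently named type.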

The exported file should have the same name as the original, but with .eaf changed to .flextext, and the two files should sit alongside each other in your corpus directory structure.

3. Import into Flex

Straightforward, no options to set.

But if this transcription is already in your Flex project, you should take some steps to avoid duplicating it. I would suggest: 1) rename the old version of the transcript in Flex; 2) import the new version; 3) delete the old version once you are satisfied with the import.

4. Interlinearise in Flex

Do it.

5. Export in .flextext format

When you are sick of interlinearising, export it back out again. You must save it back into the relevant corpus directory, now with the file ending .postflex.flextext.
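The file names at each stage of the round trip follow a simple convention, which a small helper can derive from the .eaf path. This hedged Python sketch assumes that every file sits alongside the .eaf in the same directory; the .merge.flextext name follows the example commands in step 6, and round_trip_paths is a hypothetical helper name.

```python
from pathlib import Path


def round_trip_paths(eaf_path):
    """Derive the sibling .flextext file names used at each stage of the round trip."""
    eaf = Path(eaf_path)
    if eaf.suffix != ".eaf":
        raise ValueError(f"expected an .eaf file, got {eaf.name!r}")
    base = eaf.stem  # file name without the .eaf ending
    return {
        "export": eaf.with_name(base + ".flextext"),             # step 2
        "postflex": eaf.with_name(base + ".postflex.flextext"),  # step 5
        "merge": eaf.with_name(base + ".merge.flextext"),        # step 6
    }


if __name__ == "__main__":
    paths = round_trip_paths("../archival/1959_Hale-recording/HALE_K06-004534.eaf")
    for stage, path in paths.items():
        print(f"{stage:9} {path}")
```

Deriving every name from the .eaf path in one place makes it harder to end up with inconsistently capitalised endings, which matters on case-sensitive file systems.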

6. Merge interlinear analysis with original .flextext file

The biggest limitation with Flex interlinearisation is that most (or all) custom tiers in your .flextext file are not retained when it goes in and out of the program. So essentially you just want to take the interlinear analysis that Flex has produced and insert it into your original .flextext file. Do this using the XSL script merge-interlinear.xsl.

You will also often have fixed some transcriptions or translations while interlinearising in Flex, so by default this merge process overwrites the original versions of these tiers with the Flex versions, unless you ask it not to with the option overwrite=no.
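As a rough illustration of the merge idea, the Python sketch below copies Flex's words elements into the original phrases and, unless overwrite is turned off, copies over corrected phrase-level items that match on type and language. It is not a port of merge-interlinear.xsl: it assumes that phrases can be paired by their order in the document (i.e. that Flex did not split, join or reorder any phrases), and merge_interlinear and merge_files are hypothetical names.

```python
import copy
import xml.etree.ElementTree as ET


def merge_interlinear(original, postflex, overwrite=True):
    """Return a copy of `original` (a FlexText root element) carrying Flex's analysis.

    Hypothetical sketch, not a port of merge-interlinear.xsl: phrases are
    paired by document order, which assumes Flex did not split, join or
    reorder any phrases.
    """
    merged = copy.deepcopy(original)
    orig_phrases = list(merged.iter("phrase"))
    flex_phrases = list(postflex.iter("phrase"))
    if len(orig_phrases) != len(flex_phrases):
        raise ValueError(
            f"{len(orig_phrases)} original vs {len(flex_phrases)} Flex phrases; "
            "cannot pair them by position"
        )
    for orig, flex in zip(orig_phrases, flex_phrases):
        # Swap in Flex's word/morph analysis, keeping the old position if there was one.
        new_words = flex.find("words")
        if new_words is not None:
            new_words = copy.deepcopy(new_words)
            old_words = orig.find("words")
            if old_words is not None:
                index = list(orig).index(old_words)
                orig.remove(old_words)
                orig.insert(index, new_words)
            else:
                orig.append(new_words)
        if overwrite:
            # Copy corrected phrase-level lines (e.g. txt, gls) matched on type and language.
            # Items with no counterpart in the Flex file, such as custom tiers, are left alone.
            for flex_item in flex.findall("item"):
                key = (flex_item.get("type"), flex_item.get("lang"))
                for orig_item in orig.findall("item"):
                    if (orig_item.get("type"), orig_item.get("lang")) == key:
                        orig_item.text = flex_item.text
    return merged


def merge_files(original_path, postflex_path, out_path, overwrite=True):
    """File-level wrapper: read both .flextext files and write the merged result."""
    merged = merge_interlinear(
        ET.parse(original_path).getroot(), ET.parse(postflex_path).getroot(), overwrite
    )
    ET.ElementTree(merged).write(out_path, encoding="utf-8", xml_declaration=True)
```

Because the sketch starts from a copy of the original tree rather than from the Flex output, any custom items in the original phrases survive the merge, which is the whole point of this step.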

Examples of usage:

java -Xmx1024m -jar /Library/SaxonHE9-4-0-4J/saxon9he.jar -t \
    Magultje-test.flextext merge-interlinear.xsl \
    interlin=Magultje-test.postflex.flextext overwrite=no \
    > Magultje-test.merge.flextext

or

java -Xmx1024m -jar /Library/SaxonHE9-4-0-4J/saxon9he.jar -t \
    ../archival/1959_Hale-recording/HALE_K06-004534.flextext merge-interlinear.xsl \
    interlin=../archival/1959_Hale-recording/HALE_K06-004534.postflex.flextext \
    > ../archival/1959_Hale-recording/HALE_K06-004534.merge.flextext

The two earlier .flextext files should be moved to the /old-versions folder that you keep for every corpus session. The merged version will be the file used for corpus analysis, because it’s a much more elegant data structure than ELAN’s .eaf format.
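This housekeeping step can be scripted too. The Python sketch below assumes that the old-versions folder lives inside the session directory, which is an assumption about the layout; archive_earlier_flextexts is a hypothetical helper. It refuses to overwrite a copy archived on an earlier round trip, so on a later trip you would need to rename or date-stamp the older archived copy first.

```python
import shutil
from pathlib import Path


def archive_earlier_flextexts(session_dir, base_name):
    """Move <base>.flextext and <base>.postflex.flextext into <session>/old-versions.

    The .merge.flextext file is left in place, since it is the one used for analysis.
    """
    session = Path(session_dir)
    archive = session / "old-versions"
    archive.mkdir(exist_ok=True)
    moved = []
    for suffix in (".flextext", ".postflex.flextext"):
        src = session / (base_name + suffix)
        if not src.exists():
            continue
        dest = archive / src.name
        if dest.exists():
            # Refuse to clobber a copy archived on an earlier round trip.
            raise FileExistsError(f"{dest} already exists")
        shutil.move(str(src), str(dest))
        moved.append(dest)
    return moved
```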

7. Import .flextext back into ELAN

The merge script will have produced a .merge.flextext file, which you should now import into ELAN. I’ve just been leaving all the options at their defaults here.

8. Run a tier-renaming script on the .eaf file

The file back in ELAN now has awkward (but highly logical) tier names, derived from the Flex data structure. You could rename these by hand, but to convert them into nicer, shorter tier names, use the script rename-tiers.py. This moves the older version into your /old-versions directory and replaces it with a renamed one.

(It does the same for the accompanying .pfsx file.)
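The internals of rename-tiers.py are not described here, but the core operation can be sketched in Python: renaming a tier means changing its TIER_ID and every PARENT_REF that points at it, so the hierarchy stays intact. The function name, the mapping and the sample tier names below are hypothetical, and unlike rename-tiers.py this sketch does not touch the .pfsx file.

```python
import xml.etree.ElementTree as ET

# Keep the conventional "xsi" prefix on the EAF root's schema attribute when serialising.
ET.register_namespace("xsi", "http://www.w3.org/2001/XMLSchema-instance")


def rename_tiers(eaf_xml, mapping):
    """Rename tiers in .eaf content, updating PARENT_REF links to match."""
    root = ET.fromstring(eaf_xml)
    new_names = list(mapping.values())
    if len(set(new_names)) != len(new_names):
        raise ValueError("two tiers would receive the same new name")
    existing = {tier.get("TIER_ID") for tier in root.iter("TIER")}
    clashes = (set(new_names) & existing) - set(mapping)
    if clashes:
        raise ValueError(f"new names already used by other tiers: {sorted(clashes)}")
    for tier in root.iter("TIER"):
        for attr in ("TIER_ID", "PARENT_REF"):
            old = tier.get(attr)
            if old in mapping:
                tier.set(attr, mapping[old])
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical long and short tier names, for illustration only.
    sample = (
        "<ANNOTATION_DOCUMENT>"
        '<TIER TIER_ID="phrase-txt-long" LINGUISTIC_TYPE_REF="transcription"/>'
        '<TIER TIER_ID="words-txt-long" PARENT_REF="phrase-txt-long" LINGUISTIC_TYPE_REF="word"/>'
        "</ANNOTATION_DOCUMENT>"
    )
    print(rename_tiers(sample, {"phrase-txt-long": "utterance", "words-txt-long": "words"}))
```

Note that ET.tostring with encoding="unicode" omits the XML declaration; when writing a real file back out, ElementTree.write(..., encoding="utf-8", xml_declaration=True) restores it.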

Usage:

python rename-tiers.py ../PATH/TO/FILE.eaf

9. Done

ELAN file is now ready for further transcription, contains morphological analysis, and can go on another round trip if further interlinearisation is required.
