Producing time-aligned interlinear texts: Towards a SayMore–FLEx–ELAN workflow

Rayan Pennington

SIL PNG

20 March 2014

Abstract

This document illustrates a workflow for the creation and organization of time-aligned annotations by utilizing the advantages of three software programs—SayMore, FLEx, and ELAN. Texts are segmented, transcribed, and translated using SayMore. After this basic annotation, the file is pulled into FLEx to allow for the incorporation of data from a FLEx-based lexicon. Next, the file is exported back into the SayMore file structure and then opened in ELAN for final processing. This final time-aligned interlinear text remains in SayMore for depositing alongside the entire corpus in a repository or archive. Since SayMore is not equipped to handle multiple speakers, however, many recorded texts will require a greater reliance on ELAN. Therefore, instructions for an ELAN-first methodology are provided as well.

Introduction1

This document describes and illustrates a workflow for the creation and organization of time-aligned annotations. The method prescribed herein allows for the advantages of SayMore, FLEx, and ELAN to be utilized in tandem with one another. Texts are segmented, transcribed, and translated using SayMore. Following this, the file is pulled into FLEx to allow for the incorporation of data from a FLEx-based lexicon. Next, the file is exported back into the SayMore file structure and then opened in ELAN for final processing of the time-aligned interlinear text product.2 This file then remains in SayMore for depositing alongside the entire corpus in a repository or archive. After investigating the topic and beginning to compose my ideas, I came across a manuscript by TIM GAVED and SOPHIE SALFFNER (2014) on the same topic at http://tla.mpi.nl/tools/tla-tools/elan/thirdparty/. This manuscript utilizes a similar (ELAN–FLEx–ELAN) workflow, and provides an ELAN template and teaching set with all the files needed in order to walk through the steps along with the paper. The two primary methodological differences between this paper and that of GAVED & SALFFNER are listed below:

  • The method in GAVED & SALFFNER consists of segmenting and annotating in ELAN, whereas the method prescribed here suggests performing these tasks within the simpler SayMore program.
  • The final output of both methods is a fully-interlinearized time-aligned text. However, the method prescribed here suggests pulling the file into SayMore to relate to the organizational structure of the entire documentary corpus.

Note that the current paper does not provide any instructions for installing or using SayMore, ELAN, or FLEx. For SayMore, the reader is encouraged to consult the information found at http://saymore.palaso.org/, as well as the help found within the program itself. For FLEx, consult the information found at http://fieldworks.sil.org/, as well as the manuals at http://wiki.lingtransoft.info/doku.php?id=tutorials:student_manual and the Demo Movies and help found within the program. For ELAN, visit http://tla.mpi.nl/tools/tla-tools/elan/ for user guides and manuals, as well as the help within the program.

Before introducing the steps, it will be valuable to become acquainted with the strengths and weaknesses of the three programs. SayMore is a unique program with the following strengths:

  • Simple, easy-to-learn, and engaging interface
  • Provides a structure for organizing the tasks of a full documentation project
  • Provides charts and statistics to track progress of the project
  • Aids in organization by helping to name all files consistently
  • Easy venue for the collection of project- and text-level metadata
  • Extremely easy-to-use tool for segmentation and annotation of texts
  • Annotated text output in .eaf format (XML), which is future-proof and also allows for easy editing in ELAN
  • Archival in IMDI format or SIL’s REAP format

In spite of these many strengths, SayMore does not allow hierarchical annotations like ELAN’s tier structure. SayMore is also not equipped to tackle multiple-speaker recordings. Neither does SayMore allow for multiple translations. These can only be adequately handled in ELAN. Finally, SayMore also does not handle interlinearization. FLEx is best equipped for this task. For more information about the strengths and weaknesses of SayMore, see MOELLER (2014).

ELAN is an extremely powerful program. Some of its primary strengths are listed below:

  • Time-alignment of annotations with the intended media file
  • Hierarchical dependency of tiers which allows for time-saving activities (e.g. ‘tokenizing’ a phrasal tier to produce a word tier) and important temporal relationships between tiers
  • XML file for a future-proof output in a non-proprietary format
  • Various viewing options which allow for easy segmentation (‘Segmentation Mode’), transcription (‘Transcription Mode’), and then an all-in-one setting (‘Annotation Mode’)
  • Powerful searching capabilities based on the participant, the tier, the type of tier, or otherwise; precise searches can be performed on a file, or across an entire directory of files
  • Output to many file types, as well as the possibility of easily producing subtitled video files

Though powerful in many ways, ELAN suffers from being difficult to learn. This is due to its strange use of certain terms, some poor choices of keyboard shortcuts, and an overall unfriendly user interface. It also offers poor support for non-roman scripts. In spite of these weaknesses, once it is learned it is a valuable tool with full customization of almost every option (including keyboard shortcuts).

The FLEx interlinear tool is also powerful:

  • Easy insertion of lexemes from the text into the lexicon
  • Automatic population of interlinear texts with information from the lexicon
  • Saves history of choices and makes smart guesses regarding parsing
  • Automatic parser can process rules to provide accurate morphological divisions, even with opaque alternations
  • Allows for easy creation of example sentences for a dictionary
  • Various export options

The primary weakness of FLEx is its lack of time-alignment and linking with media. Additionally, it is not easy for someone else to access a FLEx database and find a specific text. With ELAN’s ubiquity in academic linguistics, and with its output of single fully-searchable files, any linguist can take a single ELAN XML (.eaf) file, along with its source media file, and quickly have at their fingertips all they could want.3 This raises the question: Why not just interlinearize a text in ELAN and do away with FLEx? The answer is threefold: 1) FLEx allows input from the lexicon in the interlinearization process, while ELAN does not. Even if one has every word of a language memorized, they cannot type as fast as FLEx can remember. 2) Morphological parsing is rather difficult to accomplish in ELAN. 3) For the creation of a dictionary, FLEx is an invaluable tool. ELAN cannot output a publishable dictionary.

The purpose of this paper, then, is to provide a workflow so that one can profit from the strengths of all three programs. Unfortunately, they do not play very well together all the time. Hopefully in future development solutions will be implemented, but in the meantime there is no need to wait. Proper time-aligned interlinear texts can be created and organized by following the steps described in this paper. Interlinearization is a difficult, time-consuming task. I hope this paper can save time and hassle so others can focus on the task at hand.

The workflow developed here assumes that one will begin by segmenting, transcribing, and translating a text in SayMore (§2).4 From there, the text is exported in the FLEx format and imported into FLEx for interlinearization, and finally exported back into the SayMore file structure (§3). Next, the text is opened in ELAN to be cleaned up and saved as the final time-aligned interlinear product (§4). It was already mentioned above that SayMore cannot deal with multiple speakers, and since dialogue has a central role in a documentary corpus, such an omission will force a greater reliance on ELAN than is assumed throughout this paper. Instructions for an ELAN-first methodology are provided in §5, as well as in GAVED & SALFFNER (2014).

SayMore segmentation & annotation

The first task is to open SayMore, create a session, and pull in an audio file for annotation. If the audio file is in any format other than WAV, then it will need to be converted for annotation to proceed. Also, if the media file is video with embedded audio, then the audio needs to be extracted first. These choices are accomplished by clicking Convert… above the file pane and selecting a conversion or extraction option. If you try to proceed with segmentation without this step, SayMore will prompt you to do it anyway. This conversion or extraction creates a new audio file named ‘Sessionid_Source_StandardAudio’. It is this file which will be annotated, and with this naming scheme SayMore knows that the new audio file is linked to the other media file. If, instead, separate audio and video files are imported for the same session, then both can be labeled as ‘Sessionid_Source.xxx’, since they are both technically source files. However, it is important that only the WAV file be dealt with in the ensuing steps.

Before annotation can commence, a file needs to be segmented. To do this, click on the audio file (either the ‘Source’ file or the ‘StandardAudio’ file) and then click on the Start Annotating tab. SayMore provides the option of using the Manual Segmentation Tool or the ‘auto segmenter’.

FIGURE 1: SAYMORE SEGMENTATION METHODS

If you choose the Manual Segmentation Tool, then simply play the file and press enter at every pause break to create a segment boundary. If the auto segmenter is chosen, the file will automatically be chunked into intonation unit groupings and the segments will be presented for easy annotation. You will want to check the segments and make adjustments, so click on Segment… above the Free Translation annotation column. Segment boundaries can be added or moved easily. If there are segments that are composed of introductory metadata, or intermittent breaks in the speech event, then these segments can be ignored. Also, you can zoom in for finer attention to segment boundaries. Once the segments are aligned according to preference, annotation can begin. By pressing Tab the cursor jumps from Transcription to Translation and then to the next segment. By pressing Enter, the cursor just moves down a single column (either Transcription or Translation) from segment to segment. As transcription progresses, the selected segment automatically plays on a loop.


FIGURE 2: SAYMORE TEXT ANNOTATION

After written transcription and translation are complete, the text can be exported for interlinearization in FLEx. Note that a project must have been created in FLEx with the correct writing systems established; otherwise, the export will not work properly.

If FLEx is ready, then in SayMore select Export, which is located above the Free Translation column of the annotations pane. Select FLEx Interlinear Text…, select the transcription and translation languages, and press Export… Finally, when asked for the location to save the flextext export, locate the SayMore data directory and save it directly in the correct session’s folder. Just leave the name as it suggests, which will be ‘Sessionid_Source.wav.flextext’ (or, in the case of a converted file, ‘Sessionid_Source_StandardAudio.wav.flextext’). Saving the file here allows for fuller transparency: the output from every stage will be included in the session’s list of files. If any problems arise, it will be easier to find the source of the issue and make a correction. This method will also protect from the misplacement or loss of important files, especially if you plan to undertake these stages at different times.
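Because the flextext export is an XML file, it can be checked before it is brought into FLEx. The following is only a minimal sketch of such a check in Python: it reports which phrases lack a transcription or translation. The sample content and the element layout (`phrase` elements holding `item` children with `type` and `lang` attributes) are assumptions made for illustration, and a real export carries additional attributes that are ignored here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened flextext content (an assumption for
# illustration); a real export holds more attributes and identifiers.
SAMPLE_FLEXTEXT = """<document>
  <interlinear-text>
    <paragraphs>
      <paragraph>
        <phrases>
          <phrase>
            <item type="txt" lang="xxx">vernacular phrase one</item>
            <item type="gls" lang="en">free translation one</item>
          </phrase>
          <phrase>
            <item type="txt" lang="xxx">vernacular phrase two</item>
          </phrase>
        </phrases>
      </paragraph>
    </paragraphs>
  </interlinear-text>
</document>"""


def phrases_missing_item(flextext_xml, item_type, lang):
    """Return 1-based positions of phrases with no non-empty item of this type and language."""
    root = ET.fromstring(flextext_xml)
    missing = []
    for position, phrase in enumerate(root.iter("phrase"), start=1):
        # Only direct children are checked, i.e. phrase-level lines.
        found = any(
            item.get("type") == item_type
            and item.get("lang") == lang
            and (item.text or "").strip()
            for item in phrase.findall("item")
        )
        if not found:
            missing.append(position)
    return missing
```

Calling `phrases_missing_item(SAMPLE_FLEXTEXT, "gls", "en")` on the sample lists the second phrase, which has no English translation. For a real file, the text of the export would be read from the session folder and the language codes replaced with those of the FLEx project.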

FLEx interlinearization

Open FLEx, go to the Texts & Words tab, go to File/Import/FLExText Import…, find the file in the SayMore session folder, and open it. The new text will automatically be selected. If some of the interlinear lines are not visible, you may need to go to Tools/Configure/Interlinear… to make sure the correct lines are displayed. FLEx will suggest analyses by recalling past decisions.

Optionally, if properly constrained, one of the automatic parsers will do its job by suggesting morpheme breaks as well. Interlinearize the text and input any necessary information in the fields on the Info tab. Once all decisions have been made, you are ready to export the file into ELAN.


FIGURE 3: FLEX INTERLINEARIZATION

With the text selected, go to File/Export Interlinear…, select the FLEXTEXT extension, and click Export… . Check the box next to the appropriate text name and select OK. When asked for the location, save the file in the session’s folder within the SayMore data directory. Here, it is best to name the file ‘Sessionid_Interlinear.flextext’. This separates it from the other flextext file that has already been saved in the directory (i.e. the original export from SayMore).

ELAN time-aligned interlinear output

Open ELAN and go to File/Import/FLEx File… The Import FLEx window appears, as shown in Figure 4.

FIGURE 4: ELAN IMPORT FLEX FILE WINDOW


  • For the FLEx file, select the file you just exported from FLEx.
  • For the Media file(s), there is no need to put anything. The media file from SayMore is embedded in the output and survives the trip through FLEx. However, if you saved one of the files in a location outside of the SayMore session folder at some point, then ELAN will prompt you to locate the media file after you click OK.
  • Check ‘Include “interlinear-text” element’. This pulls information from the ‘Info’ tab of the text in FLEx, including the title, comments, source, and genre.
  • Do not check ‘Include “paragraph” element’.
  • Do not check ‘Import participant information from Note field’.
  • Selecting ‘phrase’ as the smallest time-alignable element means that the word tier in ELAN will be a symbolic subdivision, and therefore all words will be equal subdivisions of their parent, the phrasal annotation tier. This is the simplest choice. Selecting ‘word’ means that the word tier will be a time subdivision, and therefore its annotations can be moved to align with the recording. This is a time-consuming task, but will result in the ability to play individual words in ELAN or in PRAAT using the sendpraat program. If you do not intend to pursue this level of detail in the alignment of annotations, then simply select ‘phrase’.
  • Under ‘Linguistic Types’ choose ‘Create for all basic elements’ to create separate broad types such as phrase, word, morph, etc. For more finely-divided types, choose the other option.
  • The ‘Duration per phrase element’ box does not matter in this workflow, because the time-alignment has remained from the segmented file created in SayMore. Still, the current version of ELAN does not allow it to be empty, even if the field is unused. Therefore, put any number here.
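To make the idea of ‘equal subdivisions’ concrete, the sketch below divides a phrase’s time span evenly among its words. It is only an illustration of the concept: the function name is hypothetical, and it is not a description of how ELAN stores symbolic subdivisions internally (such annotations carry no time values of their own).

```python
def equal_subdivisions(start_ms, end_ms, words):
    """Split the interval [start_ms, end_ms] evenly among the given words.

    Returns a list of (word, begin_ms, end_ms) tuples. Integer arithmetic is
    used so that neighbouring words share exactly the same boundary.
    """
    count = len(words)
    span = end_ms - start_ms
    result = []
    for index, word in enumerate(words):
        begin = start_ms + span * index // count
        end = start_ms + span * (index + 1) // count
        result.append((word, begin, end))
    return result
```

For a phrase running from 1000 ms to 2500 ms with three words, each word is shown over a 500 ms stretch, regardless of where it is actually spoken in the recording. Choosing ‘word’ as the smallest time-alignable element is what allows these boundaries to be moved afterwards.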

Click OK and the full interlinearized time-aligned text is shown, as in Figure 5. The tiers will automatically be organized alphabetically, but to view them hierarchically right-click in the tier pane and select Sort Tiers/Sort by Hierarchy.
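Since the .eaf file is XML, the tier hierarchy can also be inspected outside of ELAN. The sketch below, offered only as an illustration, reads the `TIER` elements and prints each tier indented under its parent, using the `TIER_ID` and `PARENT_REF` attributes. The sample document is a hypothetical, stripped-down fragment; apart from ‘A_phrase-segnum-en’, the tier names in it are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, stripped-down .eaf fragment (an assumption for illustration).
# Real files also contain a header, time slots, annotations, and linguistic types.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="A_word-txt-xxx" PARENT_REF="A_phrase-segnum-en"/>
  <TIER TIER_ID="A_phrase-segnum-en"/>
  <TIER TIER_ID="A_morph-txt-xxx" PARENT_REF="A_word-txt-xxx"/>
  <TIER TIER_ID="A_phrase-gls-en" PARENT_REF="A_phrase-segnum-en"/>
</ANNOTATION_DOCUMENT>"""


def tiers_by_hierarchy(eaf_xml):
    """Return tier names as indented lines, children listed under their parent."""
    root = ET.fromstring(eaf_xml)
    children = {}
    for tier in root.findall("TIER"):
        # Top-level tiers have no PARENT_REF, so they are grouped under None.
        children.setdefault(tier.get("PARENT_REF"), []).append(tier.get("TIER_ID"))

    lines = []

    def walk(parent, depth):
        for tier_id in sorted(children.get(parent, [])):
            lines.append("  " * depth + tier_id)
            walk(tier_id, depth + 1)

    walk(None, 0)
    return lines
```

Running `print("\n".join(tiers_by_hierarchy(SAMPLE_EAF)))` on the sample shows the phrase tier first, with the translation and word tiers beneath it and the morph tier beneath the word tier.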


FIGURE 5: NEWLY IMPORTED FILE IN ELAN

It is important to go ahead and save the file, which will then allow the program to automatically back up your data (as often as every minute, if this option is selected). Again, choose to save the file in the appropriate SayMore session folder, this time with the file name ‘Sessionid_FinalInterlinear.eaf’. This name separates this final product from the other intermediate stages. Saving the file within the SayMore session folder has the added benefit of allowing ELAN to save the preferences (.pfsx) file into the SayMore directory as well. One hiccup in this process is that FLEx cannot output the intonation unit annotations that were created in SayMore. Instead, this tier (which is called ‘A_phrase-segnum-en’) consists of sequentially ordered numerals. Although this is not ideal, the data is not lost, since the original annotation XML file remains in the SayMore file structure.

Three final steps remain to be undertaken with this ELAN file:

  • Delete tiers according to preference.
  • Rename tiers to fit your own preferred conventions.
  • Optionally shift the annotation boundaries on the word tier for time-alignment (only possible if you selected ‘word’ as the smallest time-alignable element in the import window).

When the entire process is complete, returning to SayMore will show the following files for the session, as exemplified in Figure 6. This list does not include any oral annotations such as careful speech or oral translation.

  • Sessionid.session
  • Sessionid_FinalInterlinear.eaf
  • Sessionid_FinalInterlinear.eaf.001 (see footnote 5)
  • Sessionid_Interlinear.flextext
  • Sessionid_Source.wav
  • Sessionid_Source.wav.annotations.eaf
  • Sessionid_Source.wav.flextext

FIGURE 6: FINAL SAYMORE SESSION CONTENTS

5. This is an automatic backup of the ELAN file, which can be deleted.

Of course, if the interlinearization is based on a converted media file, then the list will be slightly different:

  • Sessionid.session
  • Sessionid_FinalInterlinear.eaf
  • Sessionid_FinalInterlinear.eaf.001
  • Sessionid_Interlinear.flextext
  • Sessionid_Source.mov (or whatever media file extension)
  • Sessionid_Source_StandardAudio.wav
  • Sessionid_Source_StandardAudio.wav.annotations.eaf
  • Sessionid_Source_StandardAudio.wav.flextext
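The naming scheme in the two lists above is regular enough to be checked mechanically. The following sketch, with hypothetical function names, builds the expected file names for a session and reports which ones are absent from a folder. It deliberately leaves out ELAN’s automatic backup file, since that file may be deleted.

```python
from pathlib import Path


def expected_session_files(session_id, converted=False, media_ext="mov"):
    """List the file names this workflow is expected to leave in a session folder.

    converted=True covers the case where SayMore converted or extracted the
    audio; media_ext is the extension of the original media file.
    """
    names = [
        f"{session_id}.session",
        f"{session_id}_FinalInterlinear.eaf",
        f"{session_id}_Interlinear.flextext",
    ]
    if converted:
        names.append(f"{session_id}_Source.{media_ext}")
        audio = f"{session_id}_Source_StandardAudio.wav"
    else:
        audio = f"{session_id}_Source.wav"
    # The annotated audio file plus the two files SayMore derives from it.
    names += [audio, f"{audio}.annotations.eaf", f"{audio}.flextext"]
    return names


def missing_files(folder, names):
    """Return the names that do not exist in the given folder."""
    return [name for name in names if not (Path(folder) / name).exists()]
```

For example, `missing_files(session_folder, expected_session_files("Sessionid"))` returns an empty list once every stage of the workflow has been saved into the session folder, where `session_folder` stands for the path to that folder.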

Also note that a number of other files may be present in the folder that SayMore simply does not display:

  • ELAN’s saved preferences for the file6
      o Sessionid_Interlinear.pfsx
  • SayMore’s saved metadata for each file7
      o Sessionid_FinalInterlinear.eaf.meta
      o Sessionid_Interlinear.flextext.meta
      o Sessionid_Source.wav.flextext.meta
      o Sessionid_Source.wav.meta

Finally, it is a good idea to explain in the SayMore notes field for each file what is contained in that file (e.g. ‘This is the original exported annotations file which was brought into the FLEx program for interlinearization.’). This is exemplified for the FLEx export in Figure 7.

FIGURE 7: SAYMORE ITEM NOTE

6. The preferences file may be deleted as well, but this would revert future viewings of the file to the default (e.g. if the tier hierarchy were sorted differently, then this preference would be lost).
7. Technically, these files are displayed as the ‘Properties’, ‘Contributors’, and ‘Notes’ tabs for each individual file.

ELAN segmentation & annotation

While SayMore is preferred, especially when teaching others, due to its simple user interface, ELAN is far more powerful. It was mentioned in §1 that SayMore cannot handle multiple speakers or multiple translations. If either of these is needed, then the workflow should begin in ELAN instead of SayMore.8 Note that SayMore did not require attention to be paid to the naming of tiers or to special protocols for exporting into FLEx. ELAN, on the other hand, requires a careful series of steps. Those steps are outlined in GAVED & SALFFNER (2014), though some updates, further explanations, and simpler steps are provided here instead. My understanding of how the ELAN tiers should be named comes from the aforementioned document. With an aim to simplicity, I believe, GAVED & SALFFNER provide one careful set of steps with no mention of alternate methods which may result in the same final product. Here I provide the necessary steps of the workflow, while still allowing flexibility for various preferences in working within each program. Additionally, I do not provide any accompanying materials such as an ELAN template or media file, as do GAVED & SALFFNER. Two primary differences between the methodologies are listed below:

  • GAVED & SALFFNER suggest tokenizing the text tier to produce a word tier before exporting into FLEx. This wastes time, since FLEx does this automatically.
  • GAVED & SALFFNER are particular about naming conventions, without pointing out that the conventions only aid in simpler exporting. This paper provides simpler file names, while also showing how to export the file no matter what name is chosen.

The goal is to get the text segmented, transcribed, and translated as efficiently as possible. Then that file can be exported for FLEx interlinearization and kept organized within SayMore, if preferred.

5.1 Segmenting

The first task is to open ELAN, create a new file, and add a media file. At this point, it is best to go ahead and save the file so that the automatic backup feature of ELAN can keep your data protected. If you plan to pull this file into SayMore, then it is best to save it in the correct SayMore session folder with a name like ‘Sessionid_Interlinear’. This will allow you to remain organized, and it has the added benefit of allowing ELAN to save the preferences (.pfsx) file into the SayMore directory as well. Once this is done, segmentation can commence. There are two methods, an automatic segmenter and a manual segmenter. These are discussed in turn below.

To allow ELAN to segment automatically, click on the Audio Recognizer tab and select the Silence Recognizer MPI-PL. Select the default tier and set values for the Minimal Silence Duration (determines how long silence must last to be counted as silence) and Minimal Non Silence Duration (determines how long noise must last to be counted as a segment). Finally, click Start. When the process completes, select Create Tier(s). Uncheck the ‘s’ label (silence) and click Create.
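The interplay of the two duration settings can be illustrated with a small sketch. It is not the algorithm used by ELAN’s Silence Recognizer; it is a simplified threshold-based segmenter written only to show what a minimal silence duration and a minimal non-silence duration do. The function name, the per-frame loudness input, and the threshold parameter are all assumptions.

```python
def segment_by_silence(levels, frame_ms, threshold, min_silence_ms, min_sound_ms):
    """Return (begin_ms, end_ms) segments of sound from per-frame loudness values.

    levels          one loudness value per frame
    frame_ms        duration of each frame in milliseconds
    threshold       frames at or above this value count as sound
    min_silence_ms  quieter stretches shorter than this do not split a segment
    min_sound_ms    segments shorter than this are discarded
    """
    # Step 1: collect runs of consecutive loud frames.
    runs = []
    start = None
    for index, level in enumerate(levels):
        loud = level >= threshold
        if loud and start is None:
            start = index
        elif not loud and start is not None:
            runs.append([start * frame_ms, index * frame_ms])
            start = None
    if start is not None:
        runs.append([start * frame_ms, len(levels) * frame_ms])

    # Step 2: a pause shorter than the minimal silence duration is not treated
    # as silence, so the runs on either side of it are merged.
    merged = []
    for run in runs:
        if merged and run[0] - merged[-1][1] < min_silence_ms:
            merged[-1][1] = run[1]
        else:
            merged.append(run)

    # Step 3: sound shorter than the minimal non-silence duration is dropped.
    return [(begin, end) for begin, end in merged if end - begin >= min_sound_ms]
```

Raising the minimal silence duration yields fewer, longer segments, because short pauses no longer split an utterance. Raising the minimal non-silence duration filters out clicks and other brief noises.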

8. Though if a second translation or another simple need is all that is driving this change, much of this can be easily entered into FLEx, or into ELAN after the import. Another method would be to create the original annotations in SayMore, then open the file with ELAN and continue with the steps outlined in §5.3.

FIGURE 8: AUTOMATIC SEGMENTATION IN ELAN

As a result, there is a tier called ‘Channel1’ with intonation-unit segments, each labeled ‘x’, as shown in Figure 9.

FIGURE 9: RESULT OF AUTOMATIC SEGMENTATION

To clear the ‘x’ values, go to Tier/Remove Annotations or Values…, select the ‘Channel1’ tier only, and select ‘Annotation Values’ and ‘All annotations’. Press OK. Some boundaries may need to be adjusted. If so, hold alt and select an annotation boundary to move it.

To segment manually, go to Options/Segmentation Mode. This mode allows you to press a button to mark segments as the media file plays. Select ‘Two keystrokes per annotation (non-adjacent annotations)’ if you’d like to hit ‘enter’ for every starting and ending annotation boundary. Select ‘One keystroke per annotation (adjacent annotations)’ to make the starting point of each new annotation the ending point of the previous one. Select delayed mode and input a number of milliseconds if you’d like it to correct your delayed reflexes by a fixed duration.

Finally, play through the file and press enter as you go. You can press the space bar to pause. You may return later and make changes or add annotations if you miss some. The result is a fully segmented ‘default’ tier that is ready for annotation.
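The difference between the two keystroke options and the effect of the delayed mode can be sketched as follows. This is only an illustration with a hypothetical function name, not ELAN’s implementation: key-press times are turned into segments, optionally shifted earlier by a fixed delay.

```python
def segments_from_keystrokes(press_times_ms, adjacent=True, delay_ms=0):
    """Turn key-press times (in milliseconds) into (begin_ms, end_ms) segments.

    adjacent=True   one keystroke per boundary; each press ends one segment
                    and begins the next
    adjacent=False  two keystrokes per segment; presses are read in
                    begin/end pairs, and an unpaired final press is ignored
    delay_ms        fixed reaction-time correction subtracted from every press
    """
    times = [max(0, time - delay_ms) for time in press_times_ms]
    if adjacent:
        return list(zip(times, times[1:]))
    return list(zip(times[0::2], times[1::2]))
```

With presses at 500, 2000, and 3500 ms and a 200 ms delay, the adjacent option gives the segments 300 to 1800 ms and 1800 to 3300 ms. With the two-keystroke option, four presses produce two segments with an unannotated gap between them.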

5.2 Creating Types and Tiers

Before annotation can commence, ‘types’ need to be created so that ‘tiers’ can be created as well. Though this process might take a few minutes, you can save the result as a template for any future files. It turns out that when the text is exported out of FLEx and re-imported into ELAN (§4), new types will be created. That is, do not waste time carefully building a hierarchy of types. FLEx will create a good hierarchy for you, which can be altered and renamed according to preference. For now, all that is necessary is that three types be created:9 vernacular text, translation, and a note type (if desired). These types can be named anything, and even their stereotypes do not matter at this point. However, for Transcription Mode to be best utilized, it is ideal for the translation and note types to be symbolic associations. The following naming conventions are used herein, with their suggested stereotypes listed in parentheses:

  • text (none)
  • translation (symbolic association)
  • note (symbolic association)

If you would like to create a tier to hold the title of the text, or any other notes that pertain to the entire text, then the following type can be added:

  • title (none)

Once these types are finished, tiers can be created. Again, tier names are not important. However, keep in mind that the FLEx export will create new tier names based on its own structure. Therefore, once again, do not waste time with careful naming conventions. In fact, if you follow the example here, the export process out of ELAN is made much simpler (i.e. less customization of mapping between the programs). Create tiers as shown below, where the Parent Tier is in parentheses and the Linguistic Type is in brackets. Create a hierarchy for each participant in the recording. For a narrative, only the ‘A’ set is needed. Participant information may be entered, as it will be brought back into ELAN. Information in the Annotator field will be lost, however.

  • A-txt-xxx (none) [text]
      o A-gls-en (A-txt-xxx) [translation]
      o A-gls-zzz (A-txt-xxx) [translation]
      o A-note-en (A-txt-xxx) [note]
  • B-txt-xxx (none) [text]
      o B-gls-en (B-txt-xxx) [translation]
      o B-gls-zzz (B-txt-xxx) [translation]
      o B-note-en (B-txt-xxx) [note]

The following list explains the various elements of the above tier names:

  • The first element is the participant name code used by FLEx. No matter what you enter here, it will be exported back into ELAN as ‘A’ or ‘B’ or ‘C’, etc., depending upon the number of participants. This element is not required by FLEx. It just turns out that, by using it, the export mapping fields are populated correctly without the need for customization.
  • The second element (txt, gls, note) is composed of a code for a FLEx type. By entering these here, the tiers are mapped to the correct codes in FLEx. The code ‘txt’ is the vernacular line, ‘gls’ is a translation, and ‘note’ is the note field for a particular vernacular phrase. Here there are two translation tiers, one for English and one for another LWC. This second translation line is optional.
  • The third element matches the internal language codes used in FLEx. These are typically the ISO codes, but not necessarily. English is ‘en’, Tok Pisin is ‘tpi-PG’, etc. By entering this element correctly, the tiers are mapped to the correct language codes in FLEx. (To determine the language codes, open your FLEx project and go to File/Project Management/FieldWorks Project Properties… and go to the Writing Systems tab. Then click on a language and click the Modify button and look at the bottom right of the window. See Figure 10.)

9. The default type can be renamed to create your first type, and then your segmented tier can be renamed as well. Any additional empty tiers can simply be deleted.

FIGURE 10: FLEX-INTERNAL LANGUAGE CODES

Finally, for tiers that provide additional information for the entire text, name them as follows:

  • interlinear-text-title-xxx (none) [title]
      o interlinear-text-source-xxx (interlinear-text-title-xxx) [note]
      o interlinear-text-comment-xxx (interlinear-text-title-xxx) [note]

This provides you with a tier each for the title field in FLEx, the source field, and the comment field. The ‘xxx’ corresponds to which language you would like for this field. Again though, if you wait and just enter this data into FLEx, then when the file is re-imported into ELAN these tiers will be present.
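The naming convention described in this section can be summarized as a small parser. The sketch below is only an illustration of the convention as laid out in this paper (participant code, FLEx type code, language code, plus the ‘interlinear-text-’ prefix for text-level tiers). It is not the logic ELAN itself applies when filling in the export mapping, and the function name is hypothetical.

```python
TEXT_LEVEL_PREFIX = "interlinear-text-"


def parse_tier_name(name):
    """Split a tier name into participant, FLEx type code, and language code.

    Phrase-level tiers look like 'A-gls-en'; text-level tiers look like
    'interlinear-text-title-xxx' and have no participant. The language code
    may itself contain a hyphen (e.g. 'tpi-PG'), so only the leading hyphens
    are treated as separators.
    """
    if name.startswith(TEXT_LEVEL_PREFIX):
        rest = name[len(TEXT_LEVEL_PREFIX):]
        parts = rest.split("-", 1)
        if len(parts) != 2 or not all(parts):
            raise ValueError(f"unexpected text-level tier name: {name!r}")
        return {"participant": None, "type": parts[0], "lang": parts[1]}

    parts = name.split("-", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"unexpected tier name: {name!r}")
    return {"participant": parts[0], "type": parts[1], "lang": parts[2]}
```

For instance, `parse_tier_name("A-gls-tpi-PG")` yields participant ‘A’, type ‘gls’, and language ‘tpi-PG’, while a name that does not follow the convention raises an error, which corresponds to a tier that would need manual mapping during the export.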

5.3 Annotating

Now that segments have been created, and types & tiers have been named, the file should look similar to Figure 11.


FIGURE 11: TIERS AND SEGMENTS

It is now time to begin the annotation stage. Go to Options/Transcription Mode and select ‘text’ as the first column. Then select ‘translation’, then ‘note’. If you have two translation tiers, then two different columns can be selected as ‘translation’. Finally, click Apply. By selecting ‘Automatic playback of media’ on the left-hand side, each time the cursor moves to a new field the segment will play again. By selecting ‘Navigate across column’ you can hit ‘enter’ to move across the row. This allows you to transcribe, translate, and enter notes as you move through the recording. By unchecking this option, you are able to focus on a single tier at a time.

FIGURE 12: ANNOTATING IN TRANSCRIPTION MODE

Note that, if the media file has introductory metadata (such as a summary or introduction in English or another LWC), do not provide annotations for these segments. If you desire to provide a translation for such material, it is best to wait until the file has been re-imported into ELAN.

Once you have fully annotated the file, and you have entered any other annotations such as the title or source or comment, then you are ready to export the file for interlinearization. Even though ELAN has a powerful ‘tokenize’ feature, which creates word-level annotations based on a phrase-level tier, this is unneeded in the current workflow. FLEx will automatically break up the phrases in order to parse each word, so time need not be spent doing this manually.

5.4 Exporting

To export the ELAN file, go to File/Export As/FLEx file. In the first window, select ‘Export interlinear-text tier’ if you have created a tier to hold the title. The ‘interlinear-text’ encompasses the information that can be entered in the ‘Info’ tab of a FLEx interlinear text. The only fields which are currently able to be exported are the Title, Abbreviation, Source, and Comment. If ‘Export interlinear-text tier’ is checked, select the appropriate tier as well. Do not check ‘Export paragraph tier’. Make sure that, next to ‘phrase’, the ‘Corresponding Tier Type’ is ‘text’. Also make sure that the text tier is selected under ‘phrase’ in the section entitled ‘Select tiers to be exported’. This maps the top-level tiers to FLEx categories. Click Next.

FIGURE 13: STEP 1/4 OF ELAN EXPORT

The second window allows you to map the child tiers to the appropriate categories in FLEx. The ‘interlinear-text_item’ column should have ‘note’ selected if you have entered any information like Comment, Source, etc. The ‘phrase_item’ column should have ‘translation’ and ‘note’ selected. Now that the types have been mapped, the appropriate tiers should be checked in the bottom half of the window. These will consist of all the child tiers which depend on the top-level tiers (text and title). Click Next.


FIGURE 14: STEP 2/4 OF ELAN EXPORT

The third window allows you to specify the FLEx category types and languages for each tier. If you have followed the naming conventions described above, then these will be filled out correctly automatically. If you chose to follow your own naming conventions, then this is where you can input the correct information. Add FLEx type codes or language codes and then select them in the columns. When finished, click Next.


FIGURE 15: STEP 3/4 OF ELAN EXPORT

In the final window, save the file. If you are choosing to utilize SayMore to organize your corpus, then save this file in the proper session folder as ‘Sessionid_Interlinear.flextext’. Now return to §§3–4 for FLEx interlinearization and producing a final ELAN output.10

References

GAVED, TIM, and SOPHIE SALFFNER. 2014. Working with ELAN and FLEx together: an ELAN-FLEx-ELAN teaching set. http://tla.mpi.nl/tools/tla-tools/elan/thirdparty/ (Accessed 13 February, 2014).

MOELLER, SARAH RUTH. 2014. SayMore, a tool for language documentation productivity. Language Documentation and Conservation 8. 66–74. http://hdl.handle.net/10125/4610 (Accessed 16 March, 2014).

HELLWIG, BIRGIT, and JEROEN GEERTS. 2013. User guide for ELAN - Linguistic Annotator v4.6.2. The Language Archive, MPI for Psycholinguistics, Nijmegen, The Netherlands. http://tla.mpi.nl/tools/tla-tools/elan/ (Accessed 4 November, 2013).

MAX PLANCK INSTITUTE FOR PSYCHOLINGUISTICS. ELAN Linguistic Annotator v4.6.2. The Language Archive, Nijmegen, The Netherlands. http://tla.mpi.nl/tools/tla-tools/elan/.

10. If you find that you need to re-import a file from ELAN into FLEx, ensure that you first delete the text in FLEx. Otherwise, the FLEx text file will become corrupted. Another option is to answer ‘Yes’ during the import to merge the import into the existing text. Renaming the existing text will not prevent the corruption, since the source of the problem is that FLEx uses an internal ID for each text that cannot be seen or changed. This error should be resolved with FLEx version 8.0.10.

1. Some of the rough edges of this workflow were improved by Tim Gaved, Marlon Hovland, and Sarah Moeller, who offered helpful comments on previous drafts.
2. The steps described here are based on the use of SayMore version 3.0.172, FLEx (Fieldworks Language Explorer) version 8.0.9, and ELAN (Eudico Linguistic Annotator) version 4.6.2.
3. Unless a particular specialized font is used.
4. Though a text can be transcribed and interlinearized in FLEx first, with later time-alignment in ELAN, the segmentation and annotation tools in both SayMore and ELAN are superior. This paper, therefore, anticipates that users will follow the preferred SayMore–FLEx–ELAN method (or the later-described ELAN–FLEx–ELAN method). Still, a FLEx–ELAN workflow can be accomplished by first transcribing a text in FLEx, and then following the instructions described in §§3–4.
