The PAL Meeting Assistant Transcriber provides batch transcription of independently recorded speech files using the Mercury speech recognition APIs. In essence, Transcriber provides a stand-alone software service for the transcription of speech from meetings.
Transcriber is accessed through a browser interface. To make a transcription request, you upload speech files to the system and provide an email address to which the results will be sent. When Transcriber finishes processing the files, it emails the transcription along with a link for downloading additional audio analysis files (intended for use by speech researchers).
Note that the uploaded file must be a zipped archive of audio files that satisfy the following requirements:
After the PAL Meeting Assistant Transcriber has been installed on a web server at your site, obtain the Transcriber URL from your site administrator.
Enter the Transcriber URL in a browser of your choice to open the main Transcriber interface, as shown in Figure 1.
Figure 1. Transcriber Main Interface
To initiate a transcription request, click the "UPLOAD AUDIO" tab; a form will appear as shown in Figure 2. In this form, enter an email address to which the results should be sent, specify the location of the audio zipped file, and then click the "Upload" button.
Figure 2. Upload a File
When the Transcriber server receives the zipped file, Transcriber returns the job submission ID, as shown in Figure 3. The upload time depends on the size of the zip file. As a baseline, it takes approximately 30 seconds to upload an hour-long audio file.
Figure 3. File Submission Confirmation
Transcriber sends an email message when it finishes processing the uploaded audio files. An attachment to the email provides the full transcript in HTML format, as shown in Figure 4. Unlike Mercury (MA Configuration I), Transcriber does not separate the transcript by speaker, because the speech is recorded outside the MA system. The transcription time depends on the size and quality of the audio input. Typically, transcription takes four times the duration of the audio file. Informal analysis, based on samples drawn from a set of recordings from the House Armed Services hearings, showed accuracy of around 90% for SRI's Decipher Automated Speech Recognition (ASR) system.
Figure 4. Audio Transcription File
Besides the full transcript, the notification email also provides a link for downloading additional transcription files to facilitate further analysis of the uploaded audio. These files, intended for use by researchers in ASR, include the following:
The "HELP" tab in the main Transcriber interface brings up online help documentation, as shown in Figure 5.
Figure 5. Online Help
Transcriber seems to take a long time to process my audio. What is it doing?
To improve transcription accuracy, Transcriber does not use real-time speech recognition. Processing an audio file typically takes four times real time, that is, four times the duration of the audio.
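The timing figures quoted in this guide (roughly 30 seconds of upload per hour of audio, and processing at about four times real time) can be turned into a rough turnaround estimate. The following Python sketch is illustrative only and is not part of Transcriber; the function names are hypothetical, and actual times depend on your network connection and on the size and quality of the audio.

```python
import wave

# Baseline figures taken from this guide; real values vary by site.
UPLOAD_SECONDS_PER_AUDIO_HOUR = 30.0   # about 30 s to upload an hour of audio
PROCESSING_FACTOR = 4.0                # about four times real time


def wav_duration_seconds(path):
    """Return the playing time of a .wav file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def estimate_turnaround(audio_seconds):
    """Estimate upload and processing time for a given amount of audio."""
    upload = audio_seconds / 3600.0 * UPLOAD_SECONDS_PER_AUDIO_HOUR
    processing = audio_seconds * PROCESSING_FACTOR
    return {
        "upload_seconds": upload,
        "processing_seconds": processing,
        "total_seconds": upload + processing,
    }
```

For example, a one-hour recording (3600 seconds) gives an estimated 30 seconds of upload time and four hours of processing time.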
Can I upload more than one file at a time?
Yes. Transcriber allows you to upload a .zip archive containing multiple .wav files. The results file will contain a transcript for each .wav file in the archive.
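If you have many recordings, you may find it convenient to build the .zip archive with a short script before uploading it through the browser form. The following Python sketch is one possible approach and is not part of Transcriber; the function name and the directory layout are assumptions. It only collects files with a .wav extension and does not check that the audio meets the format requirements given earlier in this guide.

```python
import zipfile
from pathlib import Path


def build_upload_archive(wav_dir, archive_path):
    """Bundle every .wav file in wav_dir into a single .zip archive.

    Returns the list of file names that were added to the archive.
    """
    wav_dir = Path(wav_dir)
    wav_files = sorted(
        p for p in wav_dir.iterdir()
        if p.is_file() and p.suffix.lower() == ".wav"
    )
    if not wav_files:
        raise ValueError("no .wav files found in %s" % wav_dir)

    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for wav in wav_files:
            # Store each file at the top level of the archive, without its folder path.
            archive.write(wav, arcname=wav.name)

    return [p.name for p in wav_files]
```

Calling build_upload_archive("recordings", "meeting_audio.zip") would produce a single archive that can then be selected in the "UPLOAD AUDIO" form; both names here are placeholders.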