Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur
Spoken Language Processing Group (TLP)

Processing multilingual broadcast audio for information access

One rapidly expanding application area for speech recognition technology is the processing of broadcast audio for information access. Audio indexing must take into account the specific characteristics of audio data, such as the need to deal with a continuous data stream and an imperfect word transcription. This research aims to combine multilingual speech recognition technology with natural language processing to support a variety of tasks such as automatic structuring of audio data, spoken document retrieval, topic detection and generation of alerts.

Large vocabulary continuous speech recognition is a key technology that can be used to enable content-based information access in audio and video documents, such as broadcast audio. This data is challenging because it contains segments of varied acoustic and linguistic nature, which require appropriate modeling. Via speech recognition, spoken document retrieval (SDR) can support random access to relevant portions of audio documents, reducing the time needed to identify recordings in large multimedia databases. Since most of the linguistic information in video data is encoded in the audio channel, once it is transcribed the information can be accessed using text-based tools. The TREC (Text REtrieval Conference) SDR evaluation showed that, for American English, only small differences in information retrieval performance are observed between automatic and manual transcriptions.

Selective dissemination of information and media monitoring require identifying specific information in multimedia data, such as TV or radio broadcasts, and alerting users whenever topics they are interested in are detected. This research is carried out in a multilingual environment in the context of several recent (OLIVE) and ongoing (ALERT, ECHO) European and national (THEOREME, AUDIOSURF) projects.
A characteristic of the broadcast news domain is that, at least for major news events, similar topics are covered simultaneously in different broadcasts and in different countries and languages. Multilinguality is thus of particular interest for media watch applications, since news may first break in a foreign country or language. We have developed broadcast news transcription systems for American English, Arabic, French, German, Mandarin, Portuguese and Spanish. The system can transcribe unrestricted American English broadcast news data with word error rates under 20%. Our transcription systems for French, German, and Spanish have comparable error rates for news broadcasts. The Portuguese system, trained on only a small amount of data, has a word error rate of about 37%. The character error rate for Mandarin is also about 20%, as is the word error rate for Arabic when errors on vowels or geminates are not counted.

Based on our experience, it appears that with appropriately trained models, recognizer performance depends more on the type and source of the data than on the language. For example, documentaries are particularly challenging to transcribe, as the audio quality is often not very high and there is a large proportion of voice-over. Even with the higher word error rates obtained by running a fast transcription system or by transcribing compressed audio data (such as audio that can be downloaded over the Internet), information retrieval (IR) performance remains quite good.

Topic tracking

Staying informed is of strategic importance for many industrial companies and for government and security agencies. With the rapid expansion of media sources for information dissemination (newswires, radio, television, Internet), there is a strong demand for monitoring these sources and an increasing need for automatic processing of the data, so that a larger number of information sources can be monitored.
Recent progress in automatic speech recognition (ASR) technology, in terms of both speed and accuracy, has led to systems which can automatically transcribe, index and search radio and television broadcast news. These techniques can be applied in media-watch applications (IST-1999-10354 ALERT) as well as used to structure multimedia digital libraries (IST-1999-11994 ECHO). A topic tracking system was developed which relies on a unigram topic model. A topic is defined by a set of topic-related audio and/or textual documents. These documents are used to train a topic model, which is used to locate on-topic documents in an incoming stream. The flow of documents is segmented into stories, and each story is compared to the topic model to decide whether it is on- or off-topic. The similarity measure for an incoming document is the normalized likelihood ratio between the topic model and a general English model.

We participated in the NIST Topic Detection and Tracking (TDT2001) evaluation, for the topic tracking task. For this task a small set of on-topic stories (one to four) is given for training, and the system has to decide for each incoming story whether it is on- or off-topic. One of the difficulties of this task is that only a very limited amount of information about the topic may be available in the training data, in particular when there is only one training story. The amount of information also varies across stories and topics: some stories contain fewer than 20 terms after stopping and stemming, whereas others may contain on the order of 300 terms. To compensate for the small amount of data available for estimating the on-topic model, we make use of document expansion techniques relying on external information sources such as past news, in conjunction with unsupervised online adaptation techniques that update the on-topic model with information obtained from the test data itself.
With online adaptation, the topic model is updated by adding incoming stories that the system identifies as on-topic, provided their similarity score exceeds an adaptation threshold. Compared with a baseline unigram tracker, document expansion reduces the tracking cost by 23%, and online unsupervised confidence-score-weighted adaptation reduces the tracking cost by 54%.
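
The tracking scheme described above can be illustrated with a minimal Python sketch. It is not the system evaluated in TDT2001: the class name `UnigramTopicTracker`, the two thresholds, the interpolation weight `topic_weight` and the add-one smoothing of the general model are all assumptions made for illustration. Stories are assumed to be lists of already stopped and stemmed terms, document expansion is omitted, and adaptation simply adds the raw counts of an accepted story rather than weighting them by a confidence score.

```python
import math
from collections import Counter


class UnigramTopicTracker:
    """Toy unigram topic tracker with unsupervised online adaptation.

    All parameter names and default values are illustrative assumptions.
    """

    def __init__(self, topic_stories, background_stories,
                 decision_threshold=0.0, adaptation_threshold=0.2,
                 topic_weight=0.5):
        # Term counts for the on-topic model and for the general model.
        self.topic = Counter(w for story in topic_stories for w in story)
        self.background = Counter(w for story in background_stories for w in story)
        self.bg_total = sum(self.background.values())
        self.bg_vocab = len(self.background)
        self.decision_threshold = decision_threshold
        self.adaptation_threshold = adaptation_threshold
        self.topic_weight = topic_weight  # must stay below 1 to avoid log(0)

    def _p_background(self, word):
        # Crude add-one smoothing so that unseen terms get a non-zero probability.
        return (self.background[word] + 1) / (self.bg_total + self.bg_vocab + 1)

    def _p_topic(self, word):
        # Topic relative frequency interpolated with the general model.
        topic_total = sum(self.topic.values())
        rel_freq = self.topic[word] / topic_total if topic_total else 0.0
        return (self.topic_weight * rel_freq
                + (1.0 - self.topic_weight) * self._p_background(word))

    def score(self, story):
        """Log-likelihood ratio of topic vs. general model, per term."""
        if not story:
            return float("-inf")
        llr = sum(math.log(self._p_topic(w) / self._p_background(w)) for w in story)
        return llr / len(story)  # length normalization

    def track(self, story):
        """Decide on-/off-topic; adapt the topic model on confident hits."""
        s = self.score(story)
        on_topic = s > self.decision_threshold
        if on_topic and s > self.adaptation_threshold:
            self.topic.update(story)  # unweighted unsupervised adaptation
        return on_topic
```

A tracker would be built from the one to four training stories and a background collection, after which `track(story)` is called on each incoming story in order. Keeping the adaptation threshold at or above the decision threshold is one way to limit the risk of adapting on false alarms.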
Last modified: Friday, 18-February-11 02:58:37 CET