The results obtained confirm the theory that the addition of the
visual modality increases the accuracy of ASR systems. It was
observed that, while both the audio and visual recognition
systems may fail to recognize a particular phoneme, the
combined audio-visual recognition will be more likely to
succeed. This observation is attributed to the fact that the audio-visual
integration scheme maximizes the output probabilities of both
modalities. It was also observed that highly confusable audio
phonemes, e.g. /f/ and /th/, were recognised more easily by the visual
modality and, conversely, that confusable visemes, e.g. /w/ and /r/,
were recognised more easily by the audio modality. The
preliminary results of our AVASR system, while meeting the
synergy requirements, can be improved further with more training data.
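
The synergy effect described above, where the combined decision succeeds even though each single-modality decision fails, can be illustrated with a minimal sketch. The sketch assumes a log-linear (weighted product) late fusion of per-phoneme posteriors, which is one common integration scheme and not necessarily the one used in our system. The function names, the stream weight `audio_weight` and all probability values are hypothetical and chosen only for illustration; they are not measured results.

```python
import math


def fuse_posteriors(audio, visual, audio_weight=0.5):
    """Log-linear fusion of two per-class probability dicts (assumed scheme)."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    eps = 1e-12  # guards against log(0)
    # Weighted sum of log-probabilities, i.e. a weighted product of probabilities.
    scores = {
        label: audio_weight * math.log(audio[label] + eps)
        + (1.0 - audio_weight) * math.log(visual[label] + eps)
        for label in audio
    }
    # Renormalise so the fused scores sum to one (log-sum-exp for stability).
    peak = max(scores.values())
    total = sum(math.exp(s - peak) for s in scores.values())
    return {label: math.exp(s - peak) / total for label, s in scores.items()}


def best(posteriors):
    """Return the label with the highest probability."""
    return max(posteriors, key=posteriors.get)


# Made-up posteriors for a spoken /f/ (illustrative values only).
# The audio stream confuses /f/ with /th/; the visual stream confuses /f/ with /v/.
audio = {"f": 0.40, "th": 0.45, "v": 0.15}
visual = {"f": 0.45, "th": 0.05, "v": 0.50}
fused = fuse_posteriors(audio, visual)

print(best(audio), best(visual), best(fused))
```

In this toy case each modality ranks a different wrong phoneme first, but /f/ is the only candidate that both modalities consider plausible, so it receives the highest fused score. In practice the stream weight would be tuned, for example according to the acoustic noise level.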