Audio Visual Automatic Speech Recognition systems use visual
information to enhance ASR systems in clean and noisy environments. This paper
investigates a number of different visual feature extraction methods. It was observed
that when performing visual speech recognition the visual feature vector requires a base
level of detail for improved recognition. Geometric feature extraction provides lower
recognition than pixel based methods due to the loss of characteristic speech information
such as tuck, protrusion etc. Downsampling of images reduces visual recognition
scores due to the loss of detail in the images. Also, the role of dynamic features was
investigated for improved recognition. It was observed that static features alone
outperform a combination of both static and dynamic features when the dimension of the
feature vector is restricted, e.g. to 50. This illustrates that the need for a certain level of
detail in visual speech recognition is a higher priority than dynamic information. Once
this base level of detail is attained the dynamic features should then be able to improve
the recognition rate.
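
The trade-off between static detail and dynamic information under a fixed feature dimension can be illustrated with a minimal sketch. It is not the implementation used in this paper. It assumes that the static features of each frame are already ordered by importance (for example, low-order transform coefficients first) and that the dynamic features are first-order regression deltas of the kind commonly used in speech recognition. The names `delta`, `build_vector`, `budget` and `width` are hypothetical.

```python
import numpy as np

def delta(static, width=2):
    """Regression-based first-order deltas over time.

    static: array of shape (frames, dims). Edge frames are repeated as
    padding, which is an assumption made for this sketch.
    """
    n_frames = static.shape[0]
    padded = np.pad(static, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, width + 1))
    out = np.zeros_like(static, dtype=float)
    for k in range(1, width + 1):
        ahead = padded[width + k : width + k + n_frames]
        behind = padded[width - k : width - k + n_frames]
        out += k * (ahead - behind)
    return out / denom

def build_vector(static, budget=50, use_dynamic=False):
    """Build a per-frame feature vector of at most `budget` dimensions.

    Static only: keep the first `budget` static coefficients.
    Static + dynamic: keep budget // 2 static coefficients and append
    their deltas, so only half as much static detail is retained.
    """
    if not use_dynamic:
        return static[:, :budget]
    kept = static[:, : budget // 2]
    return np.hstack([kept, delta(kept)])

# Hypothetical input: 10 frames, 60 static coefficients per frame.
frames = np.arange(10, dtype=float)[:, None] * np.ones((1, 60))
static_only = build_vector(frames, budget=50)                     # 50 static
combined = build_vector(frames, budget=50, use_dynamic=True)      # 25 static + 25 delta
```

With a budget of 50, the combined vector keeps only 25 static coefficients, which makes explicit how adding dynamic features under a fixed dimension reduces the static detail available to the recogniser.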