Inventors:
Kenneth C. Scott - La Crescenta CA
Matthew C. Yeates - Montrose CA
David S. Kagels - Pasadena CA
Stephen Hilary Watson - Pasadena CA
Assignee:
California Institute of Technology - Pasadena CA
International Classification:
G06T 13/00
Abstract:
A method and apparatus for synthesizing speech or facial movements to match selected speech sequences. A videotape is obtained of a user speaking an arbitrary text sequence, yielding a plurality of images of the user speaking various sequences. From this videotape, video images corresponding to specific spoken phonemes are identified. For each phoneme, a single video frame is digitized that represents the extreme of mouth motion and shape. These frames are used to create a database of images of different facial positions corresponding to spoken phonemes and diphthongs. An audio speech sequence is then used as the element to which a video sequence will be matched. The audio sequence is analyzed to determine the spoken phoneme sequence and the relative timing of each phoneme. The database supplies an image for each of these phonemes at its corresponding time, and morphing techniques are used to create transitions between the images. Different parts of the images can be processed in different ways to produce a more realistic speech pattern.
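The final steps described in the abstract (looking up one keyframe per phoneme at its analyzed time, then filling the gaps between keyframes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: images are hypothetical grayscale pixel grids, the phoneme timeline is assumed to have already been produced by audio analysis, and a plain linear cross-dissolve stands in for true morphing, which would also warp facial geometry between corresponding features. The names `blend` and `synthesize_frames` are hypothetical, and the region-specific processing mentioned in the abstract is not modeled.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# Hypothetical image representation: rows of grayscale pixel intensities.
Image = List[List[float]]


def blend(a: Image, b: Image, alpha: float) -> Image:
    """Linear cross-dissolve, (1 - alpha) * a + alpha * b, pixel by pixel.

    Stand-in for morphing: a real morph would also warp geometry.
    """
    return [
        [(1.0 - alpha) * pa + alpha * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


def synthesize_frames(
    timeline: List[Tuple[str, float]],
    database: Dict[str, Image],
    fps: float,
) -> List[Image]:
    """Return one image per output video frame.

    timeline: (phoneme, time_in_seconds) pairs from audio analysis,
              assumed sorted by time.
    database: one keyframe image per phoneme (the extreme mouth position).
    Between consecutive phoneme times, frames are cross-dissolved from
    one keyframe to the next.
    """
    if not timeline:
        return []
    times = [t for _, t in timeline]
    start, end = times[0], times[-1]
    n_frames = int(round((end - start) * fps)) + 1

    frames: List[Image] = []
    for i in range(n_frames):
        t = start + i / fps
        # Index of the last phoneme whose time is <= t.
        k = bisect_right(times, t) - 1
        if k >= len(times) - 1:
            # At or past the final phoneme: show its keyframe unchanged.
            frames.append(database[timeline[-1][0]])
            continue
        t0, t1 = times[k], times[k + 1]  # t0 <= t < t1, so t1 > t0
        alpha = (t - t0) / (t1 - t0)
        frames.append(
            blend(database[timeline[k][0]], database[timeline[k + 1][0]], alpha)
        )
    return frames
```

For example, with keyframes for two phonemes placed one second apart and `fps=4`, the sketch emits five frames that step linearly from the first keyframe to the second. Using `bisect_right` keeps each lookup logarithmic in the number of phonemes and guarantees the two bracketing times differ, so the interpolation weight never divides by zero.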