Tuesday, Oct 18, 2016

Prof. Li-Hai Tan

Wednesday, Oct 19, 2016

Prof. Alan W. Black

Thursday, Oct 20, 2016

Prof. Frank K. Soong

Prof. Li-Hai Tan
Shenzhen Institute of Neuroscience and Shenzhen University School of Medicine
Brain Plasticity and Language Processing
Neuroplasticity refers to the brain's ability to reorganize itself by forming new neural pathways in response to changes in behavior, cognition, and environment. Contrary to conventional wisdom, recent neuroscience research indicates that experience can quickly change both the adult human brain's physical structure (anatomy) and its functional organization (physiology). In this talk I will summarize structural and functional MRI evidence from studies of language to show how the anatomical and functional networks of normal adult brains change in response to language learning. Evidence from brain-damaged patients will also be reviewed, suggesting that cortical networks adjust their activity to compensate for injury and disease. The promises and challenges of translating basic neuroimaging research into clinical practice require us to carefully investigate individual- and culture-related neuroplasticity in order to preserve critical life skills, such as language and motor function, for patients after neurosurgery.
Biography: Prof. Li-Hai Tan is Director of the Shenzhen Institute of Neuroscience, Distinguished University Professor at the Shenzhen University Health Science Center, and Chief Scientist of the National Basic Research Program of China (973 Program) (2012-2016). He received his Ph.D. in psycholinguistics from the University of Hong Kong in 1995. Following post-doctoral research training at the Learning Research and Development Center of the University of Pittsburgh, he worked at the University of Hong Kong from 1999 to 2014, where he became a tenured professor in 2007. Prof. Tan has performed research in psycholinguistics and neuroscience at the University of Hong Kong, the Research Imaging Center of the University of Texas Health Science Center, the University of Pittsburgh, the Intramural Research Programs of the National Institute of Mental Health at NIH, and the Chinese Academy of Sciences. He founded the State Key Laboratory of Brain and Cognitive Sciences at the University of Hong Kong in 2005 and served as its director until 2014. He has served as an associate editor of the journal Human Brain Mapping, and is now an editorial board member of the following journals: Human Brain Mapping, Neuroscience, Journal of Neurolinguistics, Culture and Brain, and Contemporary Linguistics. His research interests include the neuroimaging study of language and translating basic research findings into clinical practice.

[Back to Top]

Prof. Alan W Black
Carnegie Mellon University, USA
Speech Processing for Unwritten Languages
Current speech processing techniques expect a well-defined writing system in order to define what text may be used for speech-to-text and text-to-speech systems. However, most languages do not have a well-defined writing system, and it is common for literacy to be in a language different from the one spoken on a daily basis. Such distinctions are sometimes considered dialect variations (e.g. Modern Standard Arabic vs. Arabic dialects, or Standard Mandarin vs. Chinese dialects), but they also arise for distinct languages such as Konkani (spoken in India) and many Native American languages.

This talk describes techniques to derive symbolic representations of speech signals for languages with no or only poorly defined writing systems. These techniques may be evaluated within large speech and language systems such as speech translation, spoken dialog systems, or information retrieval.
Biography: Alan W Black is a Professor in the Language Technologies Institute in the School of Computer Science at Carnegie Mellon University. He was born in Edinburgh, Scotland, earned his bachelor's degree in Coventry, England, and his master's and doctorate at the University of Edinburgh. Before joining the faculty at CMU in 1999, he worked in the Centre for Speech Technology Research at the University of Edinburgh, and before that at ATR in Japan. He is one of the principal authors of the free software Festival Speech Synthesis System, the FestVox voice building tools, and CMU Flite, a small-footprint speech synthesis engine, which is the basis for many research and commercial systems around the world. He also works on spoken dialog systems, including the LetsGo Bus Information project, and on mobile speech-to-speech translation systems. Prof. Black was an elected member of the ISCA board (2007-2015). He has over 200 refereed publications and is one of the most highly cited authors in his field.

[Back to Top]

Prof. Frank K. Soong
Speech Group, Microsoft Research Asia (MSRA), Beijing, China
Rendering Speech Across Speaker and Language Barriers
As a person's speech is strongly conditioned by his or her own articulatory characteristics and the language spoken, it is academically attractive and technically challenging to investigate how to render speech across speakers and across languages. Rendering quality can be assessed by the three criteria used for evaluating generic TTS: naturalness, intelligibility, and similarity to the original speaker. In many earlier attempts, the three criteria could not easily be satisfied simultaneously when rendering is done both cross-speaker and cross-language. In this talk we will analyze the key factors, in both the acoustic and phonetic domains, that make high-quality rendering difficult. Speech databases in the same language recorded by different speakers, or bilingual speech databases recorded by the same speaker(s), are used. Both acoustic and phonetic measures are adopted to quantify naturalness, intelligibility, and the speaker's timbre objectively. Our "trajectory tiling" algorithm-based, cross-lingual TTS is used as the baseline system for comparison. To equalize speaker differences automatically, a speaker-independently trained DNN-based ASR acoustic model is used. Kullback-Leibler divergence is proposed to measure statistically the phonetic similarity between any two given speech segments from two different speakers/languages, in order to select good rendering candidates. Demos will be given to show various rendering results, whether cross-speaker, cross-language, or both.
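The abstract does not spell out the exact formulation, but the idea of scoring phonetic similarity with Kullback-Leibler divergence can be illustrated with a minimal sketch: treat each speech segment as a discrete distribution (e.g. averaged phoneme posteriors from an ASR acoustic model, hypothetical values below) and rank candidate segments by a symmetrized KL divergence, where lower means more phonetically similar.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions.

    eps guards against log(0) when a bin has zero probability.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def symmetric_kl(p, q):
    """Symmetrized KL, a common choice when ranking candidates,
    since plain KL is asymmetric in its arguments."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))

# Hypothetical averaged phoneme posteriors for two speech segments
seg_a = [0.70, 0.20, 0.10]
seg_b = [0.60, 0.25, 0.15]

score = symmetric_kl(seg_a, seg_b)  # lower score = more phonetically similar
```

In a cross-lingual system, such scores could be computed between every source segment and candidate target segments, keeping only the lowest-divergence candidates for rendering; the segment representation and ranking policy here are illustrative assumptions, not the talk's actual pipeline.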
Biography: Frank K. Soong is a Principal Researcher and Research Manager of the Speech Group, Microsoft Research Asia (MSRA), Beijing, China, where he works on fundamental research on speech and its practical applications. His professional research career spans over 30 years, first with Bell Labs, US, then ATR, Japan, before joining MSRA in 2004. At Bell Labs, he worked on stochastic modeling of speech signals, optimal decoding algorithms, speech analysis and coding, and speech and speaker recognition. He was responsible for developing the recognition algorithm that became the basis of voice-activated mobile phone products rated by Mobile Office Magazine (Apr. 1993) as "outstandingly the best". He is a co-recipient of the Bell Labs President Gold Award for developing the Bell Labs Automatic Speech Recognition (BLASR) system. He has served as a member of the Speech and Language Technical Committee of the IEEE Signal Processing Society and in other society functions, including Associate Editor of the IEEE Transactions on Speech and Audio Processing and chairing IEEE workshops. He has published extensively, with more than 200 papers, and co-edited a widely used reference book, Automatic Speech and Speaker Recognition - Advanced Topics, Kluwer, 1996. He is a visiting professor of the Chinese University of Hong Kong (CUHK) and a few other top-rated universities in China. He is also the co-Director of the National MSRA-CUHK Joint Research Lab. He received his BS, MS, and PhD from National Taiwan Univ., Univ. of Rhode Island, and Stanford Univ., respectively, all in Electrical Engineering. He is an IEEE Fellow "for contributions to digital processing of speech".

[Back to Top]