Details of the Researcher

Takashi Nose
Section
Graduate School of Engineering
Job title
Associate Professor
Degree
  • Doctor of Engineering (Tokyo Institute of Technology)

Education 1

  • Tokyo Institute of Technology Graduate School, Division of Integrated Science and Engineering, Department of Physical Information Systems

    2006/04 - 2009/03

Committee Memberships 5

  • Acoustical Society of Japan, Tohoku Chapter, Treasurer

    2024/04 - Present

  • Acoustical Society of Japan, Tohoku Chapter, Accounting Auditor

    2018/04 - 2020/03

  • Acoustical Society of Japan, Tohoku Chapter, General Affairs Secretary

    2016/04 - 2018/03

  • Acoustical Society of Japan, Tohoku Chapter, Treasurer

    2014/04 - 2016/03

  • Technical Committee on Speech, Assistant Secretary

    2014/04 - 2016/03

Professional Memberships 5

  • ISCA

  • Information Processing Society of Japan (IPSJ)

  • Acoustical Society of Japan (ASJ)

  • Institute of Electronics, Information and Communication Engineers (IEICE)

  • IEEE

Research Interests 7

  • Multimedia information processing

  • Music information processing

  • Speech coding

  • Spoken dialogue

  • Speech recognition

  • Speech synthesis

  • Speech information processing

Research Areas 2

  • Informatics / Intelligent robotics

  • Informatics / Perceptual information processing

Papers 161

  1. The Development of an Emotional Embodied Conversational Agent and the Evaluation of the Effect of Response Delay on User Impression

    Simon Christophe Jolibois, Akinori Ito, Takashi Nose

    Applied Sciences 2025/04/11

    DOI: 10.3390/app15084256  

  2. Adaptive Fine-Grained Pruning via Binary Search for Efficient Environmental Sound Classification

    Changlong Wang, Akinori Ito, Takashi Nose

    IEEE Access 2025

    DOI: 10.1109/ACCESS.2025.3617879  

  3. Generation of Listening Motion of Embodied Conversational Agents Using Speech and Text Information

    Haruki Ito, Akinori Ito, Takashi Nose

    2025

    DOI: 10.1007/978-3-032-05994-9_10  

  4. Unified model for voice conversion of speech and singing voice using adaptive pitch constraints

    Shogo Fukawa, Takashi Nose, Shuhei Imai, Akinori Ito

    Acoustical Science and Technology 46 (1) 120-123 2025/01/01

    Publisher: Acoustical Society of Japan

    DOI: 10.1250/ast.e24.47  

    ISSN: 1346-3969

    eISSN: 1347-5177

  5. We open our mouths when we are silent

    Shoki Kawanishi, Yuya Chiba, Akinori Ito, Takashi Nose

    Acoustical Science and Technology 46 (1) 96-99 2025/01/01

    Publisher: Acoustical Society of Japan

    DOI: 10.1250/ast.e24.21  

    ISSN: 1346-3969

    eISSN: 1347-5177

  6. Selection of key sentences from lecture video transcription and its application to feedback to the learner

    Miki Takeuchi, Akinori Ito, Takashi Nose

    Proceedings of the 2024 8th International Conference on Education and Multimedia Technology 218-223 2024/06/22

    Publisher: ACM

    DOI: 10.1145/3678726.3678733  

  7. Character Expressions in Meta-Learning for Extremely Low Resource Language Speech Recognition

    Rui Zhou, Akinori Ito, Takashi Nose

    Proceedings of the 2024 16th International Conference on Machine Learning and Computing 2024/02/02

    Publisher: ACM

    DOI: 10.1145/3651671.3651730  

  8. Evaluation of Environmental Sound Classification using Vision Transformer

    Changlong Wang, Akinori Ito, Takashi Nose, Chia-Ping Chen

    Proceedings of the 2024 16th International Conference on Machine Learning and Computing 665-669 2024/02/02

    Publisher: ACM

    DOI: 10.1145/3651671.3651733  

  9. Toward Photo-Realistic Facial Animation Generation Based on Keypoint Features

    Zikai Shu, Takashi Nose, Akinori Ito

    Proceedings of the 2024 16th International Conference on Machine Learning and Computing 39 334-339 2024/02/02

    Publisher: ACM

    DOI: 10.1145/3651671.3651731  

  10. Scheduled Curiosity-Deep Dyna-Q: Efficient Exploration for Dialog Policy Learning

    Niu, X., Ito, A., Nose, T.

    IEEE Access 12 2024/01/31

    DOI: 10.1109/ACCESS.2024.3376418  

    ISSN: 2169-3536

  11. Simultaneous Adaptation of Acoustic and Language Models for Emotional Speech Recognition Using Tweet Data

    Kosaka, T., Saeki, K., Aizawa, Y., Kato, M., Nose, T.

    IEICE Transactions on Information and Systems E107.D (3) 2024

    DOI: 10.1587/transinf.2023HCP0010  

    ISSN: 0916-8532

    eISSN: 1745-1361

  12. A Replaceable Curiosity-Driven Candidate Agent Exploration Approach for Task-Oriented Dialog Policy Learning

    Niu, X., Ito, A., Nose, T.

    IEEE Access 12 2024

    DOI: 10.1109/ACCESS.2024.3462719  

    ISSN: 2169-3536

  13. Multilingual Meta-Transfer Learning for Low-Resource Speech Recognition

    Zhou, R., Koshikawa, T., Ito, A., Nose, T., Chen, C.-P.

    IEEE Access 2024

    DOI: 10.1109/ACCESS.2024.3486711  

    ISSN: 2169-3536

  14. Fast end-to-end non-parallel voice conversion based on speaker-adaptive neural vocoder with cycle-consistent learning

    Shuhei Imai, Aoi Kanagaki, Takashi Nose, Shogo Fukawa, Akinori Ito

    Acoustical Science and Technology 2024

    Publisher: Acoustical Society of Japan

    DOI: 10.1250/ast.e24.46  

    ISSN: 1346-3969

    eISSN: 1347-5177

  15. Multimodal Expressive Embodied Conversational Agent Design

    Simon Jolibois, Akinori Ito, Takashi Nose

    Communications in Computer and Information Science 244-249 2023/07/09

    Publisher: Springer Nature Switzerland

    DOI: 10.1007/978-3-031-35989-7_31  

    ISSN: 1865-0929

    eISSN: 1865-0937

  16. Effect of Data Size and Machine Translation on the Accuracy of Automatic Personality Classification

    Yuki Fukazawa, Akinori Ito, Takashi Nose

    Advances in Intelligent Information Hiding and Multimedia Signal Processing 405-413 2023/05/24

    Publisher: Springer Nature Singapore

    DOI: 10.1007/978-981-99-0105-0_36  

    ISSN: 2190-3018

    eISSN: 2190-3026

  17. Spoken term detection from utterances of minority languages

    Ito, A., Mizuochi, S., Nose, T.

    Issues in Japanese Psycholinguistics from Comparative Perspectives: Volume 1: Cross-Linguistic Studies 2023

    DOI: 10.1515/9783110778946-014  

  18. Response Sentence Modification Using a Sentence Vector for a Flexible Response Generation of Retrieval-based Dialogue Systems

    Ryota Yahagi, Akinori Ito, Takashi Nose, Yuya Chiba

    2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2022/11/07

    Publisher: IEEE

    DOI: 10.23919/apsipaasc55919.2022.9979841  

  19. Design and Construction of Japanese Multimodal Utterance Corpus with Improved Emotion Balance and Naturalness

    Daisuke Horii, Akinori Ito, Takashi Nose

    2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2022/11/07

    Publisher: IEEE

    DOI: 10.23919/apsipaasc55919.2022.9980272  

  20. Multimodal Dialogue Response Timing Estimation Using Dialogue Context Encoder

    Ryota Yahagi, Yuya Chiba, Takashi Nose, Akinori Ito

    Lecture Notes in Electrical Engineering 133-141 2022/11/01

    Publisher: Springer Nature Singapore

    DOI: 10.1007/978-981-19-5538-9_9  

    ISSN: 1876-1100

    eISSN: 1876-1119

  21. Spoken Term Detection of Zero-Resource Language Using Posteriorgram of Multiple Languages

    Satoru MIZUOCHI, Takashi NOSE, Akinori ITO

    Interdisciplinary Information Sciences 28 (1) 1-13 2022

    Publisher: Graduate School of Information Sciences, Tohoku University

    DOI: 10.4036/iis.2022.a.04  

    ISSN: 1340-9050

    eISSN: 1347-6157

  22. Analysis of Feature Extraction by Convolutional Neural Network for Speech Emotion Recognition

    Daisuke Horii, Akinori Ito, Takashi Nose

    2021 IEEE 10th Global Conference on Consumer Electronics (GCCE) 2021/10/12

    Publisher: IEEE

    DOI: 10.1109/gcce53005.2021.9621964  

  23. Improvement of Automatic English Pronunciation Assessment with Small Number of Utterances Using Sentence Speakability

    Satsuki Naijo, Akinori Ito, Takashi Nose

    Interspeech 2021 2021/08/30

    Publisher: ISCA

    DOI: 10.21437/interspeech.2021-1132  

  24. Neural Spoken-Response Generation Using Prosodic and Linguistic Context for Conversational Systems

    Yoshihiro Yamazaki, Yuya Chiba, Takashi Nose, Akinori Ito

    Interspeech 2021 2021/08/30

    Publisher: ISCA

    DOI: 10.21437/interspeech.2021-381  

  25. SMOC corpus: A large-scale Japanese spontaneous multimodal one-on-one chat-talk corpus for dialog systems

    Yoshihiro Yamazaki, Yuya Chiba, Takashi Nose, Akinori Ito

    Acoustical Science and Technology 42 (4) 210-213 2021/07/01

    Publisher: Acoustical Society of Japan

    DOI: 10.1250/ast.42.210  

    ISSN: 1346-3969

    eISSN: 1347-5177

  26. CycleGAN-Based High-Quality Non-Parallel Voice Conversion with Spectrogram and WaveRNN

    Aoi Kanagaki, Masaya Tanaka, Takashi Nose, Ryohei Shimizu, Akira Ito, Akinori Ito

    2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020 356-357 2020/10/13

    DOI: 10.1109/GCCE50665.2020.9291952  

  27. Incremental response generation using prefix-to-prefix model for dialogue system

    Ryota Yahagi, Yuya Chiba, Takashi Nose, Akinori Ito

    2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020 349-350 2020/10/13

    DOI: 10.1109/GCCE50665.2020.9291883  

  28. A study on minimum spectral error analysis of speech

    Takuma Hayasaka, Takashi Nose, Akinori Ito

    2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020 362-363 2020/10/13

    DOI: 10.1109/GCCE50665.2020.9291840  

  29. Filler prediction based on bidirectional LSTM for generation of natural response of spoken dialog

    Yoshihiro Yamazaki, Yuya Chiba, Takashi Nose, Akinori Ito

    2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020 360-361 2020/10/13

    DOI: 10.1109/GCCE50665.2020.9291867  

  30. Successive Japanese lyrics generation based on encoder-decoder model

    Rikiya Takahashi, Takashi Nose, Yuya Chiba, Akinori Ito

    2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020 126-127 2020/10/13

    DOI: 10.1109/GCCE50665.2020.9291718  

  31. Analysis and Estimation of Sentence Speakability for English Pronunciation Evaluation

    Satsuki Naijo, Yuya Chiba, Takashi Nose, Akinori Ito

    2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020 353-355 2020/10/13

    DOI: 10.1109/GCCE50665.2020.9292072  

  32. LJSing: large-scale singing voice corpus of single Japanese singer

    Takuto Fujimura, Takashi Nose, Akinori Ito

    2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020 364-365 2020/10/13

    DOI: 10.1109/GCCE50665.2020.9291704  

  33. Improving Pronunciation Clarity of Dysarthric Speech Using CycleGAN with Multiple Speakers

    Shuhei Imai, Takashi Nose, Aoi Kanagaki, Satoshi Watanabe, Akinori Ito

    2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020 366-367 2020/10/13

    DOI: 10.1109/GCCE50665.2020.9292041  

  34. Spoken term detection based on acoustic models trained in multiple languages for zero-resource language

    Satoru Mizuochi, Yuya Chiba, Takashi Nose, Akinori Ito

    2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020 351-352 2020/10/13

    DOI: 10.1109/GCCE50665.2020.9291761  

  35. Integration of accent sandhi and prosodic features estimation for japanese text-to-speech synthesis

    Daisuke Fujimaki, Takashi Nose, Akinori Ito

    2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020 358-359 2020/10/13

    DOI: 10.1109/GCCE50665.2020.9291906  

  36. Language modeling in speech recognition for grammatical error detection based on neural machine translation

    Jiang Fu, Yuya Chiba, Takashi Nose, Akinori Ito

    Acoustical Science and Technology 41 (5) 788-791 2020/09/01

    Publisher: Acoustical Society of Japan

    DOI: 10.1250/ast.41.788  

    ISSN: 1346-3969

    eISSN: 1347-5177

  37. Scyclone: High-Quality and Parallel-Data-Free Voice Conversion Using Spectrogram and Cycle-Consistent Adversarial Networks

    Masaya Tanaka, Takashi Nose, Aoi Kanagaki, Ryohei Shimizu, Akira Ito

    2020/05/07

    This paper proposes Scyclone, a high-quality voice conversion (VC) technique that requires no parallel training data. Scyclone improves the speech naturalness and speaker similarity of the converted speech by introducing CycleGAN-based spectrogram conversion with a simplified WaveRNN-based vocoder. In Scyclone, a linear spectrogram is used as the conversion feature instead of vocoder parameters, which avoids quality degradation due to extraction errors in fundamental frequency and voiced/unvoiced parameters. The spectrograms of the source and target speakers are modeled by modified CycleGAN networks, and the waveform is reconstructed using the simplified WaveRNN with a single Gaussian probability density function. Subjective experiments with completely unpaired training data show that Scyclone is significantly better than CycleGAN-VC2, one of the existing state-of-the-art parallel-data-free VC techniques.
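    The core technique named here, cycle-consistent adversarial training on linear spectrograms, can be illustrated with a rough PyTorch sketch. This is not the authors' implementation: the tiny convolutional generators/discriminators, the loss weights, and names such as G_ab are placeholder assumptions.

      import torch
      import torch.nn as nn

      # Toy stand-ins for Scyclone's generators/discriminators (the real
      # architectures differ; see the paper).
      def make_generator(n_bins=128):
          return nn.Sequential(nn.Conv1d(n_bins, 256, 5, padding=2), nn.ReLU(),
                               nn.Conv1d(256, n_bins, 5, padding=2))

      def make_discriminator(n_bins=128):
          return nn.Sequential(nn.Conv1d(n_bins, 256, 5, padding=2), nn.LeakyReLU(0.2),
                               nn.Conv1d(256, 1, 5, padding=2))

      G_ab, G_ba = make_generator(), make_generator()   # source->target, target->source
      D_a, D_b = make_discriminator(), make_discriminator()
      l1, mse = nn.L1Loss(), nn.MSELoss()

      def generator_loss(spec_a, spec_b, lam_cyc=10.0, lam_id=5.0):
          """Adversarial + cycle-consistency + identity losses on unpaired spectrograms."""
          fake_b, fake_a = G_ab(spec_a), G_ba(spec_b)
          # LSGAN-style adversarial terms: converted spectrograms should fool D.
          adv = mse(D_b(fake_b), torch.ones_like(D_b(fake_b))) + \
                mse(D_a(fake_a), torch.ones_like(D_a(fake_a)))
          # Cycle consistency: converting there and back must reconstruct the
          # input, which is what removes the need for parallel data.
          cyc = l1(G_ba(fake_b), spec_a) + l1(G_ab(fake_a), spec_b)
          # Identity terms keep each converter from distorting in-domain input.
          idt = l1(G_ab(spec_b), spec_b) + l1(G_ba(spec_a), spec_a)
          return adv + lam_cyc * cyc + lam_id * idt

      # Toy usage: batches of 128-bin spectrograms, 80 frames each.
      loss = generator_loss(torch.randn(4, 128, 80), torch.randn(4, 128, 80))
      loss.backward()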

  38. Automatic assessment of English proficiency for Japanese learners without reference sentences based on deep neural network acoustic models

    Jiang Fu, Yuya Chiba, Takashi Nose, Akinori Ito

    Speech Communication 116 86-97 2020/01

    DOI: 10.1016/j.specom.2019.12.002  

    ISSN: 0167-6393

  39. A symbol-level melody completion based on a convolutional neural network with generative adversarial learning

    Kosuke Nakamura, Takashi Nose, Yuya Chiba, Akinori Ito

    Journal of Information Processing 28 248-257 2020

    DOI: 10.2197/ipsjjip.28.248  

    ISSN: 0387-5806

    eISSN: 1882-6652

  40. Construction and analysis of a multimodal chat-talk corpus for dialog systems considering interpersonal closeness

    Yoshihiro Yamazaki, Yuya Chiba, Takashi Nose, Akinori Ito

    LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings 443-448 2020

  41. Multi-stream attention-based BLSTM with feature segmentation for speech emotion recognition

    Yuya Chiba, Takashi Nose, Akinori Ito

    Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2020-October 3301-3305 2020

    DOI: 10.21437/Interspeech.2020-1199  

    ISSN: 2308-457X

    eISSN: 1990-9772

  42. Developing a Multi-Platform Speech Recording System Toward Open Service of Building Large-Scale Speech Corpora

    Keita Ishizuka, Takashi Nose

    2019/12/19

    This paper briefly reports our ongoing development of a multi-platform, browser-based speech recording system. We designed the system as an open service for building large-scale speech corpora at low cost for any researchers and developers working on speech processing. The recent increase in the use of crowdsourcing services, e.g., Amazon Mechanical Turk, enables us to reduce the cost of collecting speakers on the web, and there have been many attempts to develop automated speech collection platforms and applications designed for use with crowdsourcing. However, a major problem in these previous studies and developments is that most of the systems do not take the form of a common service for speech recording and corpus building, so each corpus builder has to deploy the system in their own environment, including a web server. To address this problem, we are developing a new platform where both corpus builders and recording participants can use a single shared system and service by creating user accounts. A brief introduction of the system is given in this paper as the start of this challenge.
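    The shared-service design sketched in this abstract (one hosted recorder, per-user accounts, browser uploads) can be illustrated with a minimal server endpoint. The Flask routes, form field names, and storage layout below are illustrative assumptions, not the authors' system.

      import uuid
      from pathlib import Path

      from flask import Flask, abort, request

      app = Flask(__name__)
      STORAGE = Path("corpora")  # one subdirectory per corpus-building project

      @app.post("/api/projects/<project_id>/recordings")
      def upload_recording(project_id: str):
          """Accept one browser-recorded utterance (e.g., WebM from MediaRecorder)."""
          audio = request.files.get("audio")
          speaker = request.form.get("speaker_id")
          prompt = request.form.get("prompt_id")
          if audio is None or speaker is None or prompt is None:
              abort(400, "audio, speaker_id and prompt_id are required")
          out_dir = STORAGE / project_id / speaker
          out_dir.mkdir(parents=True, exist_ok=True)
          # Unique file names let a participant re-record a prompt without collisions.
          name = f"{prompt}_{uuid.uuid4().hex}.webm"
          audio.save(out_dir / name)
          return {"status": "ok", "file": name}, 201

      if __name__ == "__main__":
          app.run()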

  43. Improving human scoring of prosody using parametric speech synthesis Peer-reviewed

    Prafianto, H., Nose, T., Chiba, Y., Ito, A.

    Speech Communication 111 14 2019/08

    Publisher: Elsevier BV

    DOI: 10.1016/j.specom.2019.06.001  

    ISSN: 0167-6393

  44. Multi-condition training for noise-robust speech emotion recognition

    Yuya Chiba, Takashi Nose, Akinori Ito

    Acoustical Science and Technology 40 (6) 406-409 2019

    DOI: 10.1250/ast.40.406  

    ISSN: 1346-3969

    eISSN: 1347-5177

  45. Evaluation of English Speech Recognition for Japanese Learners Using DNN-Based Acoustic Models Peer-reviewed

    Jiang Fu, Yuya Chiba, Takashi Nose, Akinori Ito

    Smart Innovation, Systems and Technologies 110 93-100 2019/01

  46. Comparison of Speech Recognition Performance Between Kaldi and Google Cloud Speech API Peer-reviewed

    Takashi Kimura, Takashi Nose, Shinji Hirooka, Yuya Chiba, Akinori Ito

    Smart Innovation, Systems and Technologies 110 109-115 2019/01

  47. Segmental Pitch Control Using Speech Input Based on Differential Contexts and Features for Customizable Neural Speech Synthesis Peer-reviewed

    Shinya Hanabusa, Takashi Nose, Akinori Ito

    Smart Innovation, Systems and Technologies 110 124-131 2019/01

  48. Melody Completion Based on Convolutional Neural Networks and Generative Adversarial Learning Peer-reviewed

    Kosuke Nakamura, Takashi Nose, Yuya Chiba, Akinori Ito

    Smart Innovation, Systems and Technologies 110 116-123 2019/01

  49. Two-Stage Sequence-to-Sequence Neural Voice Conversion with Low-to-High Definition Spectrogram Mapping Peer-reviewed

    Sou Miyamoto, Takashi Nose, Kazuyuki Hiroshiba, Yuri Odagiri, Akinori Ito

    Smart Innovation, Systems and Technologies 110 132-139 2019/01

  50. DNN-Based Talking Movie Generation with Face Direction Consideration Peer-reviewed

    Toru Ishikawa, Takashi Nose, Akinori Ito

    Smart Innovation, Systems and Technologies 110 157-164 2019/01

  51. A Study on a Spoken Dialogue System with Cooperative Emotional Speech Synthesis Using Acoustic and Linguistic Information Peer-reviewed

    Mai Yamanaka, Yuya Chiba, Takashi Nose, Akinori Ito

    Smart Innovation, Systems and Technologies 110 101-108 2019/01

  52. Improvement of accent sandhi rules based on Japanese accent dictionaries Peer-reviewed

    Hiroto Aoyama, Takashi Nose, Yuya Chiba, Akinori Ito

    Smart Innovation, Systems and Technologies 110 140-148 2019/01

    DOI: 10.1007/978-3-030-03748-2_17  

    ISSN: 2190-3018

  53. Data collection and analysis for automatically generating record of human behaviors by environmental sound recognition Peer-reviewed

    Takahiro Furuya, Yuya Chiba, Takashi Nose, Akinori Ito

    Smart Innovation, Systems and Technologies 110 149-156 2019/01/01

    DOI: 10.1007/978-3-030-03748-2_18  

    ISSN: 2190-3018

  54. Effect of mutual self-disclosure in spoken dialog system on user impression Peer-reviewed

    Shunsuke Tada, Yuya Chiba, Takashi Nose, Akinori Ito

    Proceedings of 2018 APSIPA-ASC 806-810 2018/11

  55. Improving User Impression in Spoken Dialog System with Gradual Speech Form Control. Peer-reviewed

    Yukiko Kageyama, Yuya Chiba, Takashi Nose, Akinori Ito

    Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, Melbourne, Australia, July 12-14, 2018 235-240 2018/07

    Publisher: Association for Computational Linguistics

  56. An Analysis of the Effect of Emotional Speech Synthesis on Non-Task-Oriented Dialogue System. Peer-reviewed

    Yuya Chiba, Takashi Nose, Taketo Kase, Mai Yamanaka, Akinori Ito

    Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, Melbourne, Australia, July 12-14, 2018 371-375 2018/07

    Publisher: Association for Computational Linguistics

  57. Analyses of example sentences collected by conversation for example-based non-task-oriented dialog system Peer-reviewed

    Kageyama, Y., Chiba, Y., Nose, T., Ito, A.

    IAENG International Journal of Computer Science 45 (2) 285-293 2018/05

    ISSN: 1819-656X

    eISSN: 1819-9224

  58. Analyzing effect of physical expression on English proficiency for multimodal computer-assisted language learning Peer-reviewed

    Haoran Wu, Yuya Chiba, Takashi Nose, Akinori Ito

    Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2018-September 1746-1750 2018/01/01

    Publisher: ISCA

    DOI: 10.21437/Interspeech.2018-1425  

    ISSN: 2308-457X

  59. Analysis of preferred speaking rate and pause in spoken Easy Japanese for non-native listeners Peer-reviewed

    Hafiyan Prafiyanto, Takashi Nose, Yuya Chiba, Akinori Ito

    Acoustical Science and Technology 39 92-100 2018/01/01

    DOI: 10.1250/ast.39.92  

    ISSN: 1346-3969

  60. Dialog-based interactive movie recommendation: Comparison of dialog strategies Peer-reviewed

    Hayato Mori, Yuya Chiba, Takashi Nose, Akinori Ito

    Smart Innovation, Systems and Technologies 82 77-83 2018

    Publisher: Springer Science and Business Media Deutschland GmbH

    DOI: 10.1007/978-3-319-63859-1_10  

    ISSN: 2190-3018

    eISSN: 2190-3026

  61. A study on 2D photo-realistic facial animation generation using 3D facial feature points and deep neural networks Peer-reviewed

    Kazuki Sato, Takashi Nose, Akira Ito, Yuya Chiba, Akinori Ito, Takahiro Shinozaki

    Smart Innovation, Systems and Technologies 82 113-118 2018

    Publisher: Springer Science and Business Media Deutschland GmbH

    DOI: 10.1007/978-3-319-63859-1_15  

    ISSN: 2190-3018

    eISSN: 2190-3026

  62. Voice conversion from arbitrary speakers based on deep neural networks with adversarial learning Peer-reviewed

    Sou Miyamoto, Takashi Nose, Suzunosuke Ito, Harunori Koike, Yuya Chiba, Akinori Ito, Takahiro Shinozaki

    Smart Innovation, Systems and Technologies 82 97-103 2018

    Publisher: Springer Science and Business Media Deutschland GmbH

    DOI: 10.1007/978-3-319-63859-1_13  

    ISSN: 2190-3018

    eISSN: 2190-3026

  63. Response selection of interview-based dialog system using user focus and semantic orientation Peer-reviewed

    Shunsuke Tada, Yuya Chiba, Takashi Nose, Akinori Ito

    Smart Innovation, Systems and Technologies 82 84-90 2018

    Publisher: Springer Science and Business Media Deutschland GmbH

    DOI: 10.1007/978-3-319-63859-1_11  

    ISSN: 2190-3018

    eISSN: 2190-3026

  64. Development and evaluation of julius-compatible interface for Kaldi ASR Peer-reviewed

    Yusuke Yamada, Takashi Nose, Yuya Chiba, Akinori Ito, Takahiro Shinozaki

    Smart Innovation, Systems and Technologies 82 91-96 2018

    Publisher: Springer Science and Business Media Deutschland GmbH

    DOI: 10.1007/978-3-319-63859-1_12  

    ISSN: 2190-3018

    eISSN: 2190-3026

  65. Detection of singing mistakes from singing voice Peer-reviewed

    Isao Miyagawa, Yuya Chiba, Takashi Nose, Akinori Ito

    Smart Innovation, Systems and Technologies 82 130-136 2018

    Publisher: Springer Science and Business Media Deutschland GmbH

    DOI: 10.1007/978-3-319-63859-1_17  

    ISSN: 2190-3018

    eISSN: 2190-3026

  66. Evaluation of nonlinear tempo modification methods based on sinusoidal modeling Peer-reviewed

    Kosuke Nakamura, Yuya Chiba, Takashi Nose, Akinori Ito

    Smart Innovation, Systems and Technologies 82 104-111 2018

    Publisher: Springer Science and Business Media Deutschland GmbH

    DOI: 10.1007/978-3-319-63859-1_14  

    ISSN: 2190-3018

    eISSN: 2190-3026

  67. Analysis of Efficient Multimodal Features for Estimating User’s Willingness to Talk: Comparison of Human-Machine and Human-Human Dialog Peer-reviewed

    Yuya Chiba, Takashi Nose, Akinori Ito

    Proceeding of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2018-February 1-4 2017/12/13

    DOI: 10.1109/APSIPA.2017.8282069  

  68. HMM-Based Photo-Realistic Talking Face Synthesis Using Facial Expression Parameter Mapping with Deep Neural Networks Peer-reviewed

    Kazuki Sato, Takashi Nose, Akinori Ito

    Journal of Computer and Communications 5 (10) 55-65 2017/08

    DOI: 10.4236/jcc.2017.510006  

  69. Data Collection and Analysis for Automatic Generation of Activity Records Based on Everyday Sound Recognition

    Takahiro Furuya, Yuya Chiba, Takashi Nose, Akinori Ito

    IPSJ SIG Technical Report 1-6 2017/06/17

  70. Cluster-based approach to discriminate the user's state whether a user is embarrassed or thinking to an answer to a prompt Peer-reviewed

    Yuya Chiba, Takashi Nose, Akinori Ito

    JOURNAL ON MULTIMODAL USER INTERFACES 11 (2) 185-196 2017/06

    DOI: 10.1007/s12193-017-0238-y  

    ISSN: 1783-7677

    eISSN: 1783-8738

  71. Sentence Selection Based on Extended Entropy Using Phonetic and Prosodic Contexts for Statistical Parametric Speech Synthesis Peer-reviewed

    Takashi Nose, Yusuke Arao, Takao Kobayashi, Komei Sugiura, Yoshinori Shiga

    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 25 (5) 1107-1116 2017/05

    DOI: 10.1109/TASLP.2017.2688585  

    ISSN: 2329-9290

    eISSN: 2329-9304

  72. Dimensional paralinguistic information control based on multiple-regression HSMM for spontaneous dialogue speech synthesis with robust parameter estimation Peer-reviewed

    Tomohiro Nagata, Hiroki Mori, Takashi Nose

    SPEECH COMMUNICATION 88 137-148 2017/04

    DOI: 10.1016/j.specom.2017.01.002  

    ISSN: 0167-6393

    eISSN: 1872-7182

  73. A Study on Tailor-Made Speech Synthesis Based on Deep Neural Networks Peer-reviewed

    Shuhei Yamada, Takashi Nose, Akinori Ito

    ADVANCES IN INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING, VOL 1 63 159-166 2017

    DOI: 10.1007/978-3-319-50209-0_20  

    ISSN: 2190-3018

  74. Synthesis of Photo-Realistic Facial Animation from Text Based on HMM and DNN with Animation Unit Peer-reviewed

    Kazuki Sato, Takashi Nose, Akinori Ito

    ADVANCES IN INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING, VOL 2 64 29-36 2017

    DOI: 10.1007/978-3-319-50212-0_4  

    ISSN: 2190-3018

  75. Development of an Easy Japanese Writing Support System with Text-to-Speech Function Peer-reviewed

    Takeshi Nagano, Hafiyan Prafianto, Takashi Nose, Akinori Ito

    ADVANCES IN INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING, VOL 2 64 221-228 2017

    DOI: 10.1007/978-3-319-50212-0_27  

    ISSN: 2190-3018

  76. Speaker Adaptation Using Shared Decision Tree Context Clustering for Cross-Lingual Speech Synthesis Peer-reviewed

    Daiki Nagahama, Takashi Nose, Tomoki Koriyama, Takao Kobayashi

    IEICE Transactions on Information and Systems (Japanese Edition) J100-D (3) 385-393 2017

  77. Techniques for Synthesizing Diverse Speech Based on Statistical Models Peer-reviewed

    Takashi Nose

    IEICE Transactions on Information and Systems (Japanese Edition) J100-D (4) 556-569 2017

  78. Collection of example sentences for non-task-oriented dialog using a spoken dialog system and comparison with hand-crafted DB Peer-reviewed

    Yukiko Kageyama, Yuya Chiba, Takashi Nose, Akinori Ito

    Communications in Computer and Information Science 713 458-464 2017

    Publisher: Springer Verlag

    DOI: 10.1007/978-3-319-58750-9_63  

    ISSN: 1865-0929

  79. Construction and analysis of phonetically and prosodically balanced emotional speech database Peer-reviewed

    Takeishi, E, Nose, T, Chiba, Y, Ito, A

    2016 Conference of the Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques, O-COCOSDA 2016 16-21 2016/10

    DOI: 10.1109/ICSDA.2016.7918977  

  80. Efficient Implementation of Global Variance Compensation for Parametric Speech Synthesis Peer-reviewed

    Takashi Nose

    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 24 (10) 1694-1704 2016/10

    DOI: 10.1109/TASLP.2016.2580298  

    ISSN: 2329-9290

  81. Estimating the user's state before exchanging utterances using intermediate acoustic features for spoken dialog systems Peer-reviewed

    Chiba, Y., Nose, T., Ito, M., Ito, A.

    IAENG International Journal of Computer Science 43 (1) 1-9 2016/02/29

    ISSN: 1819-656X

    eISSN: 1819-9224

  82. A PRECISE EVALUATION METHOD OF PROSODIC QUALITY OF NON-NATIVE SPEAKERS USING AVERAGE VOICE AND PROSODY SUBSTITUTION Peer-reviewed

    Hafiyan Prafianto, Takashi Nose, Akinori Ito

    PROCEEDINGS OF 2016 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING (ICALIP) 208-212 2016

    DOI: 10.1109/ICALIP.2016.7846620  

  83. A Study on Facial Image Conversion Based on DNN-Based Conversion of Animation Units Peer-reviewed

    Yuki Saito, Takashi Nose, Akinori Ito

    IEICE Transactions on Information and Systems (Japanese Edition) J99-D (11) 1112-1115 2016

  84. Prosodically rich speech synthesis interface using limited data of celebrity voice Peer-reviewed

    Takashi Nose, Taiki Kamei

    Journal of Computer and Communications 4 (16) 79-94 2016

  85. Evaluation of a Spoken Dialogue System Using Cooperative Emotional Speech Synthesis Based on Utterance State Estimation Peer-reviewed

    Taketo Kase, Takashi Nose, Yuya Chiba, Akinori Ito

    IEICE Transactions (Japanese Edition) J99-A (1) 25-35 2016/01

  86. Investigation of Pause Insertion Effect in Spoken Easy Japanese for Non-Native Listeners Peer-reviewed

    Hafiyan Prafianto, Takeshi Nagano, Takashi Nose, Akinori Ito

    Proceedings of 12th Western Pacific Acoustics Conference 507-511 2015/12/08

  87. Automatic Generation of Proper Noun Entries in a Speech Recognizer for Local Information Recognition Peer-reviewed

    Kenta Shiga, Takashi Nose, Akinori Ito, Ryo Masumura, Hirokazu Masataki

    Proceedings of 12th Western Pacific Acoustics Conference 2015/12/08

  88. Real-time talking avatar on the internet using Kinect and voice conversion Peer-reviewed

    Takashi Nose, Yuki Igarashi

    International Journal of Advanced Computer Science and Applications 6 (12) 301-307 2015/12

  89. A Computer-Assisted English Conversation Training System for Response-Timing-Aware Oral Conversation Exercise Peer-reviewed

    Naoto Suzuki, Yutaka Hiroi, Yuya Chiba, Takashi Nose, Akinori Ito

    IPSJ Journal 56 (11) 2177-2189 2015/11/01

  90. HMM-based expressive singing voice synthesis with singing style control and robust pitch modeling Peer-reviewed

    Takashi Nose, Misa Kanemoto, Tomoki Koriyama, Takao Kobayashi

    COMPUTER SPEECH AND LANGUAGE 34 (1) 308-322 2015/11

    DOI: 10.1016/j.csl.2015.04.001  

    ISSN: 0885-2308

    eISSN: 1095-8363

  91. Conversion of Speaker's Face Image Using PCA and Animation Unit for Video Chatting Peer-reviewed

    Saito, Y, Nose, T, Shinozaki, T, Ito, A

    Proceedings - 2015 International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2015 433-436 2015/09/25

    Publisher: IEEE

    DOI: 10.1109/IIH-MSP.2015.85  

  92. Tempo Modification of Mixed Music Signal by Nonlinear Time Scaling and Sinusoidal Modeling Peer-reviewed

    Nishino, T, Nose, T, Ito, A

    Proceedings - 2015 International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2015 146-149 2015/09/24

    Publisher: IEEE

    DOI: 10.1109/IIH-MSP.2015.86  

  93. Entropy-based sentence selection for speech synthesis using phonetic and prosodic contexts Peer-reviewed

    Takashi Nose, Yusuke Arao, Takao Kobayashi, Komei Sugiura, Yoshinori Shiga, Akinori Ito

    Proceedings of 16th Annual Conference of the International Speech Communication Association 3491-3495 2015/09/10

  94. On appropriateness and estimation of the emotion of synthesized response speech in a spoken dialogue system Peer-reviewed

    Taketo Kase, Takashi Nose, Akinori Ito

    Communications in Computer and Information Science 528 747-752 2015/01/01

    DOI: 10.1007/978-3-319-21380-4_126  

    ISSN: 1865-0929

  95. Statistical Parametric Speech Synthesis Based on Gaussian Process Regression Peer-reviewed

    Tomoki Koriyama, Takashi Nose, Takao Kobayashi

    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING 8 (2) 173-183 2014/04

    DOI: 10.1109/JSTSP.2013.2283461  

    ISSN: 1932-4553

    eISSN: 1941-0484

  96. A Parameter Generation Algorithm Using Local Variance for HMM-Based Speech Synthesis Peer-reviewed

    Takashi Nose, Vataya Chunwijitra, Takao Kobayashi

    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING 8 (2) 221-228 2014/04

    DOI: 10.1109/JSTSP.2013.2283459  

    ISSN: 1932-4553

    eISSN: 1941-0484

  97. Prosodic variation enhancement using unsupervised context labeling for HMM-based expressive speech synthesis Peer-reviewed

    Yu Maeno, Takashi Nose, Takao Kobayashi, Tomoki Koriyama, Yusuke Ijima, Hideharu Nakajima, Hideyuki Mizuno, Osamu Yoshioka

    SPEECH COMMUNICATION 57 144-154 2014/02

    DOI: 10.1016/j.specom.2013.09.014  

    ISSN: 0167-6393

    eISSN: 1872-7182

  98. PARAMETRIC SPEECH SYNTHESIS USING LOCAL AND GLOBAL SPARSE GAUSSIAN PROCESSES Peer-reviewed

    Tomoki Koriyama, Takashi Nose, Takao Kobayashi

    2014 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP) 2014

    ISSN: 2161-0363

  99. Speech Recognition in a Home Environment Using Parallel Decoding with GMM-Based Noise Modeling Peer-reviewed

    Kohei Machida, Takashi Nose, Akinori Ito

    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) 2014

    DOI: 10.1109/APSIPA.2014.7041622  

  100. PARAMETRIC SPEECH SYNTHESIS BASED ON GAUSSIAN PROCESS REGRESSION USING GLOBAL VARIANCE AND HYPERPARAMETER OPTIMIZATION Peer-reviewed

    Tomoki Koriyama, Takashi Nose, Takao Kobayashi

    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) 3862-3866 2014

    DOI: 10.1109/ICASSP.2014.6854319  

    ISSN: 1520-6149

  101. Tone modeling using stress information for HMM-based Thai speech synthesis Peer-reviewed

    Decha Moungsri, Tomoki Koriyama, Takashi Nose, Takao Kobayashi

    Proceedings of the 7th International Conference on Speech Prosody 1057-1061 2014

  102. Controlling Switching Pause Using an AR Agent for Interactive CALL System Peer-reviewed

    Naoto Suzuki, Takashi Nose, Akinori Ito, Yutaka Hiroi

    Communications in Computer and Information Science 435 588-593 2014

    Publisher: Springer Verlag

    DOI: 10.1007/978-3-319-07854-0_102  

    ISSN: 1865-0929

  103. Subjective Evaluation of Packet Loss RecoveryTechniques for Voice over IP Peer-reviewed

    Masahito Okamoto, Takashi Nose, Akinori Ito, Takeshi Nagano

    2014 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING (ICALIP), VOLS 1-2 711-714 2014

    DOI: 10.1109/ICALIP.2014.7009887  

  104. A Study on the Effect of Speech Rate on Perception of Spoken Easy Japanese Using Speech Synthesis Peer-reviewed

    Hafiyan Prafianto, Takashi Nose, Yuya Chiba, Akinori Ito, Kazuyuki Sato

    2014 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING (ICALIP), VOLS 1-2 476-479 2014

    DOI: 10.1109/ICALIP.2014.7009839  

  105. Robot: Have I Done Something Wrong? -Analysis of Prosodic Features of Speech Commands under the Robot's Unintended Behavior- Peer-reviewed

    Noriko Totsuka, Yuya Chiba, Takashi Nose, Akinori Ito

    2014 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING (ICALIP), VOLS 1-2 887-890 2014

    DOI: 10.1109/ICALIP.2014.7009922  

  106. Tempo modification of music signal using sinusoidal model and LPC-based residue model Peer-reviewed

    Akinori Ito, Yuki Igarashi, Masashi Ito, Takashi Nose

    Proceedings of the 21st International Congress on Sound and Vibration 1 1-8 2014

  107. User modeling by using bag-of-behaviors for building a dialog system sensitive to the interlocutor's internal state Peer-reviewed

    Yuya Chiba, Masashi Ito, Takashi Nose, Akinori Ito

    Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue 74-78 2014

  108. Quantized F0 Context and Its Applications to Speech Synthesis, Speech Coding and Voice Conversion Peer-reviewed

    Takashi Nose, Takao Kobayashi

    2014 TENTH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING (IIH-MSP 2014) 578-581 2014

    DOI: 10.1109/IIH-MSP.2014.149  

  109. Analysis of English pronunciation of singing voices sung by Japanese speakers Peer-reviewed

    Kazumichi Yoshida, Takashi Nose, Akinori Ito

    2014 TENTH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING (IIH-MSP 2014) 554-557 2014

    DOI: 10.1109/IIH-MSP.2014.143  

  110. Transform Mapping Using Shared Decision Tree Context Clustering for HMM-Based Cross-Lingual Speech Synthesis Peer-reviewed

    Daiki Nagahama, Takashi Nose, Tomoki Koriyama, Takao Kobayashi

    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4 770-774 2014

    ISSN: 2308-457X

  111. Accent type and phrase boundary estimation using acoustic and language models for automatic prosodic labeling Peer-reviewed

    Tomoki Koriyama, Hiroshi Suzuki, Takashi Nose, Takahiro Shinozaki, Akinori Ito

    Proceedings of 15th Annual Conference of the International Speech Communication Association 2337-2341 2014

  112. Analysis of spectral enhancement using global variance in HMM-based speech synthesis Peer-reviewed

    Takashi Nose, Akinori Ito

    Proceedings of 15th Annual Conference of the International Speech Communication Association 2917-2921 2014

    ISSN: 2308-457X

    eISSN: 1990-9772

  113. Frame-level acoustic modeling based on Gaussian process regression for statistical nonparametric speech synthesis Peer-reviewed

    Tomoki Koriyama, Takashi Nose, Takao Kobayashi

    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 8007-8011 2013/10/18

    DOI: 10.1109/ICASSP.2013.6639224  

    ISSN: 1520-6149

  114. An intuitive style control technique in HMM-based expressive speech synthesis using subjective style intensity and multiple-regression global variance model Peer-reviewed

    Takashi Nose, Takao Kobayashi

    SPEECH COMMUNICATION 55 (2) 347-357 2013/02

    DOI: 10.1016/j.specom.2012.09.003  

    ISSN: 0167-6393

    eISSN: 1872-7182

  115. [Invited Talk] Diversification of Speakers and Styles in Statistical-Model-Based Speech Synthesis Invited

    Takashi Nose

    IEICE Technical Report Vol. 112 (No. 422) 67-72 2013

  116. HMM-BASED EXPRESSIVE SPEECH SYNTHESIS BASED ON PHRASE-LEVEL F0 CONTEXT LABELING Peer-reviewed

    Yu Maeno, Takashi Nose, Takao Kobayashi, Tomoki Koriyama, Yusuke Ijima, Hideharu Nakajima, Hideyuki Mizuno, Osamu Yoshioka

    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) 7859-7863 2013

    DOI: 10.1109/ICASSP.2013.6639194  

    ISSN: 1520-6149

  117. SPEAKER-INDEPENDENT STYLE CONVERSION FOR HMM-BASED EXPRESSIVE SPEECH SYNTHESIS Peer-reviewed

    Hiroki Kanagawa, Takashi Nose, Takao Kobayashi

    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) 7864-7868 2013

    DOI: 10.1109/ICASSP.2013.6639195  

    ISSN: 1520-6149

  118. A style control technique for singing voice synthesis based on multiple-regression HSMM Peer-reviewed

    Takashi Nose, Misa Kanemoto, Tomoki Koriyama, Takao Kobayashi

    Proceedings of 14th Annual Conference of the International Speech Communication Association 378-382 2013

  119. Statistical nonparametric speech synthesis using sparse Gaussian processes Peer-reviewed

    Tomoki Koriyama, Takashi Nose, Takao Kobayashi

    Proceedings of 14th Annual Conference of the International Speech Communication Association 1072-1076 2013

  120. Robust Estimation of Multiple-Regression HMM Parameters for Dimension-Based Expressive Dialogue Speech Synthesis Peer-reviewed

    Tomohiro Nagata, Hiroki Mori, Takashi Nose

    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 1548-1552 2013

    ISSN: 2308-457X

  121. Very low bit-rate F0 coding for phonetic vocoders using MSD-HMM with quantized F0 symbols Peer-reviewed

    Takashi Nose, Takao Kobayashi

    SPEECH COMMUNICATION 54 (3) 384-392 2012/03

    DOI: 10.1016/j.specom.2011.10.002  

    ISSN: 0167-6393

    eISSN: 1872-7182

  122. A tone-modeling technique using a quantized F0 context to improve tone correctness in average-voice-based speech synthesis Peer-reviewed

    Vataya Chunwijitra, Takashi Nose, Takao Kobayashi

    SPEECH COMMUNICATION 54 (2) 245-255 2012/02

    DOI: 10.1016/j.specom.2011.08.006  

    ISSN: 0167-6393

    eISSN: 1872-7182

  123. Context Extension for Generating Diverse Prosody in HMM-Based Dialogue Speech Synthesis Peer-reviewed

    Tomoki Koriyama, Takashi Nose, Takao Kobayashi

    IEICE Transactions on Information and Systems (Japanese Edition) Vol. J95-D (No. 3) 597-607 2012

  124. AN F0 MODELING TECHNIQUE BASED ON PROSODIC EVENTS FOR SPONTANEOUS SPEECH SYNTHESIS Peer-reviewed

    Tomoki Koriyama, Takashi Nose, Takao Kobayashi

    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) 4589-4592 2012

    DOI: 10.1109/ICASSP.2012.6288940  

    ISSN: 1520-6149

  125. Discontinuous Observation HMM for Prosodic-Event-Based F0 Generation Peer-reviewed

    Tomoki Koriyama, Takashi Nose, Takao Kobayashi

    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3 462-465 2012

  126. A speech parameter generation algorithm using local variance for HMM-based speech synthesis Peer-reviewed

    Vataya Chunwijitra, Takashi Nose, Takao Kobayashi

    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3 1150-1153 2012

  127. Speaker-independent HMM-based voice conversion using adaptive quantization of the fundamental frequency Peer-reviewed

    Takashi Nose, Takao Kobayashi

    SPEECH COMMUNICATION 53 (7) 973-985 2011/09

    DOI: 10.1016/j.specom.2011.05.001  

    ISSN: 0167-6393

    eISSN: 1872-7182

  128. TONAL CONTEXT LABELING USING QUANTIZED F-0 SYMBOLS FOR IMPROVING TONE CORRECTNESS IN AVERAGE-VOICE-BASED SPEECH SYNTHESIS Peer-reviewed

    Vataya Chunwijitra, Takashi Nose, Takao Kobayashi

    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING 4708-4711 2011

    DOI: 10.1109/ICASSP.2011.5947406  

    ISSN: 1520-6149

  129. VERY LOW BIT-RATE F0 CODING FOR PHONETIC VOCODER USING MSD-HMM WITH QUANTIZED F0 CONTEXT Peer-reviewed

    Takashi Nose, Takao Kobayashi

    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING 5236-5239 2011

    DOI: 10.1109/ICASSP.2011.5947538  

    ISSN: 1520-6149

  130. A Perceptual Expressivity Modeling Technique for Speech Synthesis Based on Multiple-Regression HSMM Peer-reviewed

    Takashi Nose, Takao Kobayashi

    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 116-119 2011

  131. HMM-Based Emphatic Speech Synthesis Using Unsupervised Context Labeling Peer-reviewed

    Yu Maeno, Takashi Nose, Takao Kobayashi, Yusuke Ijima, Hideharu Nakajima, Hideyuki Mizuno, Osamu Yoshioka

    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 1860-+ 2011

  132. Performance Prediction of Speech Recognition Using Average-Voice-Based Speech Synthesis Peer-reviewed

    Tatsuhiko Saito, Takashi Nose, Takao Kobayashi, Yohei Okato, Akio Horii

    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 1964-+ 2011

  133. On the Use of Extended Context for HMM-based Spontaneous Conversational Speech Synthesis Peer-reviewed

    Tomoki Koriyama, Takashi Nose, Takao Kobayashi

    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 2668-2671 2011

  134. Recent development of HMM-based expressive speech synthesis and its applications Peer-reviewed

    Takashi Nose, Takao Kobayashi

    Proceedings of 2011 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 1-4 2011

  135. HMM-Based Voice Conversion Using Quantized F0 Context Peer-reviewed

    Takashi Nose, Yuhei Ota, Takao Kobayashi

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E93D (9) 2483-2490 2010/09

    DOI: 10.1587/transinf.E93.D.2483  

    ISSN: 0916-8532

  136. A Rapid Model Adaptation Technique for Emotional Speech Recognition with Style Estimation Based on Multiple-Regression HMM Peer-reviewed

    Yusuke Ijima, Takashi Nose, Makoto Tachibana, Takao Kobayashi

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E93D (1) 107-115 2010/01

    DOI: 10.1587/transinf.E93.D.107  

    ISSN: 0916-8532

  137. A Technique for Estimating Intensity of Emotional Expressions and Speaking Styles in Speech Based on Multiple-Regression HSMM Peer-reviewed

    Takashi Nose, Takao Kobayashi

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E93D (1) 116-124 2010/01

    DOI: 10.1587/transinf.E93.D.116  

    ISSN: 0916-8532

  138. HMM-BASED SPEECH SYNTHESIS WITH UNSUPERVISED LABELING OF ACCENTUAL CONTEXT BASED ON F0 QUANTIZATION AND AVERAGE VOICE MODEL Peer-reviewed

    Takashi Nose, Koujirou Ooki, Takao Kobayashi

    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING 4622-4625 2010

    DOI: 10.1109/ICASSP.2010.5495548  

    ISSN: 1520-6149

  139. Learning lexicons from spoken utterances based on statistical model selection Peer-reviewed

    Ryo Taguchi, Naoto Iwahashi, Kotaro Funakoshi, Mikio Nakano, Takashi Nose, Tsuneo Nitta

    Transactions of the Japanese Society for Artificial Intelligence 25 (4) 549-559 2010

    DOI: 10.1527/tjsai.25.549  

    ISSN: 1346-0714

    eISSN: 1346-8030

  140. HMM-based robust voice conversion using adaptive F0 quantization Peer-reviewed

    Takashi Nose, Takao Kobayashi

    Proceedings of 7th ISCA Workshop on Speech Synthesis 80-85 2010

  141. Evaluation of Prosodic Contextual Factors for HMM-based Speech Synthesis Peer-reviewed

    Shuji Yokomizo, Takashi Nose, Takao Kobayashi

    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2 430-433 2010

  142. Conversational Spontaneous Speech Synthesis Using Average Voice Model Peer-reviewed

    Tomoki Koriyama, Takashi Nose, Takao Kobayashi

    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2 853-856 2010

  143. Speaker-independent HMM-based Voice Conversion Using Quantized Fundamental Frequency Peer-reviewed

    Takashi Nose, Takao Kobayashi

    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4 1724-1727 2010

  144. Grounding new words on the physical world in multi-domain human-robot dialogues Peer-reviewed

    Mikio Nakano, Naoto Iwahashi, Takayuki Nagai, Taisuke Sumii, Xiang Zuo, Ryo Taguchi, Takashi Nose, Akira Mizutani, Tomoaki Nakamura, Muhammad Attamimi, Hiromi Narimatsu, Kotaro Funakoshi, Yuji Hasegawa

    AAAI Publications, 2010 AAAI Fall Symposium Series 74-79 2010

  145. Robust Speaker-Adaptive HMM-Based Text-to-Speech Synthesis Peer-reviewed

    Junichi Yamagishi, Takashi Nose, Heiga Zen, Zhen-Hua Ling, Tomoki Toda, Keiichi Tokuda, Simon King, Steve Renals

    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 17 (6) 1208-1230 2009/08

    DOI: 10.1109/TASL.2009.2016394  

    ISSN: 1558-7916

    eISSN: 1558-7924

  146. HMM-Based Style Control for Expressive Speech Synthesis with Arbitrary Speaker's Voice Using Model Adaptation Peer-reviewed

    Takashi Nose, Makoto Tachibana, Takao Kobayashi

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E92D (3) 489-497 2009/03

    DOI: 10.1587/transinf.E92.D.489  

    ISSN: 0916-8532

  147. EMOTIONAL SPEECH RECOGNITION BASED ON STYLE ESTIMATION AND ADAPTATION WITH MULTIPLE-REGRESSION HMM Peer-reviewed

    Yusuke Ijima, Makoto Tachibana, Takashi Nose, Takao Kobayashi

    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS 4157-4160 2009

    DOI: 10.1109/ICASSP.2009.4960544  

    ISSN: 1520-6149

  148. Speaking Style Adaptation for Spontaneous Speech Recognition Using Multiple-Regression HMM Peer-reviewed

    Yusuke Ijima, Takeshi Matsubara, Takashi Nose, Takao Kobayashi

    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5 548-551 2009

  149. HMM-based Speaker Characteristics Emphasis Using Average Voice Model Peer-reviewed

    Takashi Nose, Junichi Adada, Takao Kobayashi

    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5 2599-2602 2009

  150. Learning Lexicons from Spoken Utterances Based on Statistical Model Selection Peer-reviewed

    Ryo Taguchi, Naoto Iwahashi, Takashi Nose, Kotaro Funakoshi, Mikio Nakano

    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5 2687-2690 2009

  151. Recent development of the HMM-based speech synthesis system (HTS) Peer-reviewed

    Heiga Zen, Keiichiro Oura, Takashi Nose, Junichi Yamagishi, Shinji Sako, Tomoki Toda, Takashi Masuko, Alan W. Black, Keiichi Tokuda

    Proceedings of 2009 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 121-130 2009

  152. Performance evaluation of the speaker-independent HMM-based speech synthesis system "HTS-2007" for the Blizzard Challenge 2007 Peer-reviewed

    Junichi Yamagishi, Takashi Nose, Heiga Zen, Tomoki Toda, Keiichi Tokuda

    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12 3957-+ 2008

    DOI: 10.1109/ICASSP.2008.4518520  

    ISSN: 1520-6149

  153. Speaker and style adaptation using average voice model for style control in HMM-based speech synthesis Peer-reviewed

    Makoto Tachibana, Shinsuke Izawa, Takashi Nose, Takao Kobayashi

    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12 4633-4636 2008

    DOI: 10.1109/ICASSP.2008.4518689  

    ISSN: 1520-6149

  154. An On-line Adaptation Technique for Emotional Speech Recognition Using Style Estimation with Multiple-Regression HMM Peer-reviewed

    Yusuke Ijima, Makoto Tachibana, Takashi Nose, Takao Kobayashi

    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5 1297-1300 2008

  155. An Estimation Technique of Style Expressiveness for Emotional Speech Using Model Adaptation Based on Multiple-Regression HSMM Peer-reviewed

    Takashi Nose, Yoichi Kato, Makoto Tachibana, Takao Kobayashi

    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5 2759-2762 2008

  156. A style control technique for HMM-based expressive speech synthesis Peer-reviewed

    Takashi Nose, Junichi Yamagishi, Takashi Masuko, Takao Kobayashi

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E90D (9) 1406-1413 2007/09

    DOI: 10.1093/ietisy/e90-d.9.1406  

    ISSN: 0916-8532

  157. A speaker adaptation technique for MRHSMM-based style control of. synthetic speech Peer-reviewed

    Takashi Nose, Yoichi Kato, Takao Kobayashi

    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3 833-+ 2007

    DOI: 10.1109/ICASSP.2007.367042  

    ISSN: 1520-6149

  158. The HMM-based speech synthesis system version 2.0 Peer-reviewed

    Heiga Zen, Takashi Nose, Junichi Yamagishi, Shinji Sako, Takashi Masuko, Alan W. Black, Keiichi Tokuda

    Proceedings of 6th ISCA Workshop on Speech Synthesis 294-299 2007

  159. Style Estimation of Speech Based on Multiple Regression Hidden Semi-Markov Model Peer-reviewed

    Takashi Nose, Yoichi Kato, Takao Kobayashi

    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4 2900-2903 2007

  160. A Style Control Technique for Speech Synthesis Using Multiple Regression HSMM Peer-reviewed

    Takashi Nose, Junichi Yamagishi, Takao Kobayashi

    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5 1324-1327 2006

  161. A Technique for Controlling Voice Quality of Synthetic Speech Using Multiple Regression HSMM Peer-reviewed

    Makoto Tachibana, Takashi Nose, Junichi Yamagishi, Takao Kobayashi

    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5 2438-2441 2006

Misc. 52

  1. Invited Talk : Synthesis, Recognition and Conversion of Various Speech Using Deep Learning and Their Applications

    117 (160) 3-8 2017/07/27

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

  2. A Study on DNN-Based Speech Synthesis Using Vector Quantization of Spectral Features

    116 (414) 65-70 2017/01/21

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

  3. Poster Presentation : A Study on Singer-Independent Singing Voice Conversion Using Read Speech Based on Neural Network

    116 (414) 17-22 2017/01/21

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

  4. Improvement of Accent Sandhi Rules Based on Accent Dictionary for Japanese Text-to-Speech Systems

    116 (378) 31-36 2016/12/20

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

  5. Poster Presentation : Development of the Julius-compatible interface for the speech recognition engine of Kaldi toolkit

    116 (378) 49-51 2016/12/20

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

  6. Poster Presentation : F0 control by modeling differential features in DNN-based speech synthesis

    116 (378) 37-42 2016/12/20

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

  7. A Study on Colorization in Photo-Realistic Facial Animation Synthesis from Text Based on HMM and DNN with Animation Unit

    116 (220) 67-72 2016/09/15

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

  8. A Study on Colorization in Photo-Realistic Facial Animation Synthesis from Text Based on HMM and DNN with Animation Unit

    40 (31) 67-72 2016/09

    Publisher: The Institute of Image Information and Television Engineers

    ISSN: 1342-6893

  9. Study of Photo-realistic Face Moving Image Generation from the Text Using the Facial Feature

    116 (33) 43-48 2016/05/19

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

  10. A study on quick model training in HMM-based speech synthesis

    115 (253) 27-32 2015/10/15

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

  11. Automatic generation of abbreviated named entities for localized speech recognition

    115 (184) 7-12 2015/08/21

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

  12. Analysis of the Effect of Accent Labeling Criteria on Synthetic Speech in HMM-Based Speech Synthesis

    Ryota Takahashi, Takashi Nose, Akinori Ito

    IPSJ SIG Technical Report (SLP) 2015 (1) 1-6 2015/05/18

    Publisher: Information Processing Society of Japan

    In this paper, we examine the accent labeling criteria that have been left ambiguous in conventional HMM-based speech synthesis and investigate how they affect synthetic speech. Specifically, we examine the representation of accent types and the criteria for accent phrase boundaries. For accent types, noting that the word-final (odaka) accent can be represented either as type 0 or as a type equal to the mora length, we objectively evaluate how the F0 of synthetic speech is affected by each representation, and also verify the effect of two-stage clustering. For accent phrase boundaries, some phrases can be represented either as two accent phrases of type 0 and type 1 or as a single merged accent phrase, and we investigate how this difference affects synthetic speech. In these evaluations, we introduce the error rate of the Japanese accent high/low (H/L) pattern as an objective measure and analyze its effectiveness.
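    The objective measure introduced here compares high/low (H/L) accent patterns mora by mora. A minimal sketch of deriving the expected Tokyo-Japanese H/L pattern from an accent type and counting disagreements (a textbook accent rule with illustrative function names, not the paper's implementation):

      def hl_pattern(n_morae: int, accent_type: int) -> str:
          """Expected H/L pattern of a Tokyo-Japanese accent phrase.

          Type 0 (heiban) is low then high to the end; type k > 0 puts the
          accent nucleus on mora k, after which the pitch falls to low.
          """
          if accent_type == 0:
              return "L" + "H" * (n_morae - 1)
          if accent_type == 1:
              return "H" + "L" * (n_morae - 1)
          return "L" + "H" * (accent_type - 1) + "L" * (n_morae - accent_type)

      def hl_error_rate(predicted: str, reference: str) -> float:
          """Fraction of morae whose H/L label disagrees with the reference."""
          assert len(predicted) == len(reference)
          return sum(p != r for p, r in zip(predicted, reference)) / len(reference)

      # For a 4-mora odaka word, type 4 and type 0 give the same word-internal
      # pattern; the difference only surfaces on a following particle.
      print(hl_pattern(4, 4), hl_pattern(4, 0))  # LHHH LHHH
      print(hl_error_rate("LHHL", "LHHH"))       # 0.25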

  13. Evaluation of a Dialogue System Using Emotional Speech Synthesis in Scenario Dialogues and a Study of Emotion Assignment Methods

    Taketo Kase, Takashi Nose, Yuya Chiba, Akinori Ito

    IPSJ SIG Technical Report (SLP) 2015 (9) 1-7 2015/05/18

    Publisher: Information Processing Society of Japan

    In recent years, demand for non-task-oriented spoken dialogue systems has been growing, and they have been studied from various angles. Most of those studies aim to generate appropriate responses from a linguistic point of view. In human-human conversation, on the other hand, paralinguistic information such as emotional expression and speaking style is used effectively to keep dialogue running smoothly. We therefore focus not on the content of the system's responses but on how they are spoken, and attempt to use emotional speech synthesis in a dialogue system. In this study, we first create multiple scenarios and evaluate by subjective criteria whether the quality of the dialogue system actually improves when appropriate emotions are assigned manually. Next, to automate emotion assignment, we examine two methods: assignment according to the system utterance and assignment that cooperates with the user utterance. The evaluation results show that automatic emotion assignment improves users' subjective evaluation scores of the dialogue, and that emotion assignment cooperating with the user's utterance is more effective.

  14. Analysis of Dialogue Data and Study of Audio-Visual Features for Automatic Estimation of the User's Willingness to Talk

    Yuya Chiba, Takashi Nose, Akinori Ito

    IPSJ SIG Technical Report (SLP) 2015 (10) 1-6 2015/02/20

    Publisher: Information Processing Society of Japan

    For an interactive system to adapt to the user when offering topics or recommending information, it is desirable that the system can acquire information about the user efficiently. In this study, we assume an interview-style spoken dialogue system that actively asks the user questions. In dialogue with such a system, more detailed information may be obtained about topics the user wants to talk about, while little useful information is likely to be obtained about topics the user does not want to talk about, so the system needs to select questions and topics in consideration of the user's willingness to talk. As an initial study toward automatic estimation of this willingness, we analyzed human-human interview dialogues and attempted automatic classification. The analysis suggests that when interviewees are themselves aware of whether their own willingness is high or low, third-party evaluators can judge it with an accuracy of about 70-80%. We also show that automatic classification with accuracy comparable to humans is possible by using the multimodal information mentioned in the evaluators' questionnaires.

  15. Evaluation of Wavelet-Based Feature Extraction and Its Accuracy Improvement Techniques

    Kiyoaki Matsui, Takashi Nose, Akinori Ito

    IPSJ SIG Technical Report (SLP) 2015 (5) 1-6 2015/02/20

    Publisher: Information Processing Society of Japan

    To make speech recognition more widespread, cheaper speech recognition systems are needed. Although there is much prior work on reducing the computational cost of speech recognition, feature extraction has received little attention. We have therefore proposed a new low-cost feature extraction method based on the wavelet transform, together with techniques to improve its accuracy. In this paper, we extract features using two kinds of wavelets, the Haar wavelet and the Daubechies wavelet, and compare their performance with MFCC. With the accuracy improvement techniques, a slight improvement in recognition rate was observed. As with the inter-frame dynamic (delta) features and MFCC, the recognition rate could be further improved by truncating the higher-order DCT outputs. In terms of computation time, using the simplest wavelet achieved more than five times the computation speed of MFCC.
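    As a generic illustration of this kind of wavelet front end (framing, Haar DWT subband log-energies, DCT truncation, and delta features assembled from standard components; the parameter values are assumptions, not the proposed method):

      import numpy as np
      import pywt
      from scipy.fftpack import dct

      def wavelet_features(signal, frame_len=400, hop=160, levels=8, keep=6):
          """Per-frame log energies of Haar DWT subbands, decorrelated by a DCT
          and truncated to the `keep` lowest-order coefficients."""
          feats = []
          for start in range(0, len(signal) - frame_len + 1, hop):
              frame = signal[start:start + frame_len]
              coeffs = pywt.wavedec(frame, "haar", level=levels)  # [cA8, cD8, ..., cD1]
              log_e = np.log(np.array([np.sum(c ** 2) + 1e-10 for c in coeffs]))
              feats.append(dct(log_e, norm="ortho")[:keep])       # drop high-order terms
          return np.array(feats)

      def deltas(feats):
          """First-order dynamic features from a two-frame slope."""
          padded = np.pad(feats, ((1, 1), (0, 0)), mode="edge")
          return (padded[2:] - padded[:-2]) / 2.0

      x = np.random.randn(16000)              # stand-in for 1 s of 16 kHz speech
      f = wavelet_features(x)
      print(np.hstack([f, deltas(f)]).shape)  # static + delta, as in MFCC front ends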

  16. Performance Evaluation of Large-Scale Training Sentence Set Construction Based on Entropy in Statistical Speech Synthesis

    Takashi Nose, Yusuke Arao, Takao Kobayashi, Komei Sugiura, Yoshinori Shiga

    IEICE Technical Report 115 (184(SP2015 50-58)) 2015

    ISSN: 0913-5685

  17. A Study on Changes in Learners' Turn-Taking Latency over Repeated Use of an English Conversation Training System

    Naoto Suzuki, Yutaka Hiroi, Yuma Fujiwara, Yuya Chiba, Takashi Nose, Akinori Ito

    Proceedings of the Meeting of the Acoustical Society of Japan (CD-ROM) 2015 2015

    ISSN: 1880-7658

  18. Verification of the Effectiveness of a Response-Timing Practice Method in an English Conversation Training System

    Naoto Suzuki, Yutaka Hiroi, Yuma Fujiwara, Yuya Chiba, Takashi Nose, Akinori Ito

    IPSJ SIG Technical Report (Web) 2015 (SLP-105) 2015

  19. A Study on Pronunciation Evaluation of English Singing Voices by Japanese Speakers

    Kazumichi Yoshida, Takashi Nose, Akinori Ito

    IPSJ SIG Technical Report (MUS) 2014 (9) 1-6 2014/11/13

    We aim at automatic evaluation of the English pronunciation of songs sung in English by Japanese speakers. In this study, we constructed a database of English lyrics read aloud and sung by Japanese speakers, and conducted subjective evaluations by native English and native Japanese listeners. Comparing the evaluations of read and sung lyrics suggests that pronunciation errors arise more easily in the sustained phrases of singing than in read speech. We further examined an HMM-based method for automatic pronunciation evaluation of English singing, and carried out a simple pronunciation error detection experiment using HMMs trained on read speech of native speakers of Japanese and English. The results indicate that detection accuracy could be improved further by tuning the likelihood-difference threshold used for error detection and by taking the sustained phrases of singing into account.
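    The detection rule described here, comparing model scores for native and Japanese-accented English against a likelihood-difference threshold, can be sketched generically. The GMMs below are stand-ins for the paper's HMMs, and the toy data and threshold are assumptions.

      import numpy as np
      from sklearn.mixture import GaussianMixture

      # Stand-ins for acoustic models trained on native-English and on
      # Japanese-speaker English; a real system would use phone HMMs.
      rng = np.random.default_rng(0)
      gmm_en = GaussianMixture(4, random_state=0).fit(rng.normal(0.0, 1.0, (500, 13)))
      gmm_ja = GaussianMixture(4, random_state=0).fit(rng.normal(0.5, 1.0, (500, 13)))

      def is_mispronounced(segment_feats, threshold=0.0):
          """Flag a segment when the Japanese-accented model out-scores the
          native model by more than `threshold` (mean log-likelihood per frame)."""
          return gmm_ja.score(segment_feats) - gmm_en.score(segment_feats) > threshold

      segment = rng.normal(0.4, 1.0, (80, 13))  # e.g., frames of a sustained phrase
      print(is_mispronounced(segment))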

  20. A Study on Pronunciation Evaluation of English Singing Voices by Japanese Speakers

    Kazumichi Yoshida, Takashi Nose, Akinori Ito

    IPSJ SIG Technical Report (DCC) 2014 (9) 1-6 2014/11/13

    We aim at automatic evaluation of the English pronunciation of songs sung in English by Japanese speakers. In this study, we constructed a database of English lyrics read aloud and sung by Japanese speakers, and conducted subjective evaluations by native English and native Japanese listeners. Comparing the evaluations of read and sung lyrics suggests that pronunciation errors arise more easily in the sustained phrases of singing than in read speech. We further examined an HMM-based method for automatic pronunciation evaluation of English singing, and carried out a simple pronunciation error detection experiment using HMMs trained on read speech of native speakers of Japanese and English. The results indicate that detection accuracy could be improved further by tuning the likelihood-difference threshold used for error detection and by taking the sustained phrases of singing into account.

  21. A Study on Intuitive Control of Emotional Expressions and Speaking Styles Using Facial Features by Kinect

    BI Yu, NOSE Takashi, ITO Akinori

    IEICE technical report. Speech 114 (303) 25-30 2014/11/13

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper proposes a style control technique for synthetic speech based on multiple regression HSMM (MRHSMM) using facial features. In the proposed technique, styles and their intensities are represented by Animation Unit (AU) parameters and are modeled under the assumption that the mean parameters of the acoustic models are given as multiple regressions of the AU parameters. Since correlation among the AU parameters is problematic in the modeling, we conducted orthogonalization and dimensionality reduction in advance. When synthesizing speech, we can generate synthetic speech with an intended style by inputting the corresponding facial expression. In this study, we examine the appropriate number of AU parameters and discuss how the performance differs depending on the user.
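
    The core modeling assumption, mean parameters given as multiple regressions of the AU parameters, can be sketched numerically as follows (a toy illustration with made-up dimensions, not the authors' code): each Gaussian mean is mu(s) = H [1; s], and H is estimated by least squares from style-labeled means.

    ```python
    # Toy sketch of the multiple-regression mean assumption:
    # mu(s) = H [1; s], with s a low-dimensional style vector standing in
    # for orthogonalized AU parameters. Dimensions are illustrative.
    import numpy as np

    rng = np.random.default_rng(0)
    D, K = 4, 2                      # feature dim, style-vector dim

    # Pretend per-style means and their style vectors are available
    # (in practice they come from adapted models and labeled intensities).
    styles = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
    true_H = rng.normal(size=(D, K + 1))
    means = np.stack([true_H @ np.concatenate(([1.0], s)) for s in styles])

    # Least-squares estimate of the regression matrix H
    X = np.hstack([np.ones((len(styles), 1)), styles])   # rows: [1, s]
    H = np.linalg.lstsq(X, means, rcond=None)[0].T        # (D, K+1)

    # Synthesis side: any new style vector yields an interpolated mean
    s_new = np.array([0.5, 0.25])
    print(H @ np.concatenate(([1.0], s_new)))
    ```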

  22. A Study on Hyperparameter Optimization for Speech Synthesis Based on Gaussian Process Regression

    KORIYAMA Tomoki, NOSE Takashi, KOBAYASHI Takao

    IEICE technical report. Speech 113 (404) 19-24 2014/01/23

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    In a statistical parametric speech synthesis framework based on Gaussian process regression, it is important to use an appropriate kernel function. However, the parameters of the kernel function, which are hyperparameters of the Gaussian process, were not optimized in our previous work. In this study, we examine a hyperparameter optimization algorithm based on an empirical Bayes approach. We show that the proposed method can enhance the predictive likelihood and improve the naturalness of synthetic speech through objective and subjective evaluation results.
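
    The empirical Bayes idea here is to pick the kernel hyperparameters that maximize the log marginal likelihood of the training data. The toy sketch below shows the same criterion on one-dimensional data with scikit-learn (the kernel choice and data are stand-ins; the actual work predicts acoustic features from linguistic inputs).

    ```python
    # Toy GPR hyperparameter optimization by maximizing the log marginal
    # likelihood (empirical Bayes); kernel and data are illustrative.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(50, 1))
    y = np.sin(X).ravel() + 0.1 * rng.normal(size=50)

    kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
    gpr = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=5)
    gpr.fit(X, y)   # fit() optimizes the kernel hyperparameters internally

    print("optimized kernel:", gpr.kernel_)
    print("log marginal likelihood:", gpr.log_marginal_likelihood_value_)
    ```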

  23. A Study on the Effect of CG Characters in an English Conversation Learning System and Additional Expressions for Controlling Learners' Utterance Timing

    鈴木直人, 廣井富, 藤原祐磨, 千葉祐弥, 能勢隆, 伊藤彰則

    Proceedings of the Acoustical Society of Japan Meeting (CD-ROM) 2014 2014

    ISSN: 1880-7658

  24. Controlling Turn-Taking Latency through Time Pressure during English Conversation Practice with an AR Character

    鈴木直人, 廣井富, 藤原祐磨, 黒田尚孝, 戸塚典子, 千葉祐弥, 能勢隆, 伊藤彰則

    Proceedings of the Acoustical Society of Japan Meeting (CD-ROM) 2014 2014

    ISSN: 1880-7658

  25. Automatic Estimation of Accent Phrase Boundaries Using Language and Acoustic Models

    SUZUKI Hiroshi, KORIYAMA Tomoki, NOSE Takashi, SHINOZAKI Takahiro, KOBAYASHI Takao

    IEICE technical report. Speech 113 (366) 97-102 2013/12/19

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper proposes a technique for automatically estimating accent phrase boundaries for text-to-speech synthesis systems. To construct speech synthesis systems, we need to prepare a database annotated with prosodic information, including accents. However, manual annotation for this purpose is generally costly. The proposed method instead uses conditional random fields (CRF) as the language models of accent phrase boundaries and accent types, and a hidden Markov model (HMM) as the acoustic feature model. We confirmed that the proposed method improved the estimation accuracy for reading-style speech data compared with the conventional method.

  26. Automatic Estimation of Accent Phrase Boundaries Using Language and Acoustic Models

    Hiroshi Suzuki, Tomoki Koriyama, Takashi Nose, Takahiro Shinozaki, Takao Kobayashi

    IPSJ SIG Notes 2013 (16) 1-6 2013/12/12

    Publisher: Information Processing Society of Japan (IPSJ)

    This paper proposes a technique for automatically estimating accent phrase boundaries for text-to-speech synthesis systems. To construct speech synthesis systems, we need to prepare a database annotated with prosodic information, including accents. However, manual annotation for this purpose is generally costly. The proposed method instead uses conditional random fields (CRF) as the language models of accent phrase boundaries and accent types, and a hidden Markov model (HMM) as the acoustic feature model. We confirmed that the proposed method improved the estimation accuracy for reading-style speech data compared with the conventional method.
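
    As a rough sketch of the language-model side only (the HMM acoustic model is omitted), accent phrase boundary estimation can be cast as CRF sequence labeling over words; the sklearn-crfsuite library and the feature set below are assumptions for illustration, not the authors' toolchain.

    ```python
    # Sketch: accent phrase boundary estimation as CRF sequence labeling.
    # "B" marks a word starting a new accent phrase, "I" a phrase-internal word.
    import sklearn_crfsuite

    def token_features(sent, i):
        feats = {"word": sent[i]["surface"], "pos": sent[i]["pos"]}
        if i > 0:
            feats["prev_pos"] = sent[i - 1]["pos"]
        if i < len(sent) - 1:
            feats["next_pos"] = sent[i + 1]["pos"]
        return feats

    sent = [{"surface": "今日", "pos": "noun"},
            {"surface": "は", "pos": "particle"},
            {"surface": "良い", "pos": "adjective"},
            {"surface": "天気", "pos": "noun"},
            {"surface": "です", "pos": "auxiliary"}]
    labels = ["B", "I", "B", "I", "I"]

    X = [[token_features(sent, i) for i in range(len(sent))]]
    y = [labels]

    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
    crf.fit(X, y)
    print(crf.predict(X)[0])    # predicted boundary labels
    ```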

  27. A Study on a Style Control Based on Multiple-Regression HSMM for Synthesizing Singing Voices with Various Expressivity

    NOSE Takashi, KANEMOTO Misa, KORIYAMA Tomoki, KOBAYASHI Takao

    IEICE technical report. Speech 112 (422) 79-84 2013/01/30

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper proposes a style control technique based on multiple regression HSMM (MRHSMM) for changing the styles, and their intensities, appearing in synthetic singing voices. In the proposed technique, styles and their intensities are represented by low-dimensional vectors called style vectors and are modeled under the assumption that the mean parameters of the acoustic models are given as multiple regressions of the style vectors. When synthesizing speech, we can weaken or emphasize the intensity of each style by setting a desired style vector. In addition, the idea of pitch adaptive training is introduced into the MRHSMM to improve the modeling accuracy of F0 associated with musical notes. A novel vibrato modeling technique is also presented to extract vibrato parameters from singing voices that sometimes have unclear vibrato expressions. Subjective evaluations show that we can intuitively control styles and their intensities while keeping the naturalness of synthetic speech.

  28. A study on Speaker-Normalized Style Conversion for Arbitrary Speaker's Expressive Speech Synthesis

    KANAGAWA Hiroki, NOSE Takashi, KOBAYASHI Takao

    IEICE technical report. Speech 112 (422) 73-78 2013/01/30

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper proposes a technique for improving the naturalness of synthetic speech using a framework of speaker adaptive training in HMM-based style conversion. In the style conversion, speaker-independent linear transforms are estimated using neutral- and target-style speech data of multiple speakers, and the estimated transforms are applied to a new speaker's neutral-style model. As a result, we can convert the style expressivity of the acoustic model to the target style without preparing any target-style speech of the speaker. When the spectral and prosodic features of the training speakers differ significantly from each other, the naturalness of synthetic speech from the converted model decreases. The proposed technique attempts to alleviate this problem by normalizing speaker characteristics using an approach similar to speaker adaptive training. From the objective and subjective evaluation results, we show that the speaker normalization technique provides more natural-sounding speech.

  29. A Study on Multi-Class Local Prosodic Context for Expressive Prosody Generation

    MAENO Yu, NOSE Takashi, KOBAYASHI Takao, KORIYAMA Tomoki, IJIMA Yusuke, NAKAJIMA Hideharu, MIZUNO Hideyuki, YOSHIOKA Osamu

    IEICE technical report. Speech 112 (422) 85-90 2013/01/23

    Publisher: The Institute of Electronics, Information and Communication Engineers

    This paper describes a technique for reproducing the local prosodic variability that appears in expressive speech with various speaking styles. Synthetic speech generated using only linguistic contexts in HMM-based speech synthesis tends to have less prosodic variation than the original speech. To add more variation to synthetic speech, we define novel phrase-level prosodic contexts from the residual of the prosodic features between the original and synthetic speech for the training data. Specifically, we create prosodic contexts for F0, duration, and power using the average difference between original and synthetic speech in each phrase. We evaluate the potential of the proposed technique under a condition where the appropriate prosodic contexts of the test sentences are known in the synthesis phase. We also examine whether users can intuitively modify the pitch by adjusting the proposed prosodic contexts.
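
    A minimal sketch of how such a phrase-level context can be derived (bin edges and class count are illustrative assumptions): average the per-frame log-F0 residual between original and generated speech over each phrase and quantize it into a small set of context classes.

    ```python
    # Sketch: phrase-level prosodic context from the original-vs-generated
    # residual; bin edges and the class count are assumptions.
    import numpy as np

    def residual_context(logf0_orig, logf0_gen, phrases,
                         edges=(-0.2, -0.05, 0.05, 0.2)):
        """phrases: (start, end) frame ranges; arrays are aligned log-F0."""
        labels = []
        for s, e in phrases:
            diff = np.mean(logf0_orig[s:e] - logf0_gen[s:e])
            labels.append(int(np.digitize(diff, edges)))  # class 0..4
        return labels

    gen = np.full(60, np.log(160.0))
    orig = gen + np.concatenate([np.full(30, 0.15), np.full(30, -0.10)])
    print(residual_context(orig, gen, [(0, 30), (30, 60)]))  # e.g. [3, 1]
    ```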

  30. Modeling of Local Variance of Spectral Features and Its Application to Parameter Generation in HMM-based Speech Synthesis

    NOSE Takashi, CHUNWIJITRA Vataya, KOBAYASHI Takao

    IEICE technical report. Speech 112 (281) 43-48 2012/11/01

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    In this paper, we describe a technique for modeling the local variance (LV) of speech features and propose a novel parameter generation algorithm using the LV model for HMM-based speech synthesis. In the proposed technique, we define the LV as a feature that represents the local variation around each frame of the spectral features and model it using context-dependent phone HMMs. To appropriately model the dynamic characteristics of LVs, we take into account the dynamic features of LVs as well as the static ones. In the parameter generation process, a spectral parameter sequence is estimated so as to maximize a target function in which the conventional HMMs and the LV models are combined. By using the LV models, the proposed technique can impose a more precise variance restriction in parameter generation than the conventional technique using a global variance (GV) model. Through objective and subjective evaluations, we examine the effectiveness of the proposed technique.
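
    The report's generation criterion is not reproduced here, but by analogy with the global-variance criterion, a parameter generation objective combining an HMM term with a local-variance term can be written schematically as follows (the weight and notation are assumptions):

    ```latex
    % Schematic objective: C is the static spectral trajectory, W appends
    % dynamic features, v_t(C) is the local variance around frame t, and
    % \omega balances the HMM and LV model terms.
    \mathcal{L}(C) = \log P\bigl(WC \mid \lambda, q\bigr)
                   + \omega \sum_{t} \log P\bigl(v_t(C) \mid \lambda_{\mathrm{LV}}, q\bigr)
    ```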

  31. A Study on Automatic Prosodic Context Labeling for Emphatic Speech Synthesis

    MAENO Yu, NOSE Takashi, KOBAYASHI Takao, IJIMA Yusuke, NAKAJIMA Hideharu, MIZUNO Hideyuki, YOSHIOKA Osamu

    IEICE technical report. Speech 112 (81) 1-6 2012/06/07

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper describes automatic prosodic context labeling of training data for synthesizing expressive speech in the HMM-based speech synthesis framework, focusing on emphasis expressions. We have proposed an unsupervised labeling technique for emphasis contexts that uses the difference between original and generated F0 patterns. A problem with this approach is that the threshold used to judge whether a phrase is emphasized has to be pre-determined. To overcome this problem, we propose a technique for automatically determining an optimal threshold based on the behavior of F0 patterns in emphatic speech. Experimental results show that the proposed technique gives results similar to subjective labeling and that the emphasis expression is well reproduced in synthetic speech.

  32. A Study on Phone Duration Modeling Using Dynamic Features for HMM-Based Speech Synthesis

    2011 (33) 1-6 2011/12/12

  33. On the use of prosodic-event-based HMM in F0 generation of conversational speech

    KORIYAMA Tomoki, NOSE Takashi, KOBAYASHI Takao

    IEICE technical report. Natural language understanding and models of communication 111 (364) 185-190 2011/12/12

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    In this paper, we propose a prosodic-event-based HMM for effectively modeling the F0 patterns of spontaneous conversational speech in HMM-based speech synthesis. The prosodic-event-based HMM uses segments such as the pitch fall of an accent or the pitch rise of a boundary pitch movement (BPM) as the modeling unit. The proposed HMM is expected to reduce the number of F0 model parameters because there are fewer prosodic events, which derive from F0 features, than phones, which depend strongly on spectral features. We performed objective and subjective experiments using spontaneous conversational speech data, and the results show that the prosodic-event-based HMM can significantly reduce the number of model parameters while keeping the quality of the synthetic speech.

  34. An MRHSMM-based conversational speech synthesis with controllability of paralinguistic information

    NAGATA Tomohiro, MORI Hiroki, NOSE Takashi

    IEICE technical report. Natural language understanding and models of communication 111 (364) 179-184 2011/12/12

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    In this paper, we aim to realize speech synthesis that can control paralinguistic information using a multiple regression HSMM (MRHSMM), which incorporates a multiple regression model into hidden semi-Markov model (HSMM) based speech synthesis. In this study, paralinguistic information is expressed as a coordinate in a low-dimensional space whose dimensions serve as the explanatory variables of the multiple regression model. We use two dimensions, "pleasantness" and "arousal", which are considered general indices for expressing emotional states. In model training, subjectively evaluated values are used for each dimension; in synthesis, speech reflecting an intended emotion is generated by giving arbitrary values. We examine how the two dimensions influence the acoustic features of the synthesized speech. In addition, we conducted three subjective experiments: a naturalness test showing that the synthesized speech is natural, a reproducibility test showing that the given emotion is reproduced, and an emotion expression test showing that the synthesized speech conveys the intended emotion.

  35. A Study on Speaker-Independent Style Conversion in HMM Speech Synthesis

    KANAGAWA Hiroki, NOSE Takashi, KOBAYASHI Takao

    IEICE technical report. Natural language understanding and models of communication 111 (364) 191-196 2011/12/12

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper proposes a technique for synthesizing speech of a desired style using speaker-independent style conversion in HMM-based speech synthesis. An HMM-based style adaptation technique has been proposed that can synthesize speech of arbitrary sentences with a target style. However, this technique cannot be used when speech data of the target style is not available. To overcome this problem, we extend the speaker-dependent style conversion used in style adaptation to a speaker-independent one. Specifically, we first prepare neutral- and target-style speech data of multiple speakers and train a neutral-style average voice model. The style conversion from the average voice model to the target-style one is trained using linear transformation. We then apply the transformation matrices to the neutral-style model of the target speaker. Finally, we obtain the target-style model of the target speaker and synthesize the style-converted speech. We evaluate the proposed technique in terms of speaker and style characteristics and naturalness.

  36. A Study on Phone Duration Modeling Using Dynamic Features for HMM-Based Speech Synthesis

    NOSE Takashi, KOBAYASHI Takao

    IEICE technical report. Natural language understanding and models of communication 111 (364) 197-202 2011/12/12

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper proposes a technique for modeling and generating phone durations using their dynamic features to improve the prediction accuracy of phone durations in HMM-based speech synthesis. For duration modeling, a technique with explicit state-duration modeling based on the hidden semi-Markov model (HSMM) has been proposed. However, the HSMM cannot directly model phone durations, and the relation of phone durations among adjacent phonemes is represented only by context labels. In the proposed technique, phone durations are regarded as observable data obtained by manual labeling or forced alignment and are directly modeled using single Gaussian distributions. To explicitly take into account the correlation of phone durations in model training and speech synthesis, we use not only static phone durations but also dynamic ones. When synthesizing speech, we generate a phone-duration sequence from the trained duration models using a parameter generation algorithm with static and dynamic features. We evaluate the performance of our duration modeling technique by comparing it with other techniques using static or static log-duration features.
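
    The generation step described above is the standard ML parameter generation with static and dynamic features, solving c = (W'PW)^(-1) W'P mu for the static sequence. A compact numpy sketch with first-order deltas and toy per-phone duration statistics (all values made up):

    ```python
    # ML parameter generation with static + delta features over a toy
    # duration sequence: c = (W'PW)^{-1} W'P mu, P = diag(1/var).
    import numpy as np

    mu_s  = np.array([5.0, 8.0, 6.0, 7.0])    # static duration means
    var_s = np.array([1.0, 2.0, 1.0, 1.5])
    mu_d  = np.array([0.0, 1.0, -1.0, 0.5])   # delta (dynamic) means
    var_d = np.array([0.5, 0.5, 0.5, 0.5])

    T = len(mu_s)
    D = np.zeros((T, T))                       # delta: 0.5*(c[t+1]-c[t-1])
    for t in range(T):
        D[t, max(t - 1, 0)] -= 0.5
        D[t, min(t + 1, T - 1)] += 0.5

    W = np.vstack([np.eye(T), D])              # maps c -> [c; delta(c)]
    mu = np.concatenate([mu_s, mu_d])
    P = np.diag(1.0 / np.concatenate([var_s, var_d]))

    c = np.linalg.solve(W.T @ P @ W, W.T @ P @ mu)
    print(np.round(c, 2))                      # smoothed duration sequence
    ```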

  37. Performance Evaluation of Contexts for Conversational Speech Synthesis Using Corpus of Spontaneous Japanese

    KORIYAMA Tomoki, NOSE Takashi, KOBAYASHI Takao

    IEICE technical report 111 (28) 155-160 2011/05/05

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper proposes an extended context set for generating the prosodic variability of spontaneous speech in HMM-based conversational speech synthesis. Since the conventional context set used for HMM-based reading-style speech synthesis is insufficient for conversational speech synthesis, we introduce new contexts derived from the Corpus of Spontaneous Japanese. We compare the context sets with and without newly introduced contexts, and the experimental results show that the contexts about phone prolongation and X-JToBI tone tier label are effective. Furthermore, we examine the stopping criteria for decision-tree clustering and the automatic estimation of a part of contexts for practical applications.

  38. Study on HMM-based F0 Coding for Very Low Bit-Rate Vocoder

    2010 (5) 1-6 2011/02

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 1884-0930

  39. Study on HMM-based F0 Coding for Very Low Bit-Rate Vocoder

    NOSE Takashi, KUMAMOTO Masashi, KOBAYASHI Takao

    IEICE technical report 110 (356) 189-194 2010/12/13

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper presents a novel F0 coding technique for very low bit-rate HMM-based phonetic vocoder. Our technique is based on the multi-space distribution HMM (MSD-HMM) with quantized F0 symbols used as a prosodic context. By introducing the F0 symbol, we can model F0 values without using manually labeled speech data including accent information. In the encoding process, the F0 sequence extracted from an input utterance is converted into the quantized F0 symbol sequence, and these symbols are transmitted with the phonemes and state durations obtained by a phoneme recognizer. In the decoding process, context-dependent labels are created from the phonemes and F0 symbols, and the spectral and F0 sequences are generated using the pre-trained MSD-HMM on the basis of a maximum likelihood criterion. The experimental results show that the degradation of F0 quality through the coding process is not annoying even if the bit-rate for F0 is less than 50 bit/s.
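
    A minimal sketch of the encoder-side F0 quantization (symbol count, F0 range, and per-phone averaging are assumptions, not the paper's exact configuration), with a rough bit-rate check:

    ```python
    # Sketch: reduce a log-F0 contour to one coarse symbol per phone segment.
    import numpy as np

    def quantize_f0(f0, segments, n_levels=8,
                    lo=np.log(70.0), hi=np.log(400.0)):
        """f0: per-frame F0 in Hz (0 = unvoiced); segments: (start, end)."""
        symbols = []
        for s, e in segments:
            voiced = f0[s:e][f0[s:e] > 0]
            if len(voiced) == 0:
                symbols.append(None)           # unvoiced phone
                continue
            m = np.clip(np.mean(np.log(voiced)), lo, hi)
            symbols.append(int((m - lo) / (hi - lo) * (n_levels - 1) + 0.5))
        return symbols

    f0 = np.concatenate([np.zeros(10), np.linspace(120, 200, 30), np.zeros(10)])
    print(quantize_f0(f0, [(0, 10), (10, 25), (25, 40), (40, 50)]))
    # 3 bits/symbol at roughly 10 phones/s gives ~30 bit/s for F0,
    # consistent with the reported sub-50 bit/s regime.
    ```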

  40. A Study on Conversational Speech Synthesis Based on Average Voice Model

    KORIYAMA Tomoki, NOSE Takashi, KOBAYASHI Takao

    IEICE technical report 109 (375) 33-38 2010/01/14

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper describes a conversational speech synthesis technique using average voice model and model adaptation based on hidden semi-Markov model (HSMM). In conversational speech, the acoustic features are affected by various factors such as speaker individuality, speaking style, and speaker's intention, and it is not easy to generate natural sounding speech using a small amount of speech data of a target speaker. To overcome this problem, the proposed technique utilizes an average voice model trained in advance using multiple speakers' speech data and adapts the model to the target speaker's one using a speaker adaptation technique. We can generate synthetic speech even if the available speech data of the target speaker is very limited. In this study, we evaluate the performance of the proposed technique by objective measures. We use two types of average voice models, one is trained with read speech, and the other with conversational speech. The experimental results show that the distortion of spectral and pitch features between synthetic and original speech samples decreases when using the proposed technique.

  41. Performance evaluation of Voice Conversion Based on F0 Quantization and Non-parallel Training

    OTA Yuhei, NOSE Takashi, KOBAYASHI Takao

    IEICE technical report 109 (375) 27-32 2010/01/14

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper describes performance evaluation results of a context-dependent HMM-based voice conversion technique, showing its effectiveness by comparison with a GMM-based one. In the HMM-based conversion, we first extract phonetic and prosodic information from the input speech of a source speaker. Then, converted synthetic speech is generated from the pre-trained acoustic model of a target speaker. To appropriately model the pitch information, we use a coarsely quantized F0 symbol sequence as the prosodic context instead of accent information obtained by manually labeling the training data. By using phonetically and prosodically context-dependent HMMs, the speaker characteristics appearing in segmental and supra-segmental features can also be converted, which is difficult in conventional GMM-based techniques. Objective and subjective experimental results show that the naturalness and speaker individuality of the converted speech are significantly improved by HMM-based voice conversion.

  42. A Study on Voice Conversion Based on F0 Quantization and Non-parallel Training

    OTA Yuhei, NOSE Takashi, KOBAYASHI Takao

    IEICE technical report 109 (356) 171-176 2009/12/14

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper presents a novel voice conversion technique using HMM-based phoneme recognition and speech synthesis with non-parallel training data. In the proposed technique, a phoneme sequence with durations and a rough F0 contour are extracted from the input speech of a source speaker using phoneme recognition and F0 quantization, and are transmitted to the synthesis part. In the synthesis part, a context-dependent label sequence is generated from the transmitted phonemes, durations, and quantized F0 symbols. Then, converted speech is generated from the label sequence using the target speaker's pre-trained MSD-HMM. In model training, the models of the source and target speakers can be trained separately with non-parallel data. For duration modification, a linear transformation is applied to each phone duration of the input speech. The objective and subjective experimental results show that the proposed technique works well even when parallel speech data is not available.

  43. HMM-based Speech Synthesis Using Quantized-F0-based Prosodic Context

    OOKI Koujirou, NOSE Takashi, KOBAYASHI Takao

    IEICE technical report 109 (356) 141-146 2009/12/14

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper describes a technique for HMM-based speech synthesis that requires no manual labeling of accent information for the target speaker's training data. The proposed technique uses coarsely quantized F0 symbols instead of hand-labeled accent information in the context-dependent labels for HMM-based F0 modeling. F0 quantization enables accurate automatic labeling of the F0 contexts for the training data. When synthesizing speech, an F0 contour is first generated using a pre-trained average voice model with a conventional context-dependent label sequence converted from the input text; a label sequence for synthesis is then created by quantizing the generated F0 contour. Synthetic speech is generated from the target speaker's model with the obtained labels. Objective and subjective evaluation results demonstrate the effectiveness of the proposed method.

  44. Speaking Style Classification of Spontaneous Speech Using Multiple-Regression HMM

    NOSE Takashi, MATSUBARA Takeshi, IJIMA Yusuke, KOBAYASHI Takao

    IEICE technical report 109 (139) 31-36 2009/07/10

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper describes speaking style classification and speech recognition for spontaneous speech based on multiple-regression HMM (MRHMM). In MRHMM, the mean vector of each probability density function is given by multiple regression of a low-dimensional vector, called style vector. Each component of the style vector corresponds to the intensity of expressivity of speaking style variation, and the type of speaking style can be classified by estimating the style vector for input speech based on an ML criterion. Moreover, in spontaneous speech recognition, acoustic models are adapted on-line by updating model parameters using the estimated style vector for each input utterance. The performance evaluation using the Corpus of Spontaneous Japanese (CSJ) shows that a high classification rate is obtained even when the amount of available training data is very limited. The effectiveness of the proposed technique is also shown by a phoneme recognition experiment.
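
    Because each MRHMM mean is linear in the style vector, mu_i(s) = b_i + A_i s, the ML style-vector estimate has a weighted least-squares closed form; the toy sketch below (made-up dimensions, known frame-state alignment, identity precisions) illustrates the estimation used for classification.

    ```python
    # Toy ML style-vector estimation in an MRHMM:
    # solve (sum A'PA) s = sum A'P(o - b) over aligned frames.
    import numpy as np

    rng = np.random.default_rng(1)
    D, K, T = 3, 2, 40               # feature dim, style dim, frames

    A = rng.normal(size=(T, D, K))   # per-frame regression matrices
    b = rng.normal(size=(T, D))      # per-frame bias means
    P = np.stack([np.eye(D)] * T)    # per-frame precisions (identity here)

    s_true = np.array([0.8, -0.3])
    obs = np.einsum("tdk,k->td", A, s_true) + b \
          + 0.05 * rng.normal(size=(T, D))

    lhs = sum(A[t].T @ P[t] @ A[t] for t in range(T))
    rhs = sum(A[t].T @ P[t] @ (obs[t] - b[t]) for t in range(T))
    print(np.linalg.solve(lhs, rhs))  # close to s_true
    ```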

  45. A Robot That Learns Object Names in Natural Dialogue

    中野 幹生, 長井 隆行, 能勢 隆, 田口 亮, 水谷 了, 中村 友昭, 船越 孝太郎, 長谷川 雄二, 鳥井 豊隆, 岩橋 直人

    Proceedings of the Annual Conference of JSAI 2009 (0) 1F2OS73-1F2OS73 2009

    Publisher: The Japanese Society for Artificial Intelligence

    Robots that learn the names of objects from speech and visual input have been studied, but they have typically required a pre-configured name-teaching mode or fixed utterance patterns for teaching names. This paper describes the architecture and implementation of a robot that can engage in dialogue across various domains and that learns an object's name when it hears a teaching utterance in the course of a dialogue.

  46. A Language Acquisition Method Based on Model Selection and Its Evaluation

    田口 亮, 岩橋 直人, 能勢 隆, 船越 孝太郎, 中野 幹生

    Proceedings of the Annual Conference of JSAI 2009 (0) 1F2OS72-1F2OS72 2009

    Publisher: The Japanese Society for Artificial Intelligence

    This paper proposes a method by which a robot with no prior word knowledge learns the names of objects and places from unconstrained human utterances. Initial word candidates are generated from phoneme recognition results on the training data. Using these candidates, the robot performs word recognition and learns meaning and grammar, and then, based on a statistical model selection criterion, deletes or concatenates words that are acoustically, grammatically, or semantically unnecessary. Word recognition is then performed again. By repeating this process, the correct phoneme sequences and meanings of the words are acquired.

  47. Acoustic Model Training Technique for Speech Recognition Using Style Estimation with Multiple-Regression HMM

    IJIMA Yusuke, TACHIBANA Makoto, NOSE Takashi, KOBAYASHI Takao

    IPSJ SIG Notes 2008 (123) 37-42 2008/12/02

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    We propose a technique for emotional speech recognition based on a multiple-regression HMM (MRHMM). To achieve emotional speech recognition for an arbitrary speaker with a small amount of training data, we incorporate a speaker and style adaptation technique into speaker-dependent MRHMM-based emotional speech recognition. In the proposed technique, we first adapt the speaker-independent model to the target speaker's respective styles with a small amount of speech data. Then, using the obtained speaker- and style-adapted HMMs and a low-dimensional style control vector for each training style, the regression matrices of the MRHMM are estimated based on the least squares method and maximum likelihood estimation. We assess the performance of the proposed technique on the recognition of acted emotional speech uttered by both professional narrators and non-professional speakers and show the effectiveness of the technique.

  48. Acoustic Model Training Technique for Speech Recognition Using Style Estimation with Multiple-Regression HMM

    IJIMA Yusuke, TACHIBANA Makoto, NOSE Takashi, KOBAYASHI Takao

    IEICE technical report 108 (337) 37-42 2008/12/02

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    We propose a technique for emotional speech recognition based on a multiple-regression HMM (MRHMM). To achieve emotional speech recognition for an arbitrary speaker with a small amount of training data, we incorporate a speaker and style adaptation technique into speaker-dependent MRHMM-based emotional speech recognition. In the proposed technique, we first adapt the speaker-independent model to the target speaker's respective styles with a small amount of speech data. Then, using the obtained speaker- and style-adapted HMMs and a low-dimensional style control vector for each training style, the regression matrices of the MRHMM are estimated based on the least squares method and maximum likelihood estimation. We assess the performance of the proposed technique on the recognition of acted emotional speech uttered by both professional narrators and non-professional speakers and show the effectiveness of the technique.

  49. An MRHSMM-based voice quality control technique for synthetic speech using speaker adaptation from average voice model

    TACHIBANA Makoto, KOUNO Akifumi, NOSE Takashi, KOBAYASHI Takao

    IEICE technical report 108 (265) 41-46 2008/10/16

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper describes a technique for controlling the voice quality of synthetic speech using a multiple-regression hidden semi-Markov model (MRHSMM). To achieve voice quality control with a small amount of training data, we incorporate speaker adaptation from an average voice model into MRHSMM-based voice quality control. In the proposed technique, we first adapt the average voice model to the respective training speakers using a small amount of adaptation data. Then, using the obtained speaker-adapted HSMMs and a low-dimensional voice quality control vector for each training speaker, the regression matrices of the MRHSMM are estimated based on the least squares method and maximum likelihood estimation. We attempt to control the voice quality of synthetic speech using data from 20 speakers, with 50 sentences per speaker. Subjective evaluation results show that the proposed technique can control several voice qualities of synthetic speech. Furthermore, we propose a model interpolation technique for MRHSMMs and show its evaluation results.

  50. Recent developments of the HMM-based speech synthesis system (HTS)

    ZEN Heiga, OURA Keiichiro, NOSE Takashi, YAMAGISHI Junichi, SAKO Shinji, TODA Tomoki, MASUKO Takashi, BLACK Alan W., TOKUDA Keiichi

    IPSJ SIG Notes 2007 (129) 301-306 2007/12/21

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    A statistical parametric speech synthesis approach based on hidden Markov models (HMMs) has grown in popularity over the last few years. In this approach, spectrum, excitation, and duration of speech are simultaneously modeled by context-dependent HMMs, and speech waveforms are generated from the HMMs themselves. Since December 2002, we have publicly released an open-source software toolkit named "HMM-based speech synthesis system (HTS)" to provide a research and development toolkit of statistical parametric speech synthesis. This paper describes recent developments of HTS in detail, as well as future release plans.

  51. Recent developments of the HMM-based speech synthesis system (HTS)

    ZEN Heiga, OURA Keiichiro, NOSE Takashi, YAMAGISHI Junichi, SAKO Shinji, TODA Tomoki, MASUKO Takashi, BLACK Alan W., TOKUDA Keiichi

    IEICE technical report 107 (406) 301-306 2007/12/13

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    A statistical parametric speech synthesis approach based on hidden Markov models (HMMs) has grown in popularity over the last few years. In this approach, spectrum, excitation, and duration of speech are simultaneously modeled by context-dependent HMMs, and speech waveforms are generated from the HMMs themselves. Since December 2002, we have publicly released an open-source software toolkit named "HMM-based speech synthesis system (HTS)" to provide a research and development toolkit of statistical parametric speech synthesis. This paper describes recent developments of HTS in detail, as well as future release plans.

  52. A Speaker Adaptation Technique Using Average Voice Model for MRHSMM-based Style Control of Synthetic Speech

    IZAWA Shinsuke, TACHIBANA Makoto, NOSE Takashi, KOBAYASHI Takao

    IEICE technical report 107 (282) 81-86 2007/10/18

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper describes a technique for synthesizing speech with an arbitrary target speaker's voice as well as the desired style expressivity. In the conventional MLLR-based speaker adaptation technique for the multiple regression hidden semi-Markov model (MRHSMM), the quality of the synthesized speech depends crucially on the initial MRHSMM trained from a certain source speaker's data, and it is not always possible to synthesize natural-sounding speech with any target speaker's voice. To overcome this problem, we propose a technique for simultaneous adaptation of speaker and style from an average voice model. Experimental results show that the proposed technique provides more natural speech than the conventional one with speaker adaptation only.

Books and Other Publications 3

  1. 音響キーワードブック (Acoustic Keyword Book)

    能勢隆

    2016/03/22

  2. 進化するヒトと機械の音声コミュニケーション (Evolving Speech Communication between Humans and Machines)

    能勢隆

    NTS Co., Ltd. 2015/09

  3. Human Machine Interaction - Getting Closer

    Ryo Taguchi, Naoto Iwahashi, Kotaro Funakoshi, Mikio Nakano, Takashi Nose, Tsuneo Nitta

    2012/01

Research Projects 16

  1. Development of a virtual classmate for assistance of online course

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Tohoku University

    2021/04/01 - 2026/03/31

  2. Real-Environment Listening Learning Support Based on Speaker, Region, and Style Morphing Speech Synthesis

    能勢 隆, 伊藤 彰則

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Tohoku University

    2022/04/01 - 2025/03/31

    To answer the academic question "From the viewpoints of acoustic engineering and speech perception, what is a methodology for efficiently improving listening ability?", this project integrates and extends the component technologies we have developed through research on statistical speech synthesis, machine learning, and interactive English conversation learning systems, aiming at an entirely new form of real-environment listening learning support in which characteristics of English speech such as speaker, region, style, and accent can be simulated in a stepwise manner by deep-learning-based morphing. We address four concrete items: (a) design and construction of a speech corpus covering diverse speakers, regions, and styles; (b) establishment of deep-learning-based morphing speech synthesis; (c) development of a listening learning support system using morphing speech synthesis; and (d) demonstration experiments on improving listening ability in real environments with the proposed system. In FY2023 we examined (b) and (c) from the viewpoint of speaking-rate style. For (b), we showed that embedding speaking-rate information into a Glow-TTS-based model enables control of the speaking rate and related speaking-rate styles, and we proposed an improved text encoder that enhances the reproducibility of speech and style, demonstrating its effectiveness with objective measures. For (c), we built a web-based listening learning and evaluation system based on stepwise speaking-rate control. For (d), we had users actually use the system of (c) via crowdsourcing and experimentally showed that listening ability improves compared with a conventional system without speaking-rate control.

  3. Research and development of multi-modal interactive English learning system based on deep learning

    ITO Akinori

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (A)

    Institution: Tohoku University

    2017/04/01 - 2021/03/31

    We developed technologies for an English conversation learning system based on deep learning and created a CALL system for practicing English conversation: (1) We established technology for recognizing English speech spoken by Japanese with high accuracy to improve the accuracy of interfaces for speech, facial expressions, and gestures based on deep learning. (2) To establish English pronunciation evaluation and English conversation simulation technology based on deep learning, we investigated the effects of facial expressions and gestures on English proficiency evaluation. In addition, we established a method to evaluate pronunciation with high accuracy for interactive speech. (3) We integrated the technologies to create a spoken dialogue English conversation learning system.

  4. Research and development of a Japanese pronunciation training system using average voice morphing

    NOSE Takashi

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Challenging Exploratory Research

    Institution: Tohoku University

    2016/04/01 - 2019/03/31

    In this study, we aim to establish a new framework for a low-cost, convenient, and convincing Japanese pronunciation training system for non-native speakers in Japan. Specifically, we used statistical parametric speech synthesis with a teacher average-voice model trained on multiple teachers' speech, and achieved more precise labeling of pronunciation scores by using a feature substitution technique for the phonetic and prosodic parameters of speech. We trained a prediction model of pronunciation scores for phonemes, accent, and rhythm, and achieved an efficient pronunciation training method by predicting non-native speakers' pronunciation scores.

  5. Study on new vocal design focusing on naturally dehumanized singing

    MORISE Masanori

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Challenging Exploratory Research

    Institution: University of Yamanashi

    2016/04/01 - 2018/03/31

    Vocal design algorithms that approximate human singing have been proposed alongside the growth of commercial software such as VOCALOID. On the other hand, there is music content based on dehumanized singing, for which applications such as Auto-Tune are generally used to remove human-like features from the singing. This study proposes a vocal design algorithm that outputs dehumanized yet natural singing. In the proposed algorithm, we first developed a speech analysis/synthesis algorithm, and then proposed an algorithm for exaggerating several features, such as the fluctuation of the fundamental frequency. We carried out subjective evaluations to verify the effectiveness of the proposed algorithm. The results suggest that the exaggeration can synthesize singing with a certain level of naturalness and humanness.

  6. Establishment of speech synthesis framework based on Gaussian process regression

    Kobayashi Takao, MOUNGSRI Decha, NAGAHAMA Daiki, NOSE Takashi, ARIFIANTO Dhany

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Tokyo Institute of Technology

    2015/04/01 - 2018/03/31

    The purpose of the research is to develop a novel statistical parametric speech synthesis framework based on Gaussian process regression (GPR). We have proposed prosody generation techniques including pitch pattern prediction and phone duration prediction as well as the spectral parameter generation technique based on GPR. We developed a GPR-based speech synthesis system and showed its effectiveness through assessment of synthetic speech quality. Furthermore, we examined the proposed framework for generating expressive speech. We also examined it for generating more natural-sounding prosody in speech synthesis of a tonal language.

  7. Research of Human-Kind Dialogue System with Recognition and Synthesis of Various Speech Based on State Estimation

    Nose Takashi, MORI Hiroki

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Tohoku University

    2015/04/01 - 2018/03/31

    In this research project, we improved and advanced techniques for recognizing and synthesizing diverse speech, and studied a technique for estimating the state of system users and its applications, in order to realize a dialogue system that is kind to its users. Specifically, (1) we studied the validity of using emotions and a technique for emotion estimation; (2) we proposed and evaluated a sentence selection technique based on an extended entropy measure that takes phonetic and prosodic contexts into account; (3) we recorded and analyzed dialogue data for willingness estimation; (4) we constructed a large-scale emotional speech corpus that can be used for emotional speech synthesis/recognition and emotion estimation; and (5) we proposed and evaluated variance compensation and tailor-made speech synthesis as techniques for synthesizing diverse, high-quality speech.

  8. Affect burst: Analysis and synthesis of unconscious exposition of emotion

    Mori Hiroki, NAGATA Tomohiro

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Utsunomiya University

    2014/04/01 - 2018/03/31

    (1) A multimodal corpus of gaming interaction that readily induces interlocutors' shouts was developed. This corpus contains more than ten times as many shouts as existing corpora. Analysis of the shouts revealed acoustic differences from regular words and interjections. (2) A taxonomy of expressive interjections was developed, which enabled the synthesis of the interjection "a" in various forms. A perceptual experiment using the synthesized interjections revealed the relationship between the forms and paralinguistic information. (3) Factors that affect the acoustic properties of laugh calls were identified. Incorporating these factors into the context definition of the HMM-based speech synthesis framework enabled flexible laughter synthesis. A perceptual experiment showed the advantage of incorporating these contextual factors with respect to the naturalness of the synthesized laughter.

  9. Self-Organized Learning of Speech Recognition and Synthesis Systems

    Shinozaki Takahiro, ARAI Takayuki, WATANABE Shinji, DUH Kevin

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Tokyo Institute of Technology

    2014/04/01 - 2018/03/31

    The purpose of this study is to build self-standing speech and language processing systems that can learn from a small amount of labeled and a large amount of unlabeled speech data, and that can automatically optimize their structure and learning conditions. We proposed an evolution-strategy-based automation method for developing neural network-based systems, a series of semi-supervised learning methods for statistical speech models, and a reinforcement learning method for speech recognition systems. A high-performance Japanese speech recognition system integrating these research results has been published and is widely used.

  10. Development of Easy Japanese composition support system using sentence difficulty estimation and speech synthesis

    Ito Akinori, CHIBA Yuya, NAGANO Takeshi

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Tohoku University

    2014/04/01 - 2017/03/31

    We developed the Easy Japanese composition support system YANSIS and conducted related investigations. We developed a method for automatically estimating the difficulty of a sentence, and investigated the relation between the intelligibility of Japanese speech as heard by non-native speakers and the speech rate, pauses, and speech degradation caused by reverberation. This investigation revealed the most appropriate speech rate for Easy Japanese speech. In addition, we implemented the automatic sentence difficulty estimation function and a speech synthesizer in YANSIS.

  11. A study of speech synthesis for achieving synthetic speech with high quality and variability based on hybrid approach

    NOSE Takashi

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Young Scientists (B)

    Institution: Tohoku University

    2013/04/01 - 2015/03/31

    The purpose of this research is to establish a hybrid speech synthesis framework that can synthesize human-like speech with various emotional expressions and/or speaking styles using only a limited amount of speech data. We addressed the following six issues: (1) flexible control of the non- and para-linguistic information appearing in synthetic speech; (2) automatic training of prosodic variations; (3) extension to multi-lingual and cross-lingual speech synthesis; (4) application to singing voice synthesis; (5) efficient design of a speech corpus for synthesis; and (6) improvement of the subjective quality of synthetic speech by modifying the conventional parameter generation method.

  12. Research on speech synthesis using non-parametric modeling based on Gaussian process regression

    KOBAYASHI Takao, NOSE Takashi, KORIYAMA Tomoki

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Challenging Exploratory Research

    Institution: Tokyo Institute of Technology

    2013/04/01 - 2015/03/31

    The purpose of the research is to develop a framework using non-parametric modeling for synthesizing more natural-sounding speech than the conventional HMM-based statistical parametric speech synthesis framework. The proposed modeling approach is based on Gaussian process regression (GPR) and GPR model is designed for directly predicting frame-level acoustic features from corresponding input linguistic information. We have proposed kernel functions for GPR-based speech synthesis and examined several techniques for computational cost reduction, hyper-parameter optimization, and prosody modeling using Gaussian process classification and GPR.

  13. Research on advanced robust speech synthesis and its applications to multi-lingual speech communication

    KOBAYASHI Takao, NOSE Takashi, KORIYAMA Tomoki, ARIFIANTO Dhany

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Tokyo Institute of Technology

    2012/04/01 - 2015/03/31

    The purpose of the research is to develop advanced techniques that enable us to model the acoustic features of prosodic as well as spectral information, while being less dependent on the quality and quantity of the training speech data, for synthesizing natural-sounding and diverse expressive speech. We proposed several robust techniques, such as style control and prosody modeling techniques, and showed their effectiveness through objective and subjective evaluation tests. We also applied the proposed techniques to under-resourced languages. Furthermore, we examined a cross-lingual speech synthesis technique for universal speech communication.

  14. A study on speech diversification techniques based on corpus design for advanced humanoid speech synthesis

    NOSE Takashi

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Young Scientists (B)

    2011 - 2012

    Our goal in this research is to realize a more human-like, natural text-to-speech system with various emotional expressions and speaking styles. The achievements are as follows: (1) we proposed a novel corpus-design technique that takes accent, style, and sentence-final expressions into account; (2) we incorporated users' subjective emotional intensities into acoustic model training to improve the performance of expressive speech synthesis; (3) we proposed an automatic labeling technique for emphasis expressions using a parameter generation technique for the fundamental frequency to realize emphatic speech synthesis; and (4) we proposed cross-lingual speech synthesis using only a target speaker's native-language speech samples to synthesize multi-lingual speech at low cost.

  15. Research on robust spoken language interfaces for diverse voice variability and expressivity

    KOBAYASHI Takao, NAGAHASHI Hiroshi, NOSE Takashi

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Tokyo Institute of Technology

    2009 - 2011

    The purpose of the research is to develop techniques that make the human-computer interaction using speech input/output more robust for variations of users' emotional states, speaking styles, preferences, and expressivity. We have proposed techniques using a quantized fundamental frequency prosodic context for robust speech synthesis and an extended context set for spontaneous conversational speech synthesis. We have also proposed techniques for robust speech recognition including extraction of paralinguistic information and rapid model adaptation.

  16. Study on speech synthesis for humanoid spoken dialog system

    NOSE Takashi

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Research Activity Start-up

    Institution: Tokyo Institute of Technology

    2009 - 2010

    Two novel techniques and an investigation were presented as key speech synthesis technologies for the development of a humanoid spoken dialog system: (1) spontaneous speech synthesis based on statistical parametric modeling; (2) speaker-independent voice conversion based on statistical parametric modeling; and (3) an investigation of phonetic and prosodic contextual factors in speech synthesis.
