Details of the Researcher

Takashi Nose
Section
Graduate School of Engineering
Job title
Associate Professor
Degree
  • Doctor of Engineering (Tokyo Institute of Technology)

Education 1

  • Tokyo Institute of Technology Graduate School, Division of Integrated Science and Engineering, Department of Physical Information Systems

    2006/04 - 2009/03

Committee Memberships 5

  • Acoustical Society of Japan, Tohoku Chapter, Treasurer

    2024/04 - Present

  • Acoustical Society of Japan, Tohoku Chapter, Accounting Auditor

    2018/04 - 2020/03

  • Acoustical Society of Japan, Tohoku Chapter, General Affairs Secretary

    2016/04 - 2018/03

  • Acoustical Society of Japan, Tohoku Chapter, Treasurer

    2014/04 - 2016/03

  • Technical Committee on Speech, Assistant Secretary

    2014/04 - 2016/03

Professional Memberships 5

  • ISCA

  • Information Processing Society of Japan (IPSJ)

  • Acoustical Society of Japan (ASJ)

  • Institute of Electronics, Information and Communication Engineers (IEICE)

  • IEEE

Research Interests 7

  • Multimedia information processing

  • Music information processing

  • Speech coding

  • Spoken dialogue

  • Speech recognition

  • Speech synthesis

  • Speech information processing

Research Areas 2

  • Informatics / Intelligent robotics

  • Informatics / Perceptual information processing

Papers 161

  1. The Development of an Emotional Embodied Conversational Agent and the Evaluation of the Effect of Response Delay on User Impression

    Simon Christophe Jolibois, Akinori Ito, Takashi Nose

    Applied Sciences 2025/04/11

    DOI: 10.3390/app15084256  

  2. Adaptive Fine-Grained Pruning via Binary Search for Efficient Environmental Sound Classification

    Changlong Wang, Akinori Ito, Takashi Nose

    IEEE Access 2025

    DOI: 10.1109/ACCESS.2025.3617879  

  3. Generation of Listening Motion of Embodied Conversational Agents Using Speech and Text Information

    Haruki Ito, Akinori Ito, Takashi Nose

    2025

    DOI: 10.1007/978-3-032-05994-9_10  

  4. Unified model for voice conversion of speech and singing voice using adaptive pitch constraints

    Shogo Fukawa, Takashi Nose, Shuhei Imai, Akinori Ito

    Acoustical Science and Technology 46 (1) 120-123 2025/01/01

    Publisher: Acoustical Society of Japan

    DOI: 10.1250/ast.e24.47  

    ISSN: 1346-3969

    eISSN: 1347-5177

  5. We open our mouths when we are silent

    Shoki Kawanishi, Yuya Chiba, Akinori Ito, Takashi Nose

    Acoustical Science and Technology 46 (1) 96-99 2025/01/01

    Publisher: Acoustical Society of Japan

    DOI: 10.1250/ast.e24.21  

    ISSN: 1346-3969

    eISSN: 1347-5177

  6. Selection of key sentences from lecture video transcription and its application to feedback to the learner

    Miki Takeuchi, Akinori Ito, Takashi Nose

    Proceedings of the 2024 8th International Conference on Education and Multimedia Technology 218-223 2024/06/22

    Publisher: ACM

    DOI: 10.1145/3678726.3678733  

  7. Character Expressions in Meta-Learning for Extremely Low Resource Language Speech Recognition

    Rui Zhou, Akinori Ito, Takashi Nose

    Proceedings of the 2024 16th International Conference on Machine Learning and Computing 2024/02/02

    Publisher: ACM

    DOI: 10.1145/3651671.3651730  

  8. Evaluation of Environmental Sound Classification using Vision Transformer

    Changlong Wang, Akinori Ito, Takashi Nose, Chia-Ping Chen

    Proceedings of the 2024 16th International Conference on Machine Learning and Computing 665-669 2024/02/02

    Publisher: ACM

    DOI: 10.1145/3651671.3651733  

  9. Toward Photo-Realistic Facial Animation Generation Based on Keypoint Features

    Zikai Shu, Takashi Nose, Akinori Ito

    Proceedings of the 2024 16th International Conference on Machine Learning and Computing 39 334-339 2024/02/02

    Publisher: ACM

    DOI: 10.1145/3651671.3651731  

  10. Scheduled Curiosity-Deep Dyna-Q: Efficient Exploration for Dialog Policy Learning

    Niu, X., Ito, A., Nose, T.

    IEEE Access 12 2024/01/31

    DOI: 10.1109/ACCESS.2024.3376418  

    ISSN: 2169-3536

  11. Simultaneous Adaptation of Acoustic and Language Models for Emotional Speech Recognition Using Tweet Data

    Kosaka, T., Saeki, K., Aizawa, Y., Kato, M., Nose, T.

    IEICE Transactions on Information and Systems E107.D (3) 2024

    DOI: 10.1587/transinf.2023HCP0010  

    ISSN: 0916-8532

    eISSN: 1745-1361

  12. A Replaceable Curiosity-Driven Candidate Agent Exploration Approach for Task-Oriented Dialog Policy Learning

    Niu, X., Ito, A., Nose, T.

    IEEE Access 12 2024

    DOI: 10.1109/ACCESS.2024.3462719  

    ISSN: 2169-3536

  13. Multilingual Meta-Transfer Learning for Low-Resource Speech Recognition

    Zhou, R., Koshikawa, T., Ito, A., Nose, T., Chen, C.-P.

    IEEE Access 2024

    DOI: 10.1109/ACCESS.2024.3486711  

    ISSN: 2169-3536

  14. Fast end-to-end non-parallel voice conversion based on speaker-adaptive neural vocoder with cycle-consistent learning

    Shuhei Imai, Aoi Kanagaki, Takashi Nose, Shogo Fukawa, Akinori Ito

    Acoustical Science and Technology 2024

    Publisher: Acoustical Society of Japan

    DOI: 10.1250/ast.e24.46  

    ISSN: 1346-3969

    eISSN: 1347-5177

  15. Multimodal Expressive Embodied Conversational Agent Design

    Simon Jolibois, Akinori Ito, Takashi Nose

    Communications in Computer and Information Science 244-249 2023/07/09

    Publisher: Springer Nature Switzerland

    DOI: 10.1007/978-3-031-35989-7_31  

    ISSN: 1865-0929

    eISSN: 1865-0937

  16. Effect of Data Size and Machine Translation on the Accuracy of Automatic Personality Classification

    Yuki Fukazawa, Akinori Ito, Takashi Nose

    Advances in Intelligent Information Hiding and Multimedia Signal Processing 405-413 2023/05/24

    Publisher: Springer Nature Singapore

    DOI: 10.1007/978-981-99-0105-0_36  

    ISSN: 2190-3018

    eISSN: 2190-3026

  17. Spoken term detection from utterances of minority languages

    Ito, A., Mizuochi, S., Nose, T.

    Issues in Japanese Psycholinguistics from Comparative Perspectives: Volume 1: Cross-Linguistic Studies 2023

    DOI: 10.1515/9783110778946-014  

  18. Response Sentence Modification Using a Sentence Vector for a Flexible Response Generation of Retrieval-based Dialogue Systems

    Ryota Yahagi, Akinori Ito, Takashi Nose, Yuya Chiba

    2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2022/11/07

    Publisher: IEEE

    DOI: 10.23919/apsipaasc55919.2022.9979841  

  19. Design and Construction of Japanese Multimodal Utterance Corpus with Improved Emotion Balance and Naturalness

    Daisuke Horii, Akinori Ito, Takashi Nose

    2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2022/11/07

    Publisher: IEEE

    DOI: 10.23919/apsipaasc55919.2022.9980272  

  20. Multimodal Dialogue Response Timing Estimation Using Dialogue Context Encoder

    Ryota Yahagi, Yuya Chiba, Takashi Nose, Akinori Ito

    Lecture Notes in Electrical Engineering 133-141 2022/11/01

    Publisher: Springer Nature Singapore

    DOI: 10.1007/978-981-19-5538-9_9  

    ISSN: 1876-1100

    eISSN: 1876-1119

  21. Spoken Term Detection of Zero-Resource Language Using Posteriorgram of Multiple Languages

    Satoru MIZUOCHI, Takashi NOSE, Akinori ITO

    Interdisciplinary Information Sciences 28 (1) 1-13 2022

    Publisher: Graduate School of Information Sciences, Tohoku University

    DOI: 10.4036/iis.2022.a.04  

    ISSN: 1340-9050

    eISSN: 1347-6157

  22. Analysis of Feature Extraction by Convolutional Neural Network for Speech Emotion Recognition

    Daisuke Horii, Akinori Ito, Takashi Nose

    2021 IEEE 10th Global Conference on Consumer Electronics (GCCE) 2021/10/12

    Publisher: IEEE

    DOI: 10.1109/gcce53005.2021.9621964  

  23. Improvement of Automatic English Pronunciation Assessment with Small Number of Utterances Using Sentence Speakability

    Satsuki Naijo, Akinori Ito, Takashi Nose

    Interspeech 2021 2021/08/30

    Publisher: ISCA

    DOI: 10.21437/interspeech.2021-1132  

  24. Neural Spoken-Response Generation Using Prosodic and Linguistic Context for Conversational Systems

    Yoshihiro Yamazaki, Yuya Chiba, Takashi Nose, Akinori Ito

    Interspeech 2021 2021/08/30

    Publisher: ISCA

    DOI: 10.21437/interspeech.2021-381  

  25. SMOC corpus: A large-scale Japanese spontaneous multimodal one-on-one chat-talk corpus for dialog systems

    Yoshihiro Yamazaki, Yuya Chiba, Takashi Nose, Akinori Ito

    Acoustical Science and Technology 42 (4) 210-213 2021/07/01

    Publisher: Acoustical Society of Japan

    DOI: 10.1250/ast.42.210  

    ISSN: 1346-3969

    eISSN: 1347-5177

  26. CycleGAN-Based High-Quality Non-Parallel Voice Conversion with Spectrogram and WaveRNN

    Aoi Kanagaki, Masaya Tanaka, Takashi Nose, Ryohei Shimizu, Akira Ito, Akinori Ito

    2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020 356-357 2020/10/13

    DOI: 10.1109/GCCE50665.2020.9291952  

  27. Incremental response generation using prefix-to-prefix model for dialogue system

    Ryota Yahagi, Yuya Chiba, Takashi Nose, Akinori Ito

    2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020 349-350 2020/10/13

    DOI: 10.1109/GCCE50665.2020.9291883  

  28. A study on minimum spectral error analysis of speech

    Takuma Hayasaka, Takashi Nose, Akinori Ito

    2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020 362-363 2020/10/13

    DOI: 10.1109/GCCE50665.2020.9291840  

  29. Filler prediction based on bidirectional LSTM for generation of natural response of spoken dialog

    Yoshihiro Yamazaki, Yuya Chiba, Takashi Nose, Akinori Ito

    2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020 360-361 2020/10/13

    DOI: 10.1109/GCCE50665.2020.9291867  

  30. Successive Japanese lyrics generation based on encoder-decoder model

    Rikiya Takahashi, Takashi Nose, Yuya Chiba, Akinori Ito

    2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020 126-127 2020/10/13

    DOI: 10.1109/GCCE50665.2020.9291718  

  31. Analysis and Estimation of Sentence Speakability for English Pronunciation Evaluation

    Satsuki Naijo, Yuya Chiba, Takashi Nose, Akinori Ito

    2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020 353-355 2020/10/13

    DOI: 10.1109/GCCE50665.2020.9292072  

  32. LJSing: large-scale singing voice corpus of single Japanese singer

    Takuto Fujimura, Takashi Nose, Akinori Ito

    2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020 364-365 2020/10/13

    DOI: 10.1109/GCCE50665.2020.9291704  

  33. Improving Pronunciation Clarity of Dysarthric Speech Using CycleGAN with Multiple Speakers

    Shuhei Imai, Takashi Nose, Aoi Kanagaki, Satoshi Watanabe, Akinori Ito

    2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020 366-367 2020/10/13

    DOI: 10.1109/GCCE50665.2020.9292041  

  34. Spoken term detection based on acoustic models trained in multiple languages for zero-resource language

    Satoru Mizuochi, Yuya Chiba, Takashi Nose, Akinori Ito

    2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020 351-352 2020/10/13

    DOI: 10.1109/GCCE50665.2020.9291761  

  35. Integration of accent sandhi and prosodic features estimation for japanese text-to-speech synthesis

    Daisuke Fujimaki, Takashi Nose, Akinori Ito

    2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020 358-359 2020/10/13

    DOI: 10.1109/GCCE50665.2020.9291906  

  36. Language modeling in speech recognition for grammatical error detection based on neural machine translation

    Jiang Fu, Yuya Chiba, Takashi Nose, Akinori Ito

    Acoustical Science and Technology 41 (5) 788-791 2020/09/01

    Publisher: Acoustical Society of Japan

    DOI: 10.1250/ast.41.788  

    ISSN: 1346-3969

    eISSN: 1347-5177

  37. Scyclone: High-Quality and Parallel-Data-Free Voice Conversion Using Spectrogram and Cycle-Consistent Adversarial Networks

    Masaya Tanaka, Takashi Nose, Aoi Kanagaki, Ryohei Shimizu, Akira Ito

    2020/05/07

    This paper proposes Scyclone, a high-quality voice conversion (VC) technique that requires no parallel training data. Scyclone improves the speech naturalness and speaker similarity of the converted speech by introducing CycleGAN-based spectrogram conversion with a simplified WaveRNN-based vocoder. In Scyclone, a linear spectrogram is used as the conversion feature instead of vocoder parameters, which avoids quality degradation due to extraction errors in fundamental frequency and voiced/unvoiced parameters. The spectrograms of the source and target speakers are modeled by modified CycleGAN networks, and the waveform is reconstructed using the simplified WaveRNN with a single Gaussian probability density function. Subjective experiments with completely unpaired training data show that Scyclone is significantly better than CycleGAN-VC2, one of the existing state-of-the-art parallel-data-free VC techniques.
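    The core technique named here, cycle-consistent adversarial training on linear spectrograms, can be illustrated with a rough PyTorch sketch. This is not the authors' implementation: the tiny convolutional generators/discriminators, the loss weights, and names such as G_ab are placeholder assumptions.

      import torch
      import torch.nn as nn

      # Toy stand-ins for Scyclone's generators/discriminators (the real
      # architectures differ; see the paper).
      def make_generator(n_bins=128):
          return nn.Sequential(nn.Conv1d(n_bins, 256, 5, padding=2), nn.ReLU(),
                               nn.Conv1d(256, n_bins, 5, padding=2))

      def make_discriminator(n_bins=128):
          return nn.Sequential(nn.Conv1d(n_bins, 256, 5, padding=2), nn.LeakyReLU(0.2),
                               nn.Conv1d(256, 1, 5, padding=2))

      G_ab, G_ba = make_generator(), make_generator()   # source->target, target->source
      D_a, D_b = make_discriminator(), make_discriminator()
      l1, mse = nn.L1Loss(), nn.MSELoss()

      def generator_loss(spec_a, spec_b, lam_cyc=10.0, lam_id=5.0):
          """Adversarial + cycle-consistency + identity losses on unpaired spectrograms."""
          fake_b, fake_a = G_ab(spec_a), G_ba(spec_b)
          # LSGAN-style adversarial terms: converted spectrograms should fool D.
          adv = mse(D_b(fake_b), torch.ones_like(D_b(fake_b))) + \
                mse(D_a(fake_a), torch.ones_like(D_a(fake_a)))
          # Cycle consistency: converting there and back must reconstruct the
          # input, which is what removes the need for parallel data.
          cyc = l1(G_ba(fake_b), spec_a) + l1(G_ab(fake_a), spec_b)
          # Identity terms keep each converter from distorting in-domain input.
          idt = l1(G_ab(spec_b), spec_b) + l1(G_ba(spec_a), spec_a)
          return adv + lam_cyc * cyc + lam_id * idt

      # Toy usage: batches of 128-bin spectrograms, 80 frames each.
      loss = generator_loss(torch.randn(4, 128, 80), torch.randn(4, 128, 80))
      loss.backward()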

  38. Automatic assessment of English proficiency for Japanese learners without reference sentences based on deep neural network acoustic models

    Jiang Fu, Yuya Chiba, Takashi Nose, Akinori Ito

    Speech Communication 116 86-97 2020/01

    DOI: 10.1016/j.specom.2019.12.002  

    ISSN: 0167-6393

  39. A symbol-level melody completion based on a convolutional neural network with generative adversarial learning

    Kosuke Nakamura, Takashi Nose, Yuya Chiba, Akinori Ito

    Journal of Information Processing 28 248-257 2020

    DOI: 10.2197/ipsjjip.28.248  

    ISSN: 0387-5806

    eISSN: 1882-6652

  40. Construction and analysis of a multimodal chat-talk corpus for dialog systems considering interpersonal closeness

    Yoshihiro Yamazaki, Yuya Chiba, Takashi Nose, Akinori Ito

    LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings 443-448 2020

  41. Multi-stream attention-based BLSTM with feature segmentation for speech emotion recognition

    Yuya Chiba, Takashi Nose, Akinori Ito

    Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2020-October 3301-3305 2020

    DOI: 10.21437/Interspeech.2020-1199  

    ISSN: 2308-457X

    eISSN: 1990-9772

  42. Developing a Multi-Platform Speech Recording System Toward Open Service of Building Large-Scale Speech Corpora

    Keita Ishizuka, Takashi Nose

    2019/12/19

    This paper briefly reports our ongoing development of a multi-platform, browser-based speech recording system. We designed the system as an open service for building large-scale speech corpora at low cost for any researchers and developers working on speech processing. The recent increase in the use of crowdsourcing services, e.g., Amazon Mechanical Turk, enables us to reduce the cost of collecting speakers on the web, and there have been many attempts to develop automated speech collection platforms and applications designed for use with crowdsourcing. However, a major problem in these previous studies and developments is that most of the systems do not take the form of a common service for speech recording and corpus building, so each corpus builder has to deploy the system in their own environment, including a web server. To address this problem, we are developing a new platform where both corpus builders and recording participants can use a single shared system and service by creating user accounts. A brief introduction of the system is given in this paper as the start of this challenge.
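    The shared-service design sketched in this abstract (one hosted recorder, per-user accounts, browser uploads) can be illustrated with a minimal server endpoint. The Flask routes, form field names, and storage layout below are illustrative assumptions, not the authors' system.

      import uuid
      from pathlib import Path

      from flask import Flask, abort, request

      app = Flask(__name__)
      STORAGE = Path("corpora")  # one subdirectory per corpus-building project

      @app.post("/api/projects/<project_id>/recordings")
      def upload_recording(project_id: str):
          """Accept one browser-recorded utterance (e.g., WebM from MediaRecorder)."""
          audio = request.files.get("audio")
          speaker = request.form.get("speaker_id")
          prompt = request.form.get("prompt_id")
          if audio is None or speaker is None or prompt is None:
              abort(400, "audio, speaker_id and prompt_id are required")
          out_dir = STORAGE / project_id / speaker
          out_dir.mkdir(parents=True, exist_ok=True)
          # Unique file names let a participant re-record a prompt without collisions.
          name = f"{prompt}_{uuid.uuid4().hex}.webm"
          audio.save(out_dir / name)
          return {"status": "ok", "file": name}, 201

      if __name__ == "__main__":
          app.run()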

  43. Improving human scoring of prosody using parametric speech synthesis Peer-reviewed

    Prafianto, H., Nose, T., Chiba, Y., Ito, A.

    Speech Communication 111 14 2019/08

    Publisher: Elsevier BV

    DOI: 10.1016/j.specom.2019.06.001  

    ISSN: 0167-6393

  44. Multi-condition training for noise-robust speech emotion recognition

    Yuya Chiba, Takashi Nose, Akinori Ito

    Acoustical Science and Technology 40 (6) 406-409 2019

    DOI: 10.1250/ast.40.406  

    ISSN: 1346-3969

    eISSN: 1347-5177

  45. Evaluation of English Speech Recognition for Japanese Learners Using DNN-Based Acoustic Models Peer-reviewed

    Jiang Fu, Yuya Chiba, Takashi Nose, Akinori Ito

    Smart Innovation, Systems and Technologies 110 93-100 2019/01

  46. Comparison of Speech Recognition Performance Between Kaldi and Google Cloud Speech API Peer-reviewed

    Takashi Kimura, Takashi Nose, Shinji Hirooka, Yuya Chiba, Akinori Ito

    Smart Innovation, Systems and Technologies 110 109-115 2019/01

  47. Segmental Pitch Control Using Speech Input Based on Differential Contexts and Features for Customizable Neural Speech Synthesis Peer-reviewed

    Shinya Hanabusa, Takashi Nose, Akinori Ito

    Smart Innovation, Systems and Technologies 110 124-131 2019/01

  48. Melody Completion Based on Convolutional Neural Networks and Generative Adversarial Learning Peer-reviewed

    Kosuke Nakamura, Takashi Nose, Yuya Chiba, Akinori Ito

    Smart Innovation, Systems and Technologies 110 116-123 2019/01

  49. Two-Stage Sequence-to-Sequence Neural Voice Conversion with Low-to-High Definition Spectrogram Mapping Peer-reviewed

    Sou Miyamoto, Takashi Nose, Kazuyuki Hiroshiba, Yuri Odagiri, Akinori Ito

    Smart Innovation, Systems and Technologies 110 132-139 2019/01

  50. DNN-Based Talking Movie Generation with Face Direction Consideration Peer-reviewed

    Toru Ishikawa, Takashi Nose, Akinori Ito

    Smart Innovation, Systems and Technologies 110 157-164 2019/01

  51. A Study on a Spoken Dialogue System with Cooperative Emotional Speech Synthesis Using Acoustic and Linguistic Information Peer-reviewed

    Mai Yamanaka, Yuya Chiba, Takashi Nose, Akinori Ito

    Smart Innovation, Systems and Technologies 110 101-108 2019/01

  52. Improvement of accent sandhi rules based on Japanese accent dictionaries Peer-reviewed

    Hiroto Aoyama, Takashi Nose, Yuya Chiba, Akinori Ito

    Smart Innovation, Systems and Technologies 110 140-148 2019/01

    DOI: 10.1007/978-3-030-03748-2_17  

    ISSN: 2190-3018

  53. Data collection and analysis for automatically generating record of human behaviors by environmental sound recognition Peer-reviewed

    Takahiro Furuya, Yuya Chiba, Takashi Nose, Akinori Ito

    Smart Innovation, Systems and Technologies 110 149-156 2019/01/01

    DOI: 10.1007/978-3-030-03748-2_18  

    ISSN: 2190-3018

  54. Effect of mutual self-disclosure in spoken dialog system on user impression Peer-reviewed

    Shunsuke Tada, Yuya Chiba, Takashi Nose, Akinori Ito

    Proceedings of 2018 APSIPA-ASC 806-810 2018/11

  55. Improving User Impression in Spoken Dialog System with Gradual Speech Form Control. Peer-reviewed

    Yukiko Kageyama, Yuya Chiba, Takashi Nose, Akinori Ito

    Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, Melbourne, Australia, July 12-14, 2018 235-240 2018/07

    Publisher: Association for Computational Linguistics

  56. An Analysis of the Effect of Emotional Speech Synthesis on Non-Task-Oriented Dialogue System. Peer-reviewed

    Yuya Chiba, Takashi Nose, Taketo Kase, Mai Yamanaka, Akinori Ito

    Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, Melbourne, Australia, July 12-14, 2018 371-375 2018/07

    Publisher: Association for Computational Linguistics

  57. Analyses of example sentences collected by conversation for example-based non-task-oriented dialog system Peer-reviewed

    Kageyama, Y., Chiba, Y., Nose, T., Ito, A.

    IAENG International Journal of Computer Science 45 (2) 285-293 2018/05

    ISSN: 1819-656X

    eISSN: 1819-9224

  58. Analyzing effect of physical expression on English proficiency for multimodal computer-assisted language learning Peer-reviewed

    Haoran Wu, Yuya Chiba, Takashi Nose, Akinori Ito

    Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2018-September 1746-1750 2018/01/01

    Publisher: ISCA

    DOI: 10.21437/Interspeech.2018-1425  

    ISSN: 2308-457X

  59. Analysis of preferred speaking rate and pause in spoken Easy Japanese for non-native listeners Peer-reviewed

    Hafiyan Prafiyanto, Takashi Nose, Yuya Chiba, Akinori Ito

    Acoustical Science and Technology 39 92-100 2018/01/01

    DOI: 10.1250/ast.39.92  

    ISSN: 1346-3969

  60. Dialog-based interactive movie recommendation: Comparison of dialog strategies Peer-reviewed

    Hayato Mori, Yuya Chiba, Takashi Nose, Akinori Ito

    Smart Innovation, Systems and Technologies 82 77-83 2018

    Publisher: Springer Science and Business Media Deutschland GmbH

    DOI: 10.1007/978-3-319-63859-1_10  

    ISSN: 2190-3018

    eISSN: 2190-3026

  61. A study on 2D photo-realistic facial animation generation using 3D facial feature points and deep neural networks Peer-reviewed

    Kazuki Sato, Takashi Nose, Akira Ito, Yuya Chiba, Akinori Ito, Takahiro Shinozaki

    Smart Innovation, Systems and Technologies 82 113-118 2018

    Publisher: Springer Science and Business Media Deutschland GmbH

    DOI: 10.1007/978-3-319-63859-1_15  

    ISSN: 2190-3018

    eISSN: 2190-3026

  62. Voice conversion from arbitrary speakers based on deep neural networks with adversarial learning Peer-reviewed

    Sou Miyamoto, Takashi Nose, Suzunosuke Ito, Harunori Koike, Yuya Chiba, Akinori Ito, Takahiro Shinozaki

    Smart Innovation, Systems and Technologies 82 97-103 2018

    Publisher: Springer Science and Business Media Deutschland GmbH

    DOI: 10.1007/978-3-319-63859-1_13  

    ISSN: 2190-3018

    eISSN: 2190-3026

  63. Response selection of interview-based dialog system using user focus and semantic orientation Peer-reviewed

    Shunsuke Tada, Yuya Chiba, Takashi Nose, Akinori Ito

    Smart Innovation, Systems and Technologies 82 84-90 2018

    Publisher: Springer Science and Business Media Deutschland GmbH

    DOI: 10.1007/978-3-319-63859-1_11  

    ISSN: 2190-3018

    eISSN: 2190-3026

  64. Development and evaluation of julius-compatible interface for Kaldi ASR Peer-reviewed

    Yusuke Yamada, Takashi Nose, Yuya Chiba, Akinori Ito, Takahiro Shinozaki

    Smart Innovation, Systems and Technologies 82 91-96 2018

    Publisher: Springer Science and Business Media Deutschland GmbH

    DOI: 10.1007/978-3-319-63859-1_12  

    ISSN: 2190-3018

    eISSN: 2190-3026

  65. Detection of singing mistakes from singing voice Peer-reviewed

    Isao Miyagawa, Yuya Chiba, Takashi Nose, Akinori Ito

    Smart Innovation, Systems and Technologies 82 130-136 2018

    Publisher: Springer Science and Business Media Deutschland GmbH

    DOI: 10.1007/978-3-319-63859-1_17  

    ISSN: 2190-3018

    eISSN: 2190-3026

  66. Evaluation of nonlinear tempo modification methods based on sinusoidal modeling Peer-reviewed

    Kosuke Nakamura, Yuya Chiba, Takashi Nose, Akinori Ito

    Smart Innovation, Systems and Technologies 82 104-111 2018

    Publisher: Springer Science and Business Media Deutschland GmbH

    DOI: 10.1007/978-3-319-63859-1_14  

    ISSN: 2190-3018

    eISSN: 2190-3026

  67. Analysis of Efficient Multimodal Features for Estimating User’s Willingness to Talk: Comparison of Human-Machine and Human-Human Dialog Peer-reviewed

    Yuya Chiba, Takashi Nose, Akinori Ito

    Proceeding of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2018-February 1-4 2017/12/13

    DOI: 10.1109/APSIPA.2017.8282069  

  68. HMM-Based Photo-Realistic Talking Face Synthesis Using Facial Expression Parameter Mapping with Deep Neural Networks Peer-reviewed

    Kazuki Sato, Takashi Nose, Akinori Ito

    Journal of Computer and Communications 5 (10) 55-65 2017/08

    DOI: 10.4236/jcc.2017.510006  

  69. Data Collection and Analysis for Automatic Generation of Activity Records Based on Everyday Sound Recognition

    Takahiro Furuya, Yuya Chiba, Takashi Nose, Akinori Ito

    IPSJ SIG Technical Report 1-6 2017/06/17

  70. Cluster-based approach to discriminate the user's state whether a user is embarrassed or thinking to an answer to a prompt Peer-reviewed

    Yuya Chiba, Takashi Nose, Akinori Ito

    JOURNAL ON MULTIMODAL USER INTERFACES 11 (2) 185-196 2017/06

    DOI: 10.1007/s12193-017-0238-y  

    ISSN: 1783-7677

    eISSN: 1783-8738

  71. Sentence Selection Based on Extended Entropy Using Phonetic and Prosodic Contexts for Statistical Parametric Speech Synthesis Peer-reviewed

    Takashi Nose, Yusuke Arao, Takao Kobayashi, Komei Sugiura, Yoshinori Shiga

    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 25 (5) 1107-1116 2017/05

    DOI: 10.1109/TASLP.2017.2688585  

    ISSN: 2329-9290

    eISSN: 2329-9304

  72. Dimensional paralinguistic information control based on multiple-regression HSMM for spontaneous dialogue speech synthesis with robust parameter estimation Peer-reviewed

    Tomohiro Nagata, Hiroki Mori, Takashi Nose

    SPEECH COMMUNICATION 88 137-148 2017/04

    DOI: 10.1016/j.specom.2017.01.002  

    ISSN: 0167-6393

    eISSN: 1872-7182

  73. A Study on Tailor-Made Speech Synthesis Based on Deep Neural Networks Peer-reviewed

    Shuhei Yamada, Takashi Nose, Akinori Ito

    ADVANCES IN INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING, VOL 1 63 159-166 2017

    DOI: 10.1007/978-3-319-50209-0_20  

    ISSN: 2190-3018

  74. Synthesis of Photo-Realistic Facial Animation from Text Based on HMM and DNN with Animation Unit Peer-reviewed

    Kazuki Sato, Takashi Nose, Akinori Ito

    ADVANCES IN INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING, VOL 2 64 29-36 2017

    DOI: 10.1007/978-3-319-50212-0_4  

    ISSN: 2190-3018

  75. Development of an Easy Japanese Writing Support System with Text-to-Speech Function Peer-reviewed

    Takeshi Nagano, Hafiyan Prafianto, Takashi Nose, Akinori Ito

    ADVANCES IN INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING, VOL 2 64 221-228 2017

    DOI: 10.1007/978-3-319-50212-0_27  

    ISSN: 2190-3018

  76. Speaker Adaptation Using Shared Decision Tree Context Clustering for Cross-Lingual Speech Synthesis Peer-reviewed

    Daiki Nagahama, Takashi Nose, Tomoki Koriyama, Takao Kobayashi

    IEICE Transactions on Information and Systems (Japanese Edition) J100-D (3) 385-393 2017

  77. Techniques for Synthesizing Diverse Speech Based on Statistical Models Peer-reviewed

    Takashi Nose

    IEICE Transactions on Information and Systems (Japanese Edition) J100-D (4) 556-569 2017

  78. Collection of example sentences for non-task-oriented dialog using a spoken dialog system and comparison with hand-crafted DB Peer-reviewed

    Yukiko Kageyama, Yuya Chiba, Takashi Nose, Akinori Ito

    Communications in Computer and Information Science 713 458-464 2017

    Publisher: Springer Verlag

    DOI: 10.1007/978-3-319-58750-9_63  

    ISSN: 1865-0929

  79. Construction and analysis of phonetically and prosodically balanced emotional speech database Peer-reviewed

    Takeishi, E, Nose, T, Chiba, Y, Ito, A

    2016 Conference of the Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques, O-COCOSDA 2016 16-21 2016/10

    DOI: 10.1109/ICSDA.2016.7918977  

  80. Efficient Implementation of Global Variance Compensation for Parametric Speech Synthesis Peer-reviewed

    Takashi Nose

    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 24 (10) 1694-1704 2016/10

    DOI: 10.1109/TASLP.2016.2580298  

    ISSN: 2329-9290

  81. Estimating the user's state before exchanging utterances using intermediate acoustic features for spoken dialog systems Peer-reviewed

    Chiba, Y., Nose, T., Ito, M., Ito, A.

    IAENG International Journal of Computer Science 43 (1) 1-9 2016/02/29

    ISSN: 1819-656X

    eISSN: 1819-9224

  82. A PRECISE EVALUATION METHOD OF PROSODIC QUALITY OF NON-NATIVE SPEAKERS USING AVERAGE VOICE AND PROSODY SUBSTITUTION Peer-reviewed

    Hafiyan Prafianto, Takashi Nose, Akinori Ito

    PROCEEDINGS OF 2016 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING (ICALIP) 208-212 2016

    DOI: 10.1109/ICALIP.2016.7846620  

  83. A Study on Facial Image Conversion Based on DNN-Based Conversion of Animation Units Peer-reviewed

    Yuki Saito, Takashi Nose, Akinori Ito

    IEICE Transactions on Information and Systems (Japanese Edition) J99-D (11) 1112-1115 2016

  84. Prosodically rich speech synthesis interface using limited data of celebrity voice Peer-reviewed

    Takashi Nose, Taiki Kamei

    Journal of Computer and Communications 4 (16) 79-94 2016

  85. Evaluation of a Spoken Dialogue System Using Cooperative Emotional Speech Synthesis Based on Utterance State Estimation Peer-reviewed

    Taketo Kase, Takashi Nose, Yuya Chiba, Akinori Ito

    IEICE Transactions (Japanese Edition) J99-A (1) 25-35 2016/01

  86. Investigation of Pause Insertion Effect in Spoken Easy Japanese for Non-Native Listeners Peer-reviewed

    Hafiyan Prafianto, Takeshi Nagano, Takashi Nose, Akinori Ito

    Proceedings of 12th Western Pacific Acoustics Conference 507-511 2015/12/08

  87. Automatic Generation of Proper Noun Entries in a Speech Recognizer for Local Information Recognition Peer-reviewed

    Kenta Shiga, Takashi Nose, Akinori Ito, Ryo Masumura, Hirokazu Masataki

    Proceedings of 12th Western Pacific Acoustics Conference 2015/12/08

  88. Real-time talking avatar on the internet using Kinect and voice conversion Peer-reviewed

    Takashi Nose, Yuki Igarashi

    International Journal of Advanced Computer Science and Applications 6 (12) 301-307 2015/12

  89. A Computer-Assisted English Conversation Training System for Response-Timing-Aware Oral Conversation Exercise Peer-reviewed

    Naoto Suzuki, Yutaka Hiroi, Yuya Chiba, Takashi Nose, Akinori Ito

    IPSJ Journal 56 (11) 2177-2189 2015/11/01

  90. HMM-based expressive singing voice synthesis with singing style control and robust pitch modeling Peer-reviewed

    Takashi Nose, Misa Kanemoto, Tomoki Koriyama, Takao Kobayashi

    COMPUTER SPEECH AND LANGUAGE 34 (1) 308-322 2015/11

    DOI: 10.1016/j.csl.2015.04.001  

    ISSN: 0885-2308

    eISSN: 1095-8363

  91. Conversion of Speaker's Face Image Using PCA and Animation Unit for Video Chatting Peer-reviewed

    Saito, Y, Nose, T, Shinozaki, T, Ito, A

    Proceedings - 2015 International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2015 433-436 2015/09/25

    Publisher: IEEE

    DOI: 10.1109/IIH-MSP.2015.85  

  92. Tempo Modification of Mixed Music Signal by Nonlinear Time Scaling and Sinusoidal Modeling Peer-reviewed

    Nishino, T, Nose, T, Ito, A

    Proceedings - 2015 International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2015 146-149 2015/09/24

    Publisher: IEEE

    DOI: 10.1109/IIH-MSP.2015.86  

  93. Entropy-based sentence selection for speech synthesis using phonetic and prosodic contexts Peer-reviewed

    Takashi Nose, Yusuke Arao, Takao Kobayashi, Komei Sugiura, Yoshinori Shiga, Akinori Ito

    Proceedings of 16th Annual Conference of the International Speech Communication Association 3491-3495 2015/09/10

  94. On appropriateness and estimation of the emotion of synthesized response speech in a spoken dialogue system Peer-reviewed

    Taketo Kase, Takashi Nose, Akinori Ito

    Communications in Computer and Information Science 528 747-752 2015/01/01

    DOI: 10.1007/978-3-319-21380-4_126  

    ISSN: 1865-0929

  95. Statistical Parametric Speech Synthesis Based on Gaussian Process Regression Peer-reviewed

    Tomoki Koriyama, Takashi Nose, Takao Kobayashi

    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING 8 (2) 173-183 2014/04

    DOI: 10.1109/JSTSP.2013.2283461  

    ISSN: 1932-4553

    eISSN: 1941-0484

  96. A Parameter Generation Algorithm Using Local Variance for HMM-Based Speech Synthesis Peer-reviewed

    Takashi Nose, Vataya Chunwijitra, Takao Kobayashi

    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING 8 (2) 221-228 2014/04

    DOI: 10.1109/JSTSP.2013.2283459  

    ISSN: 1932-4553

    eISSN: 1941-0484

  97. Prosodic variation enhancement using unsupervised context labeling for HMM-based expressive speech synthesis Peer-reviewed

    Yu Maeno, Takashi Nose, Takao Kobayashi, Tomoki Koriyama, Yusuke Ijima, Hideharu Nakajima, Hideyuki Mizuno, Osamu Yoshioka

    SPEECH COMMUNICATION 57 144-154 2014/02

    DOI: 10.1016/j.specom.2013.09.014  

    ISSN: 0167-6393

    eISSN: 1872-7182

  98. PARAMETRIC SPEECH SYNTHESIS USING LOCAL AND GLOBAL SPARSE GAUSSIAN PROCESSES Peer-reviewed

    Tomoki Koriyama, Takashi Nose, Takao Kobayashi

    2014 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP) 2014

    ISSN: 2161-0363

  99. Speech Recognition in a Home Environment Using Parallel Decoding with GMM-Based Noise Modeling Peer-reviewed

    Kohei Machida, Takashi Nose, Akinori Ito

    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) 2014

    DOI: 10.1109/APSIPA.2014.7041622  

  100. PARAMETRIC SPEECH SYNTHESIS BASED ON GAUSSIAN PROCESS REGRESSION USING GLOBAL VARIANCE AND HYPERPARAMETER OPTIMIZATION Peer-reviewed

    Tomoki Koriyama, Takashi Nose, Takao Kobayashi

    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) 3862-3866 2014

    DOI: 10.1109/ICASSP.2014.6854319  

    ISSN: 1520-6149

  101. Tone modeling using stress information for HMM-based Thai speech synthesis Peer-reviewed

    Decha Moungsri, Tomoki Koriyama, Takashi Nose, Takao Kobayashi

    Proceedings of the 7th International Conference on Speech Prosody 1057-1061 2014

  102. Controlling Switching Pause Using an AR Agent for Interactive CALL System Peer-reviewed

    Naoto Suzuki, Takashi Nose, Akinori Ito, Yutaka Hiroi

    Communications in Computer and Information Science 435 588-593 2014

    Publisher: Springer Verlag

    DOI: 10.1007/978-3-319-07854-0_102  

    ISSN: 1865-0929

  103. Subjective Evaluation of Packet Loss RecoveryTechniques for Voice over IP Peer-reviewed

    Masahito Okamoto, Takashi Nose, Akinori Ito, Takeshi Nagano

    2014 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING (ICALIP), VOLS 1-2 711-714 2014

    DOI: 10.1109/ICALIP.2014.7009887  

  104. A Study on the Effect of Speech Rate on Perception of Spoken Easy Japanese Using Speech Synthesis Peer-reviewed

    Hafiyan Prafianto, Takashi Nose, Yuya Chiba, Akinori Ito, Kazuyuki Sato

    2014 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING (ICALIP), VOLS 1-2 476-479 2014

    DOI: 10.1109/ICALIP.2014.7009839  

  105. Robot: Have I Done Something Wrong? -Analysis of Prosodic Features of Speech Commands under the Robot's Unintended Behavior- Peer-reviewed

    Noriko Totsuka, Yuya Chiba, Takashi Nose, Akinori Ito

    2014 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING (ICALIP), VOLS 1-2 887-890 2014

    DOI: 10.1109/ICALIP.2014.7009922  

  106. Tempo modification of music signal using sinusoidal model and LPC-based residue model Peer-reviewed

    Akinori Ito, Yuki Igarashi, Masashi Ito, Takashi Nose

    Proceedings of the 21st International Congress on Sound and Vibration 1 1-8 2014

  107. User modeling by using bag-of-behaviors for building a dialog system sensitive to the interlocutor's internal state Peer-reviewed

    Yuya Chiba, Masashi Ito, Takashi Nose, Akinori Ito

    Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue 74-78 2014

  108. Quantized F0 Context and Its Applications to Speech Synthesis, Speech Coding and Voice Conversion Peer-reviewed

    Takashi Nose, Takao Kobayashi

    2014 TENTH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING (IIH-MSP 2014) 578-581 2014

    DOI: 10.1109/IIH-MSP.2014.149  

  109. Analysis of English pronunciation of singing voices sung by Japanese speakers Peer-reviewed

    Kazumichi Yoshida, Takashi Nose, Akinori Ito

    2014 TENTH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING (IIH-MSP 2014) 554-557 2014

    DOI: 10.1109/IIH-MSP.2014.143  

  110. Transform Mapping Using Shared Decision Tree Context Clustering for HMM-Based Cross-Lingual Speech Synthesis Peer-reviewed

    Daiki Nagahama, Takashi Nose, Tomoki Koriyama, Takao Kobayashi

    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4 770-774 2014

    ISSN: 2308-457X

  111. Accent type and phrase boundary estimation using acoustic and language models for automatic prosodic labeling Peer-reviewed

    Tomoki Koriyama, Hiroshi Suzuki, Takashi Nose, Takahiro Shinozaki, Akinori Ito

    Proceedings of 15th Annual Conference of the International Speech Communication Association 2337-2341 2014

  112. Analysis of spectral enhancement using global variance in HMM-based speech synthesis Peer-reviewed

    Takashi Nose, Akinori Ito

    Proceedings of 15th Annual Conference of the International Speech Communication Association 2917-2921 2014

    ISSN: 2308-457X

    eISSN: 1990-9772

  113. Frame-level acoustic modeling based on Gaussian process regression for statistical nonparametric speech synthesis Peer-reviewed

    Tomoki Koriyama, Takashi Nose, Takao Kobayashi

    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 8007-8011 2013/10/18

    DOI: 10.1109/ICASSP.2013.6639224  

    ISSN: 1520-6149

  114. An intuitive style control technique in HMM-based expressive speech synthesis using subjective style intensity and multiple-regression global variance model Peer-reviewed

    Takashi Nose, Takao Kobayashi

    SPEECH COMMUNICATION 55 (2) 347-357 2013/02

    DOI: 10.1016/j.specom.2012.09.003  

    ISSN: 0167-6393

    eISSN: 1872-7182

  115. [Invited Talk] Diversification of Speakers and Styles in Statistical-Model-Based Speech Synthesis Invited

    Takashi Nose

    IEICE Technical Report Vol. 112 (No. 422) 67-72 2013

  116. HMM-BASED EXPRESSIVE SPEECH SYNTHESIS BASED ON PHRASE-LEVEL F0 CONTEXT LABELING Peer-reviewed

    Yu Maeno, Takashi Nose, Takao Kobayashi, Tomoki Koriyama, Yusuke Ijima, Hideharu Nakajima, Hideyuki Mizuno, Osamu Yoshioka

    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) 7859-7863 2013

    DOI: 10.1109/ICASSP.2013.6639194  

    ISSN: 1520-6149

  117. SPEAKER-INDEPENDENT STYLE CONVERSION FOR HMM-BASED EXPRESSIVE SPEECH SYNTHESIS Peer-reviewed

    Hiroki Kanagawa, Takashi Nose, Takao Kobayashi

    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) 7864-7868 2013

    DOI: 10.1109/ICASSP.2013.6639195  

    ISSN: 1520-6149

  118. A style control technique for singing voice synthesis based on multiple-regression HSMM Peer-reviewed

    Takashi Nose, Misa Kanemoto, Tomoki Koriyama, Takao Kobayashi

    Proceedings of 14th Annual Conference of the International Speech Communication Association 378-382 2013

  119. Statistical nonparametric speech synthesis using sparse Gaussian processes Peer-reviewed

    Tomoki Koriyama, Takashi Nose, Takao Kobayashi

    Proceedings of 14th Annual Conference of the International Speech Communication Association 1072-1076 2013

  120. Robust Estimation of Multiple-Regression HMM Parameters for Dimension-Based Expressive Dialogue Speech Synthesis Peer-reviewed

    Tomohiro Nagata, Hiroki Mori, Takashi Nose

    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 1548-1552 2013

    ISSN: 2308-457X

  121. Very low bit-rate F0 coding for phonetic vocoders using MSD-HMM with quantized F0 symbols Peer-reviewed

    Takashi Nose, Takao Kobayashi

    SPEECH COMMUNICATION 54 (3) 384-392 2012/03

    DOI: 10.1016/j.specom.2011.10.002  

    ISSN: 0167-6393

    eISSN: 1872-7182

  122. A tone-modeling technique using a quantized F0 context to improve tone correctness in average-voice-based speech synthesis Peer-reviewed

    Vataya Chunwijitra, Takashi Nose, Takao Kobayashi

    SPEECH COMMUNICATION 54 (2) 245-255 2012/02

    DOI: 10.1016/j.specom.2011.08.006  

    ISSN: 0167-6393

    eISSN: 1872-7182

  123. Context Extension for Generating Diverse Prosody in HMM-Based Dialogue Speech Synthesis Peer-reviewed

    Tomoki Koriyama, Takashi Nose, Takao Kobayashi

    IEICE Transactions on Information and Systems (Japanese Edition) Vol. J95-D (No. 3) 597-607 2012

  124. AN F0 MODELING TECHNIQUE BASED ON PROSODIC EVENTS FOR SPONTANEOUS SPEECH SYNTHESIS Peer-reviewed

    Tomoki Koriyama, Takashi Nose, Takao Kobayashi

    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) 4589-4592 2012

    DOI: 10.1109/ICASSP.2012.6288940  

    ISSN: 1520-6149

  125. Discontinuous Observation HMM for Prosodic-Event-Based F0 Generation Peer-reviewed

    Tomoki Koriyama, Takashi Nose, Takao Kobayashi

    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3 462-465 2012

  126. A speech parameter generation algorithm using local variance for HMM-based speech synthesis Peer-reviewed

    Vataya Chunwijitra, Takashi Nose, Takao Kobayashi

    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3 1150-1153 2012

  127. Speaker-independent HMM-based voice conversion using adaptive quantization of the fundamental frequency Peer-reviewed

    Takashi Nose, Takao Kobayashi

    SPEECH COMMUNICATION 53 (7) 973-985 2011/09

    DOI: 10.1016/j.specom.2011.05.001  

    ISSN: 0167-6393

    eISSN: 1872-7182

  128. TONAL CONTEXT LABELING USING QUANTIZED F-0 SYMBOLS FOR IMPROVING TONE CORRECTNESS IN AVERAGE-VOICE-BASED SPEECH SYNTHESIS Peer-reviewed

    Vataya Chunwijitra, Takashi Nose, Takao Kobayashi

    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING 4708-4711 2011

    DOI: 10.1109/ICASSP.2011.5947406  

    ISSN: 1520-6149

  129. VERY LOW BIT-RATE F0 CODING FOR PHONETIC VOCODER USING MSD-HMM WITH QUANTIZED F0 CONTEXT Peer-reviewed

    Takashi Nose, Takao Kobayashi

    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING 5236-5239 2011

    DOI: 10.1109/ICASSP.2011.5947538  

    ISSN: 1520-6149

  130. A Perceptual Expressivity Modeling Technique for Speech Synthesis Based on Multiple-Regression HSMM Peer-reviewed

    Takashi Nose, Takao Kobayashi

    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 116-119 2011

  131. HMM-Based Emphatic Speech Synthesis Using Unsupervised Context Labeling Peer-reviewed

    Yu Maeno, Takashi Nose, Takao Kobayashi, Yusuke Ijima, Hideharu Nakajima, Hideyuki Mizuno, Osamu Yoshioka

    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 1860-+ 2011

  132. Performance Prediction of Speech Recognition Using Average-Voice-Based Speech Synthesis Peer-reviewed

    Tatsuhiko Saito, Takashi Nose, Takao Kobayashi, Yohei Okato, Akio Horii

    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 1964-+ 2011

  133. On the Use of Extended Context for HMM-based Spontaneous Conversational Speech Synthesis Peer-reviewed

    Tomoki Koriyama, Takashi Nose, Takao Kobayashi

    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 2668-2671 2011

  134. Recent development of HMM-based expressive speech synthesis and its applications Peer-reviewed

    Takashi Nose, Takao Kobayashi

    Proceedings of 2011 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 1-4 2011

  135. HMM-Based Voice Conversion Using Quantized F0 Context Peer-reviewed

    Takashi Nose, Yuhei Ota, Takao Kobayashi

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E93D (9) 2483-2490 2010/09

    DOI: 10.1587/transinf.E93.D.2483  

    ISSN: 0916-8532

  136. A Rapid Model Adaptation Technique for Emotional Speech Recognition with Style Estimation Based on Multiple-Regression HMM Peer-reviewed

    Yusuke Ijima, Takashi Nose, Makoto Tachibana, Takao Kobayashi

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E93D (1) 107-115 2010/01

    DOI: 10.1587/transinf.E93.D.107  

    ISSN: 0916-8532

  137. A Technique for Estimating Intensity of Emotional Expressions and Speaking Styles in Speech Based on Multiple-Regression HSMM Peer-reviewed

    Takashi Nose, Takao Kobayashi

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E93D (1) 116-124 2010/01

    DOI: 10.1587/transinf.E93.D.116  

    ISSN: 0916-8532

  138. HMM-BASED SPEECH SYNTHESIS WITH UNSUPERVISED LABELING OF ACCENTUAL CONTEXT BASED ON F0 QUANTIZATION AND AVERAGE VOICE MODEL Peer-reviewed

    Takashi Nose, Koujirou Ooki, Takao Kobayashi

    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING 4622-4625 2010

    DOI: 10.1109/ICASSP.2010.5495548  

    ISSN: 1520-6149

  139. Learning lexicons from spoken utterances based on statistical model selection Peer-reviewed

    Ryo Taguchi, Naoto Iwahashi, Kotaro Funakoshi, Mikio Nakano, Takashi Nose, Tsuneo Nitta

    Transactions of the Japanese Society for Artificial Intelligence 25 (4) 549-559 2010

    DOI: 10.1527/tjsai.25.549  

    ISSN: 1346-0714

    eISSN: 1346-8030

  140. HMM-based robust voice conversion using adaptive F0 quantization Peer-reviewed

    Takashi Nose, Takao Kobayashi

    Proceedings of 7th ISCA Workshop on Speech Synthesis 80-85 2010

  141. Evaluation of Prosodic Contextual Factors for HMM-based Speech Synthesis Peer-reviewed

    Shuji Yokomizo, Takashi Nose, Takao Kobayashi

    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2 430-433 2010

  142. Conversational Spontaneous Speech Synthesis Using Average Voice Model Peer-reviewed

    Tomoki Koriyama, Takashi Nose, Takao Kobayashi

    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2 853-856 2010

  143. Speaker-independent HMM-based Voice Conversion Using Quantized Fundamental Frequency Peer-reviewed

    Takashi Nose, Takao Kobayashi

    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4 1724-1727 2010

  144. Grounding new words on the physical world in multi-domain human-robot dialogues Peer-reviewed

    Mikio Nakano, Naoto Iwahashi, Takayuki Nagai, Taisuke Sumii, Xiang Zuo, Ryo Taguchi, Takashi Nose, Akira Mizutani, Tomoaki Nakamura, Muhammad Attamimi, Hiromi Narimatsu, Kotaro Funakoshi, Yuji Hasegawa

    AAAI Publications, 2010 AAAI Fall Symposium Series 74-79 2010

  145. Robust Speaker-Adaptive HMM-Based Text-to-Speech Synthesis Peer-reviewed

    Junichi Yamagishi, Takashi Nose, Heiga Zen, Zhen-Hua Ling, Tomoki Toda, Keiichi Tokuda, Simon King, Steve Renals

    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 17 (6) 1208-1230 2009/08

    DOI: 10.1109/TASL.2009.2016394  

    ISSN: 1558-7916

    eISSN: 1558-7924

  146. HMM-Based Style Control for Expressive Speech Synthesis with Arbitrary Speaker's Voice Using Model Adaptation Peer-reviewed

    Takashi Nose, Makoto Tachibana, Takao Kobayashi

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E92D (3) 489-497 2009/03

    DOI: 10.1587/transinf.E92.D.489  

    ISSN: 0916-8532

  147. EMOTIONAL SPEECH RECOGNITION BASED ON STYLE ESTIMATION AND ADAPTATION WITH MULTIPLE-REGRESSION HMM Peer-reviewed

    Yusuke Ijima, Makoto Tachibana, Takashi Nose, Takao Kobayashi

    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS 4157-4160 2009

    DOI: 10.1109/ICASSP.2009.4960544  

    ISSN: 1520-6149

  148. Speaking Style Adaptation for Spontaneous Speech Recognition Using Multiple-Regression HMM Peer-reviewed

    Yusuke Ijima, Takeshi Matsubara, Takashi Nose, Takao Kobayashi

    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5 548-551 2009

  149. HMM-based Speaker Characteristics Emphasis Using Average Voice Model Peer-reviewed

    Takashi Nose, Junichi Adada, Takao Kobayashi

    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5 2599-2602 2009

  150. Learning Lexicons from Spoken Utterances Based on Statistical Model Selection Peer-reviewed

    Ryo Taguchi, Naoto Iwahashi, Takashi Nose, Kotaro Funakoshi, Mikio Nakano

    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5 2687-2690 2009

  151. Recent development of the HMM-based speech synthesis system (HTS) Peer-reviewed

    Heiga Zen, Keiichiro Oura, Takashi Nose, Junichi Yamagishi, Shinji Sako, Tomoki Toda, Takashi Masuko, Alan W. Black, Keiichi Tokuda

    Proceedings of 2009 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 121-130 2009

  152. Performance evaluation of the speaker-independent HMM-based speech synthesis system "HTS-2007" for the Blizzard Challenge 2007 Peer-reviewed

    Junichi Yamagishi, Takashi Nose, Heiga Zen, Tomoki Toda, Keiichi Tokuda

    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12 3957-+ 2008

    DOI: 10.1109/ICASSP.2008.4518520  

    ISSN: 1520-6149

  153. Speaker and style adaptation using average voice model for style control in HMM-based speech synthesis Peer-reviewed

    Makoto Tachibana, Shinsuke Izawa, Takashi Nose, Takao Kobayashi

    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12 4633-4636 2008

    DOI: 10.1109/ICASSP.2008.4518689  

    ISSN: 1520-6149

  154. An On-line Adaptation Technique for Emotional Speech Recognition Using Style Estimation with Multiple-Regression HMM Peer-reviewed

    Yusuke Ijima, Makoto Tachibana, Takashi Nose, Takao Kobayashi

    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5 1297-1300 2008

  155. An Estimation Technique of Style Expressiveness for Emotional Speech Using Model Adaptation Based on Multiple-Regression HSMM Peer-reviewed

    Takashi Nose, Yoichi Kato, Makoto Tachibana, Takao Kobayashi

    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5 2759-2762 2008

  156. A style control technique for HMM-based expressive speech synthesis Peer-reviewed

    Takashi Nose, Junichi Yamagishi, Takashi Masuko, Takao Kobayashi

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E90D (9) 1406-1413 2007/09

    DOI: 10.1093/ietisy/e90-d.9.1406  

    ISSN: 0916-8532

  157. A speaker adaptation technique for MRHSMM-based style control of. synthetic speech Peer-reviewed

    Takashi Nose, Yoichi Kato, Takao Kobayashi

    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3 833-+ 2007

    DOI: 10.1109/ICASSP.2007.367042  

    ISSN: 1520-6149

  158. The HMM-based speech synthesis system version 2.0 Peer-reviewed

    Heiga Zen, Takashi Nose, Junichi Yamagishi, Shinji Sako, Takashi Masuko, Alan W. Black, Keiichi Tokuda

    Proceedings of 6th ISCA Workshop on Speech Synthesis 294-299 2007

  159. Style Estimation of Speech Based on Multiple Regression Hidden Semi-Markov Model Peer-reviewed

    Takashi Nose, Yoichi Kato, Takao Kobayashi

    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4 2900-2903 2007

  160. A Style Control Technique for Speech Synthesis Using Multiple Regression HSMM Peer-reviewed

    Takashi Nose, Junichi Yamagishi, Takao Kobayashi

    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5 1324-1327 2006

  161. A Technique for Controlling Voice Quality of Synthetic Speech Using Multiple Regression HSMM Peer-reviewed

    Makoto Tachibana, Takashi Nose, Junichi Yamagishi, Takao Kobayashi

    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5 2438-2441 2006

Misc. 52

  1. Invited Talk : Synthesis, Recognition and Conversion of Various Speech Using Deep Learning and Their Applications

    117 (160) 3-8 2017/07/27

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

  2. A Study on DNN-Based Speech Synthesis Using Vector Quantization of Spectral Features

    116 (414) 65-70 2017/01/21

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

  3. Poster Presentation : A Study on Singer-Independent Singing Voice Conversion Using Read Speech Based on Neural Network

    116 (414) 17-22 2017/01/21

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

  4. Improvement of Accent Sandhi Rules Based on Accent Dictionary for Japanese Text-to-Speech Systems

    116 (378) 31-36 2016/12/20

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

  5. Poster Presentation : Development of the Julius-compatible interface for the speech recognition engine of Kaldi toolkit

    116 (378) 49-51 2016/12/20

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

  6. Poster Presentation : F0 control by modeling differential features in DNN-based speech synthesis

    116 (378) 37-42 2016/12/20

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

  7. A Study on Colorization in Photo-Realistic Facial Animation Synthesis from Text Based on HMM and DNN with Animation Unit

    116 (220) 67-72 2016/09/15

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

  8. A Study on Colorization in Photo-Realistic Facial Animation Synthesis from Text Based on HMM and DNN with Animation Unit

    40 (31) 67-72 2016/09

    Publisher: The Institute of Image Information and Television Engineers

    ISSN: 1342-6893

  9. Study of Photo-realistic Face Moving Image Generation from the Text Using the Facial Feature

    116 (33) 43-48 2016/05/19

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

  10. A study on quick model training in HMM-based speech synthesis

    115 (253) 27-32 2015/10/15

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

  11. Automatic generation of abbreviated named entities for localized speech recognition

    115 (184) 7-12 2015/08/21

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

  12. Analysis of the Effect of Accent Labeling Criteria on Synthetic Speech in HMM-Based Speech Synthesis

    Ryota Takahashi, Takashi Nose, Akinori Ito

    IPSJ SIG Technical Report (SLP) 2015 (1) 1-6 2015/05/18

    Publisher: Information Processing Society of Japan

    In this paper, we examine the accent labeling criteria that have been left ambiguous in conventional HMM-based speech synthesis and investigate how they affect synthetic speech. Specifically, we examine the representation of accent types and the criteria for accent phrase boundaries. For accent types, noting that the word-final (odaka) accent can be represented either as type 0 or as a type equal to the mora length, we objectively evaluate how the F0 of synthetic speech is affected by each representation, and also verify the effect of two-stage clustering. For accent phrase boundaries, some phrases can be represented either as two accent phrases of type 0 and type 1 or as a single merged accent phrase, and we investigate how this difference affects synthetic speech. In these evaluations, we introduce the error rate of the Japanese accent high/low (H/L) pattern as an objective measure and analyze its effectiveness.
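    The objective measure introduced here compares high/low (H/L) accent patterns mora by mora. A minimal sketch of deriving the expected Tokyo-Japanese H/L pattern from an accent type and counting disagreements (a textbook accent rule with illustrative function names, not the paper's implementation):

      def hl_pattern(n_morae: int, accent_type: int) -> str:
          """Expected H/L pattern of a Tokyo-Japanese accent phrase.

          Type 0 (heiban) is low then high to the end; type k > 0 puts the
          accent nucleus on mora k, after which the pitch falls to low.
          """
          if accent_type == 0:
              return "L" + "H" * (n_morae - 1)
          if accent_type == 1:
              return "H" + "L" * (n_morae - 1)
          return "L" + "H" * (accent_type - 1) + "L" * (n_morae - accent_type)

      def hl_error_rate(predicted: str, reference: str) -> float:
          """Fraction of morae whose H/L label disagrees with the reference."""
          assert len(predicted) == len(reference)
          return sum(p != r for p, r in zip(predicted, reference)) / len(reference)

      # For a 4-mora odaka word, type 4 and type 0 give the same word-internal
      # pattern; the difference only surfaces on a following particle.
      print(hl_pattern(4, 4), hl_pattern(4, 0))  # LHHH LHHH
      print(hl_error_rate("LHHL", "LHHH"))       # 0.25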

  13. Evaluation of a Dialogue System Using Emotional Speech Synthesis in Scenario Dialogues and a Study of Emotion Assignment Methods

    Taketo Kase, Takashi Nose, Yuya Chiba, Akinori Ito

    IPSJ SIG Technical Report (SLP) 2015 (9) 1-7 2015/05/18

    Publisher: Information Processing Society of Japan

    In recent years, demand for non-task-oriented spoken dialogue systems has been growing, and they have been studied from various angles. Most of those studies aim to generate appropriate responses from a linguistic point of view. In human-human conversation, on the other hand, paralinguistic information such as emotional expression and speaking style is used effectively to keep dialogue running smoothly. We therefore focus not on the content of the system's responses but on how they are spoken, and attempt to use emotional speech synthesis in a dialogue system. In this study, we first create multiple scenarios and evaluate by subjective criteria whether the quality of the dialogue system actually improves when appropriate emotions are assigned manually. Next, to automate emotion assignment, we examine two methods: assignment according to the system utterance and assignment that cooperates with the user utterance. The evaluation results show that automatic emotion assignment improves users' subjective evaluation scores of the dialogue, and that emotion assignment cooperating with the user's utterance is more effective.

  14. Analysis of Dialogue Data and Study of Audio-Visual Features for Automatic Estimation of the User's Willingness to Talk

    Yuya Chiba, Takashi Nose, Akinori Ito

    IPSJ SIG Technical Report (SLP) 2015 (10) 1-6 2015/02/20

    Publisher: Information Processing Society of Japan

    For an interactive system to adapt to the user when offering topics or recommending information, it is desirable that the system can acquire information about the user efficiently. In this study, we assume an interview-style spoken dialogue system that actively asks the user questions. In dialogue with such a system, more detailed information may be obtained about topics the user wants to talk about, while little useful information is likely to be obtained about topics the user does not want to talk about, so the system needs to select questions and topics in consideration of the user's willingness to talk. As an initial study toward automatic estimation of this willingness, we analyzed human-human interview dialogues and attempted automatic classification. The analysis suggests that when interviewees are themselves aware of whether their own willingness is high or low, third-party evaluators can judge it with an accuracy of about 70-80%. We also show that automatic classification with accuracy comparable to humans is possible by using the multimodal information mentioned in the evaluators' questionnaires.

  15. Evaluation of Wavelet-Based Feature Extraction and Its Accuracy Improvement Techniques

    Kiyoaki Matsui, Takashi Nose, Akinori Ito

    IPSJ SIG Technical Report (SLP) 2015 (5) 1-6 2015/02/20

    Publisher: Information Processing Society of Japan

    To make speech recognition more widespread, cheaper speech recognition systems are needed. Although there is much prior work on reducing the computational cost of speech recognition, feature extraction has received little attention. We have therefore proposed a new low-cost feature extraction method based on the wavelet transform, together with techniques to improve its accuracy. In this paper, we extract features using two kinds of wavelets, the Haar wavelet and the Daubechies wavelet, and compare their performance with MFCC. With the accuracy improvement techniques, a slight improvement in recognition rate was observed. As with the inter-frame dynamic (delta) features and MFCC, the recognition rate could be further improved by truncating the higher-order DCT outputs. In terms of computation time, using the simplest wavelet achieved more than five times the computation speed of MFCC.
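    As a generic illustration of this kind of wavelet front end (framing, Haar DWT subband log-energies, DCT truncation, and delta features assembled from standard components; the parameter values are assumptions, not the proposed method):

      import numpy as np
      import pywt
      from scipy.fftpack import dct

      def wavelet_features(signal, frame_len=400, hop=160, levels=8, keep=6):
          """Per-frame log energies of Haar DWT subbands, decorrelated by a DCT
          and truncated to the `keep` lowest-order coefficients."""
          feats = []
          for start in range(0, len(signal) - frame_len + 1, hop):
              frame = signal[start:start + frame_len]
              coeffs = pywt.wavedec(frame, "haar", level=levels)  # [cA8, cD8, ..., cD1]
              log_e = np.log(np.array([np.sum(c ** 2) + 1e-10 for c in coeffs]))
              feats.append(dct(log_e, norm="ortho")[:keep])       # drop high-order terms
          return np.array(feats)

      def deltas(feats):
          """First-order dynamic features from a two-frame slope."""
          padded = np.pad(feats, ((1, 1), (0, 0)), mode="edge")
          return (padded[2:] - padded[:-2]) / 2.0

      x = np.random.randn(16000)              # stand-in for 1 s of 16 kHz speech
      f = wavelet_features(x)
      print(np.hstack([f, deltas(f)]).shape)  # static + delta, as in MFCC front ends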

  16. Performance Evaluation of Large-Scale Training Sentence Set Construction Based on Entropy in Statistical Speech Synthesis

    Takashi Nose, Yusuke Arao, Takao Kobayashi, Komei Sugiura, Yoshinori Shiga

    IEICE Technical Report 115 (184(SP2015 50-58)) 2015

    ISSN: 0913-5685

  17. A Study on Changes in Learners' Turn-Taking Latency over Repeated Use of an English Conversation Training System

    Naoto Suzuki, Yutaka Hiroi, Yuma Fujiwara, Yuya Chiba, Takashi Nose, Akinori Ito

    Proceedings of the Meeting of the Acoustical Society of Japan (CD-ROM) 2015 2015

    ISSN: 1880-7658

  18. Verification of the Effectiveness of a Response-Timing Practice Method in an English Conversation Training System

    Naoto Suzuki, Yutaka Hiroi, Yuma Fujiwara, Yuya Chiba, Takashi Nose, Akinori Ito

    IPSJ SIG Technical Report (Web) 2015 (SLP-105) 2015

  19. A Study on Pronunciation Evaluation of English Singing Voices by Japanese Speakers

    Kazumichi Yoshida, Takashi Nose, Akinori Ito

    IPSJ SIG Technical Report (MUS) 2014 (9) 1-6 2014/11/13

    We aim at automatic evaluation of the English pronunciation of songs sung in English by Japanese speakers. In this study, we constructed a database of English lyrics read aloud and sung by Japanese speakers, and conducted subjective evaluations by native English and native Japanese listeners. Comparing the evaluations of read and sung lyrics suggests that pronunciation errors arise more easily in the sustained phrases of singing than in read speech. We further examined an HMM-based method for automatic pronunciation evaluation of English singing, and carried out a simple pronunciation error detection experiment using HMMs trained on read speech of native speakers of Japanese and English. The results indicate that detection accuracy could be improved further by tuning the likelihood-difference threshold used for error detection and by taking the sustained phrases of singing into account.
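    The detection rule described here, comparing model scores for native and Japanese-accented English against a likelihood-difference threshold, can be sketched generically. The GMMs below are stand-ins for the paper's HMMs, and the toy data and threshold are assumptions.

      import numpy as np
      from sklearn.mixture import GaussianMixture

      # Stand-ins for acoustic models trained on native-English and on
      # Japanese-speaker English; a real system would use phone HMMs.
      rng = np.random.default_rng(0)
      gmm_en = GaussianMixture(4, random_state=0).fit(rng.normal(0.0, 1.0, (500, 13)))
      gmm_ja = GaussianMixture(4, random_state=0).fit(rng.normal(0.5, 1.0, (500, 13)))

      def is_mispronounced(segment_feats, threshold=0.0):
          """Flag a segment when the Japanese-accented model out-scores the
          native model by more than `threshold` (mean log-likelihood per frame)."""
          return gmm_ja.score(segment_feats) - gmm_en.score(segment_feats) > threshold

      segment = rng.normal(0.4, 1.0, (80, 13))  # e.g., frames of a sustained phrase
      print(is_mispronounced(segment))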

  20. A Study on Pronunciation Evaluation of English Singing Voices by Japanese Speakers

    Kazumichi Yoshida, Takashi Nose, Akinori Ito

    IPSJ SIG Technical Report (DCC) 2014 (9) 1-6 2014/11/13

    We aim at automatic evaluation of the English pronunciation of songs sung in English by Japanese speakers. In this study, we constructed a database of English lyrics read aloud and sung by Japanese speakers, and conducted subjective evaluations by native English and native Japanese listeners. Comparing the evaluations of read and sung lyrics suggests that pronunciation errors arise more easily in the sustained phrases of singing than in read speech. We further examined an HMM-based method for automatic pronunciation evaluation of English singing, and carried out a simple pronunciation error detection experiment using HMMs trained on read speech of native speakers of Japanese and English. The results indicate that detection accuracy could be improved further by tuning the likelihood-difference threshold used for error detection and by taking the sustained phrases of singing into account.

  21. A Study on Intuitive Control of Emotional Expressions and Speaking Styles Using Facial Features by Kinect

    BI Yu, NOSE Takashi, ITO Akinori

    IEICE technical report. Speech 114 (303) 25-30 2014/11/13

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper proposes a style control technique for synthetic speech based on multiple regression HSMM (MRHSMM) using facial features. In the proposed technique, styles and their intensities are represented by Animation Unit (AU) parameters and are modeled under the assumption that the mean parameters of the acoustic models are given as multiple regressions of the AU parameters. Since correlation among the AU parameters is problematic in the modeling, we conducted orthogonalization and dimensionality reduction in advance. When synthesizing speech, we can generate synthetic speech with an intended style by inputting the corresponding facial expression. In this study, we examine the appropriate number of AU parameters and discuss how the performance differs depending on the user.
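
    The core modeling assumption, mean parameters given as multiple regressions of the AU parameters, can be sketched numerically as follows (a toy illustration with made-up dimensions, not the authors' code): each Gaussian mean is mu(s) = H [1; s], and H is estimated by least squares from style-labeled means.

    ```python
    # Toy sketch of the multiple-regression mean assumption:
    # mu(s) = H [1; s], with s a low-dimensional style vector standing in
    # for orthogonalized AU parameters. Dimensions are illustrative.
    import numpy as np

    rng = np.random.default_rng(0)
    D, K = 4, 2                      # feature dim, style-vector dim

    # Pretend per-style means and their style vectors are available
    # (in practice they come from adapted models and labeled intensities).
    styles = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
    true_H = rng.normal(size=(D, K + 1))
    means = np.stack([true_H @ np.concatenate(([1.0], s)) for s in styles])

    # Least-squares estimate of the regression matrix H
    X = np.hstack([np.ones((len(styles), 1)), styles])   # rows: [1, s]
    H = np.linalg.lstsq(X, means, rcond=None)[0].T        # (D, K+1)

    # Synthesis side: any new style vector yields an interpolated mean
    s_new = np.array([0.5, 0.25])
    print(H @ np.concatenate(([1.0], s_new)))
    ```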

  22. A Study on Hyperparameter Optimization for Speech Synthesis Based on Gaussian Process Regression

    KORIYAMA Tomoki, NOSE Takashi, KOBAYASHI Takao

    IEICE technical report. Speech 113 (404) 19-24 2014/01/23

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    In a statistical parametric speech synthesis framework based on Gaussian process regression, it is important to use an appropriate kernel function. However, the parameters of the kernel function, which are hyperparameters of the Gaussian process, were not optimized in our previous work. In this study, we examine a hyperparameter optimization algorithm based on an empirical Bayes approach. We show that the proposed method can enhance the predictive likelihood and improve the naturalness of synthetic speech through objective and subjective evaluation results.
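
    The empirical Bayes idea here is to pick the kernel hyperparameters that maximize the log marginal likelihood of the training data. The toy sketch below shows the same criterion on one-dimensional data with scikit-learn (the kernel choice and data are stand-ins; the actual work predicts acoustic features from linguistic inputs).

    ```python
    # Toy GPR hyperparameter optimization by maximizing the log marginal
    # likelihood (empirical Bayes); kernel and data are illustrative.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(50, 1))
    y = np.sin(X).ravel() + 0.1 * rng.normal(size=50)

    kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
    gpr = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=5)
    gpr.fit(X, y)   # fit() optimizes the kernel hyperparameters internally

    print("optimized kernel:", gpr.kernel_)
    print("log marginal likelihood:", gpr.log_marginal_likelihood_value_)
    ```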

  23. A Study on the Effect of CG Characters in an English Conversation Learning System and Additional Expressions for Controlling Learners' Utterance Timing

    鈴木直人, 廣井富, 藤原祐磨, 千葉祐弥, 能勢隆, 伊藤彰則

    Proceedings of the Acoustical Society of Japan Meeting (CD-ROM) 2014 2014

    ISSN: 1880-7658

  24. Controlling Turn-Taking Latency through Time Pressure during English Conversation Practice with an AR Character

    鈴木直人, 廣井富, 藤原祐磨, 黒田尚孝, 戸塚典子, 千葉祐弥, 能勢隆, 伊藤彰則

    Proceedings of the Acoustical Society of Japan Meeting (CD-ROM) 2014 2014

    ISSN: 1880-7658

  25. Automatic Estimation of Accent Phrase Boundaries Using Language and Acoustic Models

    SUZUKI Hiroshi, KORIYAMA Tomoki, NOSE Takashi, SHINOZAKI Takahiro, KOBAYASHI Takao

    IEICE technical report. Speech 113 (366) 97-102 2013/12/19

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper proposes a technique for automatically estimating accent phrase boundaries for text-to-speech synthesis systems. To construct speech synthesis systems, we need to prepare a database annotated with prosodic information, including accents. However, manual annotation for this purpose is generally costly. The proposed method instead uses conditional random fields (CRF) as the language models of accent phrase boundaries and accent types, and a hidden Markov model (HMM) as the acoustic feature model. We confirmed that the proposed method improved the estimation accuracy for reading-style speech data compared with the conventional method.

  26. Automatic Estimation of Accent Phrase Boundaries Using Language and Acoustic Models

    Hiroshi Suzuki, Tomoki Koriyama, Takashi Nose, Takahiro Shinozaki, Takao Kobayashi

    IPSJ SIG Notes 2013 (16) 1-6 2013/12/12

    Publisher: Information Processing Society of Japan (IPSJ)

    This paper proposes a technique for automatically estimating accent phrase boundaries for text-to-speech synthesis systems. To construct speech synthesis systems, we need to prepare a database annotated with prosodic information, including accents. However, manual annotation for this purpose is generally costly. The proposed method instead uses conditional random fields (CRF) as the language models of accent phrase boundaries and accent types, and a hidden Markov model (HMM) as the acoustic feature model. We confirmed that the proposed method improved the estimation accuracy for reading-style speech data compared with the conventional method.
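
    As a rough sketch of the language-model side only (the HMM acoustic model is omitted), accent phrase boundary estimation can be cast as CRF sequence labeling over words; the sklearn-crfsuite library and the feature set below are assumptions for illustration, not the authors' toolchain.

    ```python
    # Sketch: accent phrase boundary estimation as CRF sequence labeling.
    # "B" marks a word starting a new accent phrase, "I" a phrase-internal word.
    import sklearn_crfsuite

    def token_features(sent, i):
        feats = {"word": sent[i]["surface"], "pos": sent[i]["pos"]}
        if i > 0:
            feats["prev_pos"] = sent[i - 1]["pos"]
        if i < len(sent) - 1:
            feats["next_pos"] = sent[i + 1]["pos"]
        return feats

    sent = [{"surface": "今日", "pos": "noun"},
            {"surface": "は", "pos": "particle"},
            {"surface": "良い", "pos": "adjective"},
            {"surface": "天気", "pos": "noun"},
            {"surface": "です", "pos": "auxiliary"}]
    labels = ["B", "I", "B", "I", "I"]

    X = [[token_features(sent, i) for i in range(len(sent))]]
    y = [labels]

    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
    crf.fit(X, y)
    print(crf.predict(X)[0])    # predicted boundary labels
    ```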

  27. A Study on a Style Control Based on Multiple-Regression HSMM for Synthesizing Singing Voices with Various Expressivity

    NOSE Takashi, KANEMOTO Misa, KORIYAMA Tomoki, KOBAYASHI Takao

    IEICE technical report. Speech 112 (422) 79-84 2013/01/30

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper proposes a style control technique based on multiple regression HSMM (MRHSMM) for changing the styles, and their intensities, appearing in synthetic singing voices. In the proposed technique, styles and their intensities are represented by low-dimensional vectors called style vectors and are modeled under the assumption that the mean parameters of the acoustic models are given as multiple regressions of the style vectors. When synthesizing speech, we can weaken or emphasize the intensity of each style by setting a desired style vector. In addition, the idea of pitch adaptive training is introduced into the MRHSMM to improve the modeling accuracy of F0 associated with musical notes. A novel vibrato modeling technique is also presented to extract vibrato parameters from singing voices that sometimes have unclear vibrato expressions. Subjective evaluations show that we can intuitively control styles and their intensities while keeping the naturalness of synthetic speech.

  28. A study on Speaker-Normalized Style Conversion for Arbitrary Speaker's Expressive Speech Synthesis

    KANAGAWA Hiroki, NOSE Takashi, KOBAYASHI Takao

    IEICE technical report. Speech 112 (422) 73-78 2013/01/30

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper proposes a technique for improving the naturalness of synthetic speech using a framework of speaker adaptive training in HMM-based style conversion. In the style conversion, speaker-independent linear transforms are estimated using neutral- and target-style speech data of multiple speakers, and the estimated transforms are applied to a new speaker's neutral-style model. As a result, we can convert the style expressivity of the acoustic model to the target style without preparing any target-style speech of the speaker. When the spectral and prosodic features of the training speakers differ significantly from each other, the naturalness of synthetic speech from the converted model decreases. The proposed technique attempts to alleviate this problem by normalizing speaker characteristics using an approach similar to speaker adaptive training. From the objective and subjective evaluation results, we show that the speaker normalization technique provides more natural-sounding speech.

  29. A Study on Multi-Class Local Prosodic Context for Expressive Prosody Generation

    MAENO Yu, NOSE Takashi, KOBAYASHI Takao, KORIYAMA Tomoki, IJIMA Yusuke, NAKAJIMA Hideharu, MIZUNO Hideyuki, YOSHIOKA Osamu

    IEICE technical report. Speech 112 (422) 85-90 2013/01/23

    Publisher: The Institute of Electronics, Information and Communication Engineers

    This paper describes a technique for reproducing the local prosodic variability that appears in expressive speech with various speaking styles. Synthetic speech generated using only linguistic contexts in HMM-based speech synthesis tends to have less prosodic variation than the original speech. To add more variation to synthetic speech, we define novel phrase-level prosodic contexts from the residual of the prosodic features between the original and synthetic speech for the training data. Specifically, we create prosodic contexts for F0, duration, and power using the average difference between original and synthetic speech in each phrase. We evaluate the potential of the proposed technique under a condition where the appropriate prosodic contexts of the test sentences are known in the synthesis phase. We also examine whether users can intuitively modify the pitch by adjusting the proposed prosodic contexts.
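
    A minimal sketch of how such a phrase-level context can be derived (bin edges and class count are illustrative assumptions): average the per-frame log-F0 residual between original and generated speech over each phrase and quantize it into a small set of context classes.

    ```python
    # Sketch: phrase-level prosodic context from the original-vs-generated
    # residual; bin edges and the class count are assumptions.
    import numpy as np

    def residual_context(logf0_orig, logf0_gen, phrases,
                         edges=(-0.2, -0.05, 0.05, 0.2)):
        """phrases: (start, end) frame ranges; arrays are aligned log-F0."""
        labels = []
        for s, e in phrases:
            diff = np.mean(logf0_orig[s:e] - logf0_gen[s:e])
            labels.append(int(np.digitize(diff, edges)))  # class 0..4
        return labels

    gen = np.full(60, np.log(160.0))
    orig = gen + np.concatenate([np.full(30, 0.15), np.full(30, -0.10)])
    print(residual_context(orig, gen, [(0, 30), (30, 60)]))  # e.g. [3, 1]
    ```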

  30. Modeling of Local Variance of Spectral Features and Its Application to Parameter Generation in HMM-based Speech Synthesis

    NOSE Takashi, CHUNWIJITRA Vataya, KOBAYASHI Takao

    IEICE technical report. Speech 112 (281) 43-48 2012/11/01

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    In this paper, we describe a technique for modeling the local variance (LV) of speech features and propose a novel parameter generation algorithm using the LV model for HMM-based speech synthesis. In the proposed technique, we define the LV as a feature that represents the local variation around each frame of the spectral features and model it using context-dependent phone HMMs. To appropriately model the dynamic characteristics of LVs, we take into account the dynamic features of LVs as well as the static ones. In the parameter generation process, a spectral parameter sequence is estimated so as to maximize a target function in which the conventional HMMs and the LV models are combined. By using the LV models, the proposed technique can impose a more precise variance restriction in parameter generation than the conventional technique using a global variance (GV) model. Through objective and subjective evaluations, we examine the effectiveness of the proposed technique.
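
    The report's generation criterion is not reproduced here, but by analogy with the global-variance criterion, a parameter generation objective combining an HMM term with a local-variance term can be written schematically as follows (the weight and notation are assumptions):

    ```latex
    % Schematic objective: C is the static spectral trajectory, W appends
    % dynamic features, v_t(C) is the local variance around frame t, and
    % \omega balances the HMM and LV model terms.
    \mathcal{L}(C) = \log P\bigl(WC \mid \lambda, q\bigr)
                   + \omega \sum_{t} \log P\bigl(v_t(C) \mid \lambda_{\mathrm{LV}}, q\bigr)
    ```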

  31. A Study on Automatic Prosodic Context Labeling for Emphatic Speech Synthesis

    MAENO Yu, NOSE Takashi, KOBAYASHI Takao, IJIMA Yusuke, NAKAJIMA Hideharu, MIZUNO Hideyuki, YOSHIOKA Osamu

    IEICE technical report. Speech 112 (81) 1-6 2012/06/07

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper describes automatic prosodic context labeling of training data for synthesizing expressive speech in the HMM-based speech synthesis framework, focusing on emphasis expressions. We have proposed an unsupervised labeling technique for emphasis contexts that uses the difference between original and generated F0 patterns. A problem with this approach is that the threshold used to judge whether a phrase is emphasized has to be pre-determined. To overcome this problem, we propose a technique for automatically determining an optimal threshold based on the behavior of F0 patterns in emphatic speech. Experimental results show that the proposed technique gives results similar to subjective labeling and that the emphasis expression is well reproduced in synthetic speech.

  32. A Study on Phone Duration Modeling Using Dynamic Features for HMM-Based Speech Synthesis

    2011 (33) 1-6 2011/12/12

  33. On the use of prosodic-event-based HMM in F0 generation of conversational speech

    KORIYAMA Tomoki, NOSE Takashi, KOBAYASHI Takao

    IEICE technical report. Natural language understanding and models of communication 111 (364) 185-190 2011/12/12

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    In this paper, we propose a prosodic-event-based HMM for effectively modeling the F0 patterns of spontaneous conversational speech in HMM-based speech synthesis. The prosodic-event-based HMM uses segments such as the pitch fall of an accent or the pitch rise of a boundary pitch movement (BPM) as the modeling unit. The proposed HMM is expected to reduce the number of F0 model parameters because there are fewer prosodic events, which derive from F0 features, than phones, which depend strongly on spectral features. We performed objective and subjective experiments using spontaneous conversational speech data, and the results show that the prosodic-event-based HMM can significantly reduce the number of model parameters while keeping the quality of the synthetic speech.

  34. An MRHSMM-based conversational speech synthesis with controllability of paralinguistic information

    NAGATA Tomohiro, MORI Hiroki, NOSE Takashi

    IEICE technical report. Natural language understanding and models of communication 111 (364) 179-184 2011/12/12

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    In this paper, we aim to realize speech synthesis that can control paralinguistic information using a multiple regression HSMM (MRHSMM), which incorporates a multiple regression model into hidden semi-Markov model (HSMM) based speech synthesis. In this study, paralinguistic information is expressed as a coordinate in a low-dimensional space whose dimensions serve as the explanatory variables of the multiple regression model. We use two dimensions, "pleasantness" and "arousal", which are considered general indices for expressing emotional states. In model training, subjectively evaluated values are used for each dimension; in synthesis, speech reflecting an intended emotion is generated by giving arbitrary values. We examine how the two dimensions influence the acoustic features of the synthesized speech. In addition, we conducted three subjective experiments: a naturalness test showing that the synthesized speech is natural, a reproducibility test showing that the given emotion is reproduced, and an emotion expression test showing that the synthesized speech conveys the intended emotion.

  35. A Study on Speaker-Independent Style Conversion in HMM Speech Synthesis

    KANAGAWA Hiroki, NOSE Takashi, KOBAYASHI Takao

    IEICE technical report. Natural language understanding and models of communication 111 (364) 191-196 2011/12/12

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper proposes a technique for synthesizing speech of a desired style using speaker-independent style conversion in HMM-based speech synthesis. An HMM-based style adaptation technique has been proposed that can synthesize speech of arbitrary sentences with a target style. However, this technique cannot be used when speech data of the target style is not available. To overcome this problem, we extend the speaker-dependent style conversion used in style adaptation to a speaker-independent one. Specifically, we first prepare neutral- and target-style speech data of multiple speakers and train a neutral-style average voice model. The style conversion from the average voice model to the target-style one is trained using linear transformation. We then apply the transformation matrices to the neutral-style model of the target speaker. Finally, we obtain the target-style model of the target speaker and synthesize the style-converted speech. We evaluate the proposed technique in terms of speaker and style characteristics and naturalness.

  36. A Study on Phone Duration Modeling Using Dynamic Features for HMM-Based Speech Synthesis

    NOSE Takashi, KOBAYASHI Takao

    IEICE technical report. Natural language understanding and models of communication 111 (364) 197-202 2011/12/12

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper proposes a technique for modeling and generating phone durations using their dynamic features to improve the prediction accuracy of phone durations in HMM-based speech synthesis. For duration modeling, a technique with explicit state-duration modeling based on the hidden semi-Markov model (HSMM) has been proposed. However, the HSMM cannot directly model phone durations, and the relation of phone durations among adjacent phonemes is represented only by context labels. In the proposed technique, phone durations are regarded as observable data obtained by manual labeling or forced alignment and are directly modeled using single Gaussian distributions. To explicitly take into account the correlation of phone durations in model training and speech synthesis, we use not only static phone durations but also dynamic ones. When synthesizing speech, we generate a phone-duration sequence from the trained duration models using a parameter generation algorithm with static and dynamic features. We evaluate the performance of our duration modeling technique by comparing it with other techniques using static or static log-duration features.
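
    The generation step described above is the standard ML parameter generation with static and dynamic features, solving c = (W'PW)^(-1) W'P mu for the static sequence. A compact numpy sketch with first-order deltas and toy per-phone duration statistics (all values made up):

    ```python
    # ML parameter generation with static + delta features over a toy
    # duration sequence: c = (W'PW)^{-1} W'P mu, P = diag(1/var).
    import numpy as np

    mu_s  = np.array([5.0, 8.0, 6.0, 7.0])    # static duration means
    var_s = np.array([1.0, 2.0, 1.0, 1.5])
    mu_d  = np.array([0.0, 1.0, -1.0, 0.5])   # delta (dynamic) means
    var_d = np.array([0.5, 0.5, 0.5, 0.5])

    T = len(mu_s)
    D = np.zeros((T, T))                       # delta: 0.5*(c[t+1]-c[t-1])
    for t in range(T):
        D[t, max(t - 1, 0)] -= 0.5
        D[t, min(t + 1, T - 1)] += 0.5

    W = np.vstack([np.eye(T), D])              # maps c -> [c; delta(c)]
    mu = np.concatenate([mu_s, mu_d])
    P = np.diag(1.0 / np.concatenate([var_s, var_d]))

    c = np.linalg.solve(W.T @ P @ W, W.T @ P @ mu)
    print(np.round(c, 2))                      # smoothed duration sequence
    ```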

  37. Performance Evaluation of Contexts for Conversational Speech Synthesis Using Corpus of Spontaneous Japanese

    KORIYAMA Tomoki, NOSE Takashi, KOBAYASHI Takao

    IEICE technical report 111 (28) 155-160 2011/05/05

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper proposes an extended context set for generating the prosodic variability of spontaneous speech in HMM-based conversational speech synthesis. Since the conventional context set used for HMM-based reading-style speech synthesis is insufficient for conversational speech synthesis, we introduce new contexts derived from the Corpus of Spontaneous Japanese. We compare the context sets with and without newly introduced contexts, and the experimental results show that the contexts about phone prolongation and X-JToBI tone tier label are effective. Furthermore, we examine the stopping criteria for decision-tree clustering and the automatic estimation of a part of contexts for practical applications.

  38. Study on HMM-based F0 Coding for Very Low Bit-Rate Vocoder

    2010 (5) 1-6 2011/02

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 1884-0930

  39. Study on HMM-based F0 Coding for Very Low Bit-Rate Vocoder

    NOSE Takashi, KUMAMOTO Masashi, KOBAYASHI Takao

    IEICE technical report 110 (356) 189-194 2010/12/13

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper presents a novel F0 coding technique for very low bit-rate HMM-based phonetic vocoder. Our technique is based on the multi-space distribution HMM (MSD-HMM) with quantized F0 symbols used as a prosodic context. By introducing the F0 symbol, we can model F0 values without using manually labeled speech data including accent information. In the encoding process, the F0 sequence extracted from an input utterance is converted into the quantized F0 symbol sequence, and these symbols are transmitted with the phonemes and state durations obtained by a phoneme recognizer. In the decoding process, context-dependent labels are created from the phonemes and F0 symbols, and the spectral and F0 sequences are generated using the pre-trained MSD-HMM on the basis of a maximum likelihood criterion. The experimental results show that the degradation of F0 quality through the coding process is not annoying even if the bit-rate for F0 is less than 50 bit/s.
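
    A minimal sketch of the encoder-side F0 quantization (symbol count, F0 range, and per-phone averaging are assumptions, not the paper's exact configuration), with a rough bit-rate check:

    ```python
    # Sketch: reduce a log-F0 contour to one coarse symbol per phone segment.
    import numpy as np

    def quantize_f0(f0, segments, n_levels=8,
                    lo=np.log(70.0), hi=np.log(400.0)):
        """f0: per-frame F0 in Hz (0 = unvoiced); segments: (start, end)."""
        symbols = []
        for s, e in segments:
            voiced = f0[s:e][f0[s:e] > 0]
            if len(voiced) == 0:
                symbols.append(None)           # unvoiced phone
                continue
            m = np.clip(np.mean(np.log(voiced)), lo, hi)
            symbols.append(int((m - lo) / (hi - lo) * (n_levels - 1) + 0.5))
        return symbols

    f0 = np.concatenate([np.zeros(10), np.linspace(120, 200, 30), np.zeros(10)])
    print(quantize_f0(f0, [(0, 10), (10, 25), (25, 40), (40, 50)]))
    # 3 bits/symbol at roughly 10 phones/s gives ~30 bit/s for F0,
    # consistent with the reported sub-50 bit/s regime.
    ```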

  40. A Study on Conversational Speech Synthesis Based on Average Voice Model

    KORIYAMA Tomoki, NOSE Takashi, KOBAYASHI Takao

    IEICE technical report 109 (375) 33-38 2010/01/14

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper describes a conversational speech synthesis technique using average voice model and model adaptation based on hidden semi-Markov model (HSMM). In conversational speech, the acoustic features are affected by various factors such as speaker individuality, speaking style, and speaker's intention, and it is not easy to generate natural sounding speech using a small amount of speech data of a target speaker. To overcome this problem, the proposed technique utilizes an average voice model trained in advance using multiple speakers' speech data and adapts the model to the target speaker's one using a speaker adaptation technique. We can generate synthetic speech even if the available speech data of the target speaker is very limited. In this study, we evaluate the performance of the proposed technique by objective measures. We use two types of average voice models, one is trained with read speech, and the other with conversational speech. The experimental results show that the distortion of spectral and pitch features between synthetic and original speech samples decreases when using the proposed technique.

  41. Performance evaluation of Voice Conversion Based on F0 Quantization and Non-parallel Training

    OTA Yuhei, NOSE Takashi, KOBAYASHI Takao

    IEICE technical report 109 (375) 27-32 2010/01/14

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper describes performance evaluation results of a context-dependent HMM-based voice conversion technique, showing its effectiveness by comparison with a GMM-based one. In the HMM-based conversion, we first extract phonetic and prosodic information from the input speech of a source speaker. Then, converted synthetic speech is generated from the pre-trained acoustic model of a target speaker. To appropriately model the pitch information, we use a coarsely quantized F0 symbol sequence as the prosodic context instead of accent information obtained by manually labeling the training data. By using phonetically and prosodically context-dependent HMMs, the speaker characteristics appearing in segmental and supra-segmental features can also be converted, which is difficult in conventional GMM-based techniques. Objective and subjective experimental results show that the naturalness and speaker individuality of the converted speech are significantly improved by HMM-based voice conversion.

  42. A Study on Voice Conversion Based on F0 Quantization and Non-parallel Training

    OTA Yuhei, NOSE Takashi, KOBAYASHI Takao

    IEICE technical report 109 (356) 171-176 2009/12/14

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper presents a novel voice conversion technique using HMM-based phoneme recognition and speech synthesis with non-parallel training data. In the proposed technique, a phoneme sequence with durations and a rough F0 contour are extracted from the input speech of a source speaker using phoneme recognition and F0 quantization, and are transmitted to the synthesis part. In the synthesis part, a context-dependent label sequence is generated from the transmitted phonemes, durations, and quantized F0 symbols. Then, converted speech is generated from the label sequence using the target speaker's pre-trained MSD-HMM. In model training, the models of the source and target speakers can be trained separately with non-parallel data. For duration modification, a linear transformation is applied to each phone duration of the input speech. The objective and subjective experimental results show that the proposed technique works well even when parallel speech data is not available.

  43. HMM-based Speech Synthesis Using Quantized-F0-based Prosodic Context

    OOKI Koujirou, NOSE Takashi, KOBAYASHI Takao

    IEICE technical report 109 (356) 141-146 2009/12/14

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper describes a technique for HMM-based speech synthesis that requires no manual labeling of accent information for the target speaker's training data. The proposed technique uses coarsely quantized F0 symbols instead of hand-labeled accent information in the context-dependent labels for HMM-based F0 modeling. F0 quantization enables accurate automatic labeling of the F0 contexts for the training data. When synthesizing speech, an F0 contour is first generated using a pre-trained average voice model with a conventional context-dependent label sequence converted from the input text; a label sequence for synthesis is then created by quantizing the generated F0 contour. Synthetic speech is generated from the target speaker's model with the obtained labels. Objective and subjective evaluation results demonstrate the effectiveness of the proposed method.

  44. Speaking Style Classification of Spontaneous Speech Using Multiple-Regression HMM

    NOSE Takashi, MATSUBARA Takeshi, IJIMA Yusuke, KOBAYASHI Takao

    IEICE technical report 109 (139) 31-36 2009/07/10

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper describes speaking style classification and speech recognition for spontaneous speech based on multiple-regression HMM (MRHMM). In MRHMM, the mean vector of each probability density function is given by multiple regression of a low-dimensional vector, called style vector. Each component of the style vector corresponds to the intensity of expressivity of speaking style variation, and the type of speaking style can be classified by estimating the style vector for input speech based on an ML criterion. Moreover, in spontaneous speech recognition, acoustic models are adapted on-line by updating model parameters using the estimated style vector for each input utterance. The performance evaluation using the Corpus of Spontaneous Japanese (CSJ) shows that a high classification rate is obtained even when the amount of available training data is very limited. The effectiveness of the proposed technique is also shown by a phoneme recognition experiment.
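
    Because each MRHMM mean is linear in the style vector, mu_i(s) = b_i + A_i s, the ML style-vector estimate has a weighted least-squares closed form; the toy sketch below (made-up dimensions, known frame-state alignment, identity precisions) illustrates the estimation used for classification.

    ```python
    # Toy ML style-vector estimation in an MRHMM:
    # solve (sum A'PA) s = sum A'P(o - b) over aligned frames.
    import numpy as np

    rng = np.random.default_rng(1)
    D, K, T = 3, 2, 40               # feature dim, style dim, frames

    A = rng.normal(size=(T, D, K))   # per-frame regression matrices
    b = rng.normal(size=(T, D))      # per-frame bias means
    P = np.stack([np.eye(D)] * T)    # per-frame precisions (identity here)

    s_true = np.array([0.8, -0.3])
    obs = np.einsum("tdk,k->td", A, s_true) + b \
          + 0.05 * rng.normal(size=(T, D))

    lhs = sum(A[t].T @ P[t] @ A[t] for t in range(T))
    rhs = sum(A[t].T @ P[t] @ (obs[t] - b[t]) for t in range(T))
    print(np.linalg.solve(lhs, rhs))  # close to s_true
    ```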

  45. A Robot That Learns Object Names in Natural Dialogue

    中野 幹生, 長井 隆行, 能勢 隆, 田口 亮, 水谷 了, 中村 友昭, 船越 孝太郎, 長谷川 雄二, 鳥井 豊隆, 岩橋 直人

    Proceedings of the Annual Conference of JSAI 2009 (0) 1F2OS73-1F2OS73 2009

    Publisher: The Japanese Society for Artificial Intelligence

    Robots that learn the names of objects from speech and visual input have been studied, but they have typically required a pre-configured name-teaching mode or fixed utterance patterns for teaching names. This paper describes the architecture and implementation of a robot that can engage in dialogue across various domains and that learns an object's name when it hears a teaching utterance in the course of a dialogue.

  46. A Language Acquisition Method Based on Model Selection and Its Evaluation

    田口 亮, 岩橋 直人, 能勢 隆, 船越 孝太郎, 中野 幹生

    Proceedings of the Annual Conference of JSAI 2009 (0) 1F2OS72-1F2OS72 2009

    Publisher: The Japanese Society for Artificial Intelligence

    This paper proposes a method by which a robot with no prior word knowledge learns the names of objects and places from unconstrained human utterances. Initial word candidates are generated from phoneme recognition results on the training data. Using these candidates, the robot performs word recognition and learns meaning and grammar, and then, based on a statistical model selection criterion, deletes or concatenates words that are acoustically, grammatically, or semantically unnecessary. Word recognition is then performed again. By repeating this process, the correct phoneme sequences and meanings of the words are acquired.

  47. Acoustic Model Training Technique for Speech Recognition Using Style Estimation with Multiple-Regression HMM

    IJIMA Yusuke, TACHIBANA Makoto, NOSE Takashi, KOBAYASHI Takao

    IPSJ SIG Notes 2008 (123) 37-42 2008/12/02

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    We propose a technique for emotional speech recognition based on a multiple-regression HMM (MRHMM). To achieve emotional speech recognition for an arbitrary speaker with a small amount of training data, we incorporate a speaker and style adaptation technique into speaker-dependent MRHMM-based emotional speech recognition. In the proposed technique, we first adapt the speaker-independent model to the target speaker's respective styles with a small amount of speech data. Then, using the obtained speaker- and style-adapted HMMs and a low-dimensional style control vector for each training style, the regression matrices of the MRHMM are estimated based on the least squares method and maximum likelihood estimation. We assess the performance of the proposed technique on the recognition of acted emotional speech uttered by both professional narrators and non-professional speakers and show the effectiveness of the technique.

  48. Acoustic Model Training Technique for Speech Recognition Using Style Estimation with Multiple-Regression HMM

    IJIMA Yusuke, TACHIBANA Makoto, NOSE Takashi, KOBAYASHI Takao

    IEICE technical report 108 (337) 37-42 2008/12/02

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    We propose a technique for emotional speech recognition based on a multiple-regression HMM (MRHMM). To achieve emotional speech recognition for an arbitrary speaker with a small amount of training data, we incorporate a speaker and style adaptation technique into speaker-dependent MRHMM-based emotional speech recognition. In the proposed technique, we first adapt the speaker-independent model to the target speaker's respective styles with a small amount of speech data. Then, using the obtained speaker- and style-adapted HMMs and a low-dimensional style control vector for each training style, the regression matrices of the MRHMM are estimated based on the least squares method and maximum likelihood estimation. We assess the performance of the proposed technique on the recognition of acted emotional speech uttered by both professional narrators and non-professional speakers and show the effectiveness of the technique.

  49. An MRHSMM-based voice quality control technique for synthetic speech using speaker adaptation from average voice model

    TACHIBANA Makoto, KOUNO Akifumi, NOSE Takashi, KOBAYASHI Takao

    IEICE technical report 108 (265) 41-46 2008/10/16

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper describes a technique for controlling the voice quality of synthetic speech using a multiple-regression hidden semi-Markov model (MRHSMM). To achieve voice quality control with a small amount of training data, we incorporate speaker adaptation from an average voice model into MRHSMM-based voice quality control. In the proposed technique, we first adapt the average voice model to the respective training speakers using a small amount of adaptation data. Then, using the obtained speaker-adapted HSMMs and a low-dimensional voice quality control vector for each training speaker, the regression matrices of the MRHSMM are estimated based on the least squares method and maximum likelihood estimation. We attempt to control the voice quality of synthetic speech using data from 20 speakers, with 50 sentences per speaker. Subjective evaluation results show that the proposed technique can control several voice qualities of synthetic speech. Furthermore, we propose a model interpolation technique for MRHSMMs and show its evaluation results.

  50. Recent developments of the HMM-based speech synthesis system (HTS)

    ZEN Heiga, OURA Keiichiro, NOSE Takashi, YAMAGISHI Junichi, SAKO Shinji, TODA Tomoki, MASUKO Takashi, BLACK Alan W., TOKUDA Keiichi

    IPSJ SIG Notes 2007 (129) 301-306 2007/12/21

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    A statistical parametric speech synthesis approach based on hidden Markov models (HMMs) has grown in popularity over the last few years. In this approach, spectrum, excitation, and duration of speech are simultaneously modeled by context-dependent HMMs, and speech waveforms are generated from the HMMs themselves. Since December 2002, we have publicly released an open-source software toolkit named "HMM-based speech synthesis system (HTS)" to provide a research and development toolkit of statistical parametric speech synthesis. This paper describes recent developments of HTS in detail, as well as future release plans.

  51. Recent developments of the HMM-based speech synthesis system (HTS)

    ZEN Heiga, OURA Keiichiro, NOSE Takashi, YAMAGISHI Junichi, SAKO Shinji, TODA Tomoki, MASUKO Takashi, BLACK Alan W., TOKUDA Keiichi

    IEICE technical report 107 (406) 301-306 2007/12/13

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    A statistical parametric speech synthesis approach based on hidden Markov models (HMMs) has grown in popularity over the last few years. In this approach, spectrum, excitation, and duration of speech are simultaneously modeled by context-dependent HMMs, and speech waveforms are generated from the HMMs themselves. Since December 2002, we have publicly released an open-source software toolkit named "HMM-based speech synthesis system (HTS)" to provide a research and development toolkit of statistical parametric speech synthesis. This paper describes recent developments of HTS in detail, as well as future release plans.

  52. A Speaker Adaptation Technique Using Average Voice Model for MRHSMM-based Style Control of Synthetic Speech

    IZAWA Shinsuke, TACHIBANA Makoto, NOSE Takashi, KOBAYASHI Takao

    IEICE technical report 107 (282) 81-86 2007/10/18

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper describes a technique for synthesizing speech with an arbitrary target speaker's voice as well as the desired style expressivity. In the conventional MLLR-based speaker adaptation technique for the multiple regression hidden semi-Markov model (MRHSMM), the quality of the synthesized speech depends crucially on the initial MRHSMM trained from a certain source speaker's data, and it is not always possible to synthesize natural-sounding speech with any target speaker's voice. To overcome this problem, we propose a technique for simultaneous adaptation of speaker and style from an average voice model. Experimental results show that the proposed technique provides more natural speech than the conventional one with speaker adaptation only.

Books and Other Publications 3

  1. 音響キーワードブック (Acoustic Keyword Book)

    能勢隆

    2016/03/22

  2. 進化するヒトと機械の音声コミュニケーション (Evolving Speech Communication between Humans and Machines)

    能勢隆

    NTS Co., Ltd. 2015/09

  3. Human Machine Interaction - Getting Closer

    Ryo Taguchi, Naoto Iwahashi, Kotaro Funakoshi, Mikio Nakano, Takashi Nose, Tsuneo Nitta

    2012/01

Research Projects 16

  1. Development of a virtual classmate for assistance of online course

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Tohoku University

    2021/04/01 - 2026/03/31

  2. Real-Environment Listening Learning Support Based on Speaker, Region, and Style Morphing Speech Synthesis

    能勢 隆, 伊藤 彰則

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Tohoku University

    2022/04/01 - 2025/03/31

    To answer the academic question "From the viewpoints of acoustic engineering and speech perception, what is a methodology for efficiently improving listening ability?", this project integrates and extends the component technologies we have developed through research on statistical speech synthesis, machine learning, and interactive English conversation learning systems, aiming at an entirely new form of real-environment listening learning support in which characteristics of English speech such as speaker, region, style, and accent can be simulated in a stepwise manner by deep-learning-based morphing. We address four concrete items: (a) design and construction of a speech corpus covering diverse speakers, regions, and styles; (b) establishment of deep-learning-based morphing speech synthesis; (c) development of a listening learning support system using morphing speech synthesis; and (d) demonstration experiments on improving listening ability in real environments with the proposed system. In FY2023 we examined (b) and (c) from the viewpoint of speaking-rate style. For (b), we showed that embedding speaking-rate information into a Glow-TTS-based model enables control of the speaking rate and related speaking-rate styles, and we proposed an improved text encoder that enhances the reproducibility of speech and style, demonstrating its effectiveness with objective measures. For (c), we built a web-based listening learning and evaluation system based on stepwise speaking-rate control. For (d), we had users actually use the system of (c) via crowdsourcing and experimentally showed that listening ability improves compared with a conventional system without speaking-rate control.

  3. Research and development of multi-modal interactive English learning system based on deep learning

    ITO Akinori

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (A)

    Institution: Tohoku University

    2017/04/01 - 2021/03/31

    We developed technologies for an English conversation learning system based on deep learning and created a CALL system for practicing English conversation: (1) We established technology for recognizing English speech spoken by Japanese with high accuracy to improve the accuracy of interfaces for speech, facial expressions, and gestures based on deep learning. (2) To establish English pronunciation evaluation and English conversation simulation technology based on deep learning, we investigated the effects of facial expressions and gestures on English proficiency evaluation. In addition, we established a method to evaluate pronunciation with high accuracy for interactive speech. (3) We integrated the technologies to create a spoken dialogue English conversation learning system.

  4. Research and development of a Japanese pronunciation training system using average voice morphing

    NOSE Takashi

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Challenging Exploratory Research

    Institution: Tohoku University

    2016/04/01 - 2019/03/31

    In this study, we aim to establish a new framework for a low-cost, convenient, and convincing Japanese pronunciation training system for non-native speakers in Japan. Specifically, we used statistical parametric speech synthesis with a teacher average-voice model trained on multiple teachers' speech, and achieved more precise labeling of pronunciation scores by using a feature substitution technique for the phonetic and prosodic parameters of speech. We trained a prediction model of pronunciation scores for phonemes, accent, and rhythm, and achieved an efficient pronunciation training method by predicting non-native speakers' pronunciation scores.

  5. Study on new vocal design focusing on naturally dehumanized singing

    MORISE Masanori

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Challenging Exploratory Research

    Institution: University of Yamanashi

    2016/04/01 - 2018/03/31

    Vocal design algorithms that approximate human singing have been proposed alongside the growth of commercial software such as VOCALOID. On the other hand, there is music content based on dehumanized singing, for which applications such as Auto-Tune are generally used to remove human-like features from the singing. This study proposes a vocal design algorithm that outputs dehumanized yet natural singing. In the proposed algorithm, we first developed a speech analysis/synthesis algorithm, and then proposed an algorithm for exaggerating several features, such as the fluctuation of the fundamental frequency. We carried out subjective evaluations to verify the effectiveness of the proposed algorithm. The results suggest that the exaggeration can synthesize singing with a certain level of naturalness and humanness.

  6. Establishment of speech synthesis framework based on Gaussian process regression

    Kobayashi Takao, MOUNGSRI Decha, NAGAHAMA Daiki, NOSE Takashi, ARIFIANTO Dhany

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Tokyo Institute of Technology

    2015/04/01 - 2018/03/31

    The purpose of the research is to develop a novel statistical parametric speech synthesis framework based on Gaussian process regression (GPR). We have proposed prosody generation techniques including pitch pattern prediction and phone duration prediction as well as the spectral parameter generation technique based on GPR. We developed a GPR-based speech synthesis system and showed its effectiveness through assessment of synthetic speech quality. Furthermore, we examined the proposed framework for generating expressive speech. We also examined it for generating more natural-sounding prosody in speech synthesis of a tonal language.

  7. Research of Human-Kind Dialogue System with Recognition and Synthesis of Various Speech Based on State Estimation

    Nose Takashi, MORI Hiroki

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Tohoku University

    2015/04/01 - 2018/03/31

    In this research project, we improved and advanced techniques for recognizing and synthesizing diverse speech, and studied a technique for estimating the state of system users and its applications, in order to realize a dialogue system that is kind to its users. Specifically, (1) we studied the validity of using emotions and a technique for emotion estimation; (2) we proposed and evaluated a sentence selection technique based on an extended entropy measure that takes phonetic and prosodic contexts into account; (3) we recorded and analyzed dialogue data for willingness estimation; (4) we constructed a large-scale emotional speech corpus that can be used for emotional speech synthesis/recognition and emotion estimation; and (5) we proposed and evaluated variance compensation and tailor-made speech synthesis as techniques for synthesizing diverse, high-quality speech.

  8. Affect burst: Analysis and synthesis of unconscious exposition of emotion

    Mori Hiroki, NAGATA Tomohiro

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Utsunomiya University

    2014/04/01 - 2018/03/31

    (1) A multimodal corpus of gaming interaction that readily induces interlocutors' shouts was developed. This corpus contains more than ten times as many shouts as existing corpora. Analysis of the shouts revealed acoustic differences from regular words and interjections. (2) A taxonomy of expressive interjections was developed, which enabled the synthesis of the interjection "a" in various forms. A perceptual experiment using the synthesized interjections revealed the relationship between the forms and paralinguistic information. (3) Factors that affect the acoustic properties of laugh calls were identified. Incorporating these factors into the context definition of the HMM-based speech synthesis framework enabled flexible laughter synthesis. A perceptual experiment showed the advantage of incorporating these contextual factors with respect to the naturalness of the synthesized laughter.

  9. Self-Organized Learning of Speech Recognition and Synthesis Systems

    Shinozaki Takahiro, ARAI Takayuki, WATANABE Shinji, DUH Kevin

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Tokyo Institute of Technology

    2014/04/01 - 2018/03/31

    The purpose of this study is to build self-standing speech and language processing systems that can learn from a small amount of labeled and a large amount of unlabeled speech data, and that can automatically optimize their structure and learning conditions. We proposed an evolution-strategy-based automation method for developing neural network-based systems, a series of semi-supervised learning methods for statistical speech models, and a reinforcement learning method for speech recognition systems. A high-performance Japanese speech recognition system integrating these research results has been published and is widely used.

  10. Development of Easy Japanese composition support system using sentence difficulty estimation and speech synthesis

    Ito Akinori, CHIBA Yuya, NAGANO Takeshi

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Tohoku University

    2014/04/01 - 2017/03/31

    We developed the Easy Japanese composition support system YANSIS and conducted related investigations. We developed a method for automatically estimating the difficulty of a sentence, and investigated the relation between the intelligibility of Japanese speech as heard by non-native speakers and the speech rate, pauses, and speech degradation caused by reverberation. This investigation revealed the most appropriate speech rate for Easy Japanese speech. In addition, we implemented the automatic sentence difficulty estimation function and a speech synthesizer in YANSIS.

  11. A study of speech synthesis for achieving synthetic speech with high quality and variability based on hybrid approach

    NOSE Takashi

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Young Scientists (B)

    Institution: Tohoku University

    2013/04/01 - 2015/03/31

    The purpose of this research is to establish a hybrid speech synthesis framework that can synthesize human-like speech with various emotional expressions and/or speaking styles using only a limited amount of speech data. We addressed the following six issues: (1) flexible control of the non- and para-linguistic information appearing in synthetic speech; (2) automatic training of prosodic variations; (3) extension to multi-lingual and cross-lingual speech synthesis; (4) application to singing voice synthesis; (5) efficient design of a speech corpus for synthesis; and (6) improvement of the subjective quality of synthetic speech by modifying the conventional parameter generation method.

  12. Research on speech synthesis using non-parametric modeling based on Gaussian process regression

    KOBAYASHI Takao, NOSE Takashi, KORIYAMA Tomoki

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Challenging Exploratory Research

    Institution: Tokyo Institute of Technology

    2013/04/01 - 2015/03/31

    The purpose of the research is to develop a framework using non-parametric modeling for synthesizing more natural-sounding speech than the conventional HMM-based statistical parametric speech synthesis framework. The proposed modeling approach is based on Gaussian process regression (GPR) and GPR model is designed for directly predicting frame-level acoustic features from corresponding input linguistic information. We have proposed kernel functions for GPR-based speech synthesis and examined several techniques for computational cost reduction, hyper-parameter optimization, and prosody modeling using Gaussian process classification and GPR.

  13. Research on advanced robust speech synthesis and its applications to multi-lingual speech communication

    KOBAYASHI Takao, NOSE Takashi, KORIYAMA Tomoki, ARIFIANTO Dhany

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Tokyo Institute of Technology

    2012/04/01 - 2015/03/31

    The purpose of the research is to develop advanced techniques that enable us to model the acoustic features of prosodic as well as spectral information, while being less dependent on the quality and quantity of the training speech data, for synthesizing natural-sounding and diverse expressive speech. We proposed several robust techniques, such as style control and prosody modeling techniques, and showed their effectiveness through objective and subjective evaluation tests. We also applied the proposed techniques to under-resourced languages. Furthermore, we examined a cross-lingual speech synthesis technique for universal speech communication.

  14. A study on speech diversification techniques based on corpus design for advanced humanoid speech synthesis

    NOSE Takashi

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Young Scientists (B)

    2011 - 2012

    Our goal in this research is to realize a more human-like, natural text-to-speech system with various emotional expressions and speaking styles. The achievements are as follows: (1) we proposed a novel corpus-design technique that takes accent, style, and sentence-final expressions into account; (2) we incorporated users' subjective emotional intensities into acoustic model training to improve the performance of expressive speech synthesis; (3) we proposed an automatic labeling technique for emphasis expressions using a parameter generation technique for the fundamental frequency to realize emphatic speech synthesis; and (4) we proposed cross-lingual speech synthesis using only a target speaker's native-language speech samples to synthesize multi-lingual speech at low cost.

  15. Research on robust spoken language interfaces for diverse voice variability and expressivity

    KOBAYASHI Takao, NAGAHASHI Hiroshi, NOSE Takashi

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Tokyo Institute of Technology

    2009 - 2011

    The purpose of the research is to develop techniques that make the human-computer interaction using speech input/output more robust for variations of users' emotional states, speaking styles, preferences, and expressivity. We have proposed techniques using a quantized fundamental frequency prosodic context for robust speech synthesis and an extended context set for spontaneous conversational speech synthesis. We have also proposed techniques for robust speech recognition including extraction of paralinguistic information and rapid model adaptation.

  16. Study on speech synthesis for humanoid spoken dialog system

    NOSE Takashi

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Research Activity Start-up

    Institution: Tokyo Institute of Technology

    2009 - 2010

    Two novel techniques and an investigation were presented as key speech synthesis technologies for the development of a humanoid spoken dialog system: (1) spontaneous speech synthesis based on statistical parametric modeling; (2) speaker-independent voice conversion based on statistical parametric modeling; and (3) an investigation of phonetic and prosodic contextual factors in speech synthesis.
