Details of the Researcher

Akinori Ito
Section
Graduate School of Engineering
Job title
Professor
Degree
  • Doctor of Engineering (Tohoku University)

  • Master of Engineering (Tohoku University)

e-Rad No.
70232428

Research History 7

  • 2010/04 - Present
    Tohoku University Graduate School of Engineering Professor

  • 2002/04 - 2010/03
    Graduate School of Engineering, Tohoku University Associate Professor

  • 1999/10 - 2002/03
    Faculty of Engineering, Yamagata University Associate Professor

  • 1995/04 - 1999/09
    Faculty of Engineering, Yamagata University Lecturer

  • 1998/05 - 1999/04
    College of Engineering, Boston University Visiting Scholar

  • 1992/04 - 1995/03
    Education Center for Information Processing, Tohoku University Assistant Professor

  • 1991/04 - 1992/03
    Research Center for Applied Information Sciences, Tohoku University Assistant Professor

Education 2

  • Tohoku University Graduate School, Division of Engineering Department of Information Engineering

    - 1991/03

  • Tohoku University Faculty of Engineering Department of Communication Engineering

    - 1986/03

Committee Memberships 42

  • Journal of Information Hiding and Multimedia Signal Processing Associate Editor

    2009/04 - Present

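  • Acoustical Society of Japan Councilor

    2007/05 - Present

  • Acoustical Society of Japan Delegate

    2005/05 - Present

  • Acoustical Society of Japan President

    2019/05 - 2021/05

  • Acoustical Society of Japan Director

    2009/06 - 2021/05

  • Acoustical Society of Japan Editorial Board Chair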

    2015/06 - 2017/06

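  • The Institute of Electronics, Information and Communication Engineers Technical Committee on Multimedia Information Hiding and Enrichment Chair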

    2015/05 - 2017/04

  • Acoustical Society of Japan Vice President

    2013/06 - 2015/06

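  • Information Processing Society of Japan Special Interest Group on Spoken Language Processing Steering Committee Member

    2004/05 - 2015/04

  • Acoustical Society of Japan Editorial Board Deputy Chief

    2007/05 - 2009/04

  • Information Processing Society of Japan Special Interest Group on Music and Computer Steering Committee Member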

    2007/05 - 2009/04

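  • The Institute of Electronics, Information and Communication Engineers Technical Committee on Speech Steering Committee Member

    2005/05 - 2008/05

  • Acoustical Society of Japan Speech Research Committee Steering Committee Member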

    2005/05 - 2008/05

  • Acoustical Society of Japan Academic Committee Secretary

    2005/09 - 2007/06

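  • Acoustical Society of Japan Digitization Promotion Committee Member

    2005/09 - 2007/05

  • The Institute of Electronics, Information and Communication Engineers Transactions D (Japanese Edition) Editorial Committee Editorial Secretary

    2005/05 - 2007/04

  • Acoustical Society of Japan Editorial Board Editorial Secretary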

    2005/05 - 2007/04

  • Acoustical Society of Japan Editorial Board Member

    2003/05 - 2005/04

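  • Acoustical Society of Japan Tohoku Chapter Secretary

    2002/05 - 2005/04

  • The Institute of Electronics, Information and Communication Engineers Transactions D (Japanese Edition) Editorial Committee Member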

    2002/05 - 2005/04

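  • The Institute of Electronics, Information and Communication Engineers Technical Committee on Speech Secretary

    2002/05 - 2004/04

  • Acoustical Society of Japan Speech Research Committee Secretary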

    2002/05 - 2004/04

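  • Information Processing Society of Japan Special Interest Group on Spoken Language Processing Continuous Speech Recognition Consortium Executive Committee Member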

    2001/01 - 2003/09

  • Information Processing Society of Japan Special Interest Group on Spoken Language Liaison Member

    1997/05 - 2001/04

  • National Center for University Entrance Examinations Subject Specialist Committee Question Preparation Subcommittee Member

    1996/04 - 1997/03

Professional Memberships 6

  • Human Interface Society

  • International Speech Communication Association

  • The Institute of Electrical and Electronics Engineers

  • Information Processing Society of Japan

  • The Institute of Electronics, Information and Communication Engineers

  • Acoustical Society of Japan

Research Interests 5

  • Computer Assisted Language Learning System

  • music information processing

  • natural language processing

  • speech processing

  • speech recognition

Research Areas 2

  • Humanities & social sciences / Foreign language education

  • Informatics / Intelligent informatics

Awards 5

  1. Best Paper Award of International Conference on Natural Language Processing and Knowledge Engineering

    2008/10 Organizing Committee of International Conference on Natural Language Processing and Knowledge Engineering

  2. Best Paper Award of International Conference on Intelligent Information Hiding and Multimedia Signal Processing

    2007/11 Organizing Committee of International Conference on Intelligent Information Hiding and Multimedia Signal Processing

  3. Best Paper Award of The 5th International Conference on Education and Information Systems, Technologies and Applications

    2007/07 Organizing Committee of The 5th International Conference on Education and Information Systems, Technologies and Applications

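  4. Ishida Memorial Foundation (石田(實)記念財団) Research Encouragement Award

    2003/11/28 Ishida Memorial Foundation (石田(實)記念財団) Research on spoken language processing

  5. Open Software Prize

    2000/06/07 Electronic Network Consortium Development of the software “w3m”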

Papers 380

  1. Automatic assessment of English proficiency for Japanese learners without reference sentences based on deep neural network acoustic models Peer-reviewed

    Jiang Fu, Yuya Chiba, Takashi Nose, Akinori Ito

    Speech Communication 116 86-97 2020/01

    DOI: 10.1016/j.specom.2019.12.002  

    ISSN: 0167-6393

  2. Japanese Shadowing Training Using Synchronized Partial Captions

    Syuyu Fang, Akinori Ito, Takashi Nose

    2025 13th International Conference on Information and Education Technology (ICIET) 177-181 2025/04/18

    Publisher: IEEE

    DOI: 10.1109/iciet66371.2025.11046256  

  3. Adaptive Depth-Wise Pruning for Efficient Environmental Sound Classification Peer-reviewed

    Changlong Wang, Akinori Ito, Takashi Nose

    IEEE Access 13 69751-69759 2025/04/16

    Publisher: Institute of Electrical and Electronics Engineers (IEEE)

    DOI: 10.1109/access.2025.3561590  

    eISSN: 2169-3536

  4. The Development of an Emotional Embodied Conversational Agent and the Evaluation of the Effect of Response Delay on User Impression Peer-reviewed

    Simon Christophe Jolibois, Akinori Ito, Takashi Nose

    Applied Sciences 15 (8) 4256 2025/04/11

    DOI: 10.3390/app15084256  

  5. Robust Human Tracking Using a 3D LiDAR and Point Cloud Projection for Human-Following Robots Peer-reviewed

    Sora Kitamoto, Yutaka Hiroi, Kenzaburo Miyawaki, Akinori Ito

    Sensors 25 (6) 2025/03/12

    DOI: 10.3390/s25061754  

  6. Reversible Spectral Speech Watermarking with Variable Embedding Locations Against Spectrum-Based Attacks Peer-reviewed

    Xuping Huang, Akinori Ito

    Applied Sciences 15 (1) 381 2025/01/03

    DOI: 10.3390/app15010381  

  7. Unified model for voice conversion of speech and singing voice using adaptive pitch constraints Peer-reviewed

    Shogo Fukawa, Takashi Nose, Shuhei Imai, Akinori Ito

    Acoustical Science and Technology 46 (1) 120-123 2025/01/01

    Publisher: Acoustical Society of Japan

    DOI: 10.1250/ast.e24.47  

    ISSN: 1346-3969

    eISSN: 1347-5177

  8. We open our mouths when we are silent Peer-reviewed

    Shoki Kawanishi, Yuya Chiba, Akinori Ito, Takashi Nose

    Acoustical Science and Technology 46 (1) 96-99 2025/01/01

    Publisher: Acoustical Society of Japan

    DOI: 10.1250/ast.e24.21  

    ISSN: 1346-3969

    eISSN: 1347-5177

  9. Fast end-to-end non-parallel voice conversion based on speaker-adaptive neural vocoder with cycle-consistent learning Peer-reviewed

    Shuhei Imai, Aoi Kanagaki, Takashi Nose, Shogo Fukawa, Akinori Ito

    Acoustical Science and Technology 46 (1) 116-119 2025/01/01

    Publisher: Acoustical Society of Japan

    DOI: 10.1250/ast.e24.46  

    ISSN: 1346-3969

    eISSN: 1347-5177

  10. LLM as decoder: Investigating Lattice-based Speech Recognition Hypotheses Rescoring Using LLM Peer-reviewed

    Sheng Li, Yuka Ko, Akinori Ito

    2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 1-5 2024/12/03

    Publisher: IEEE

    DOI: 10.1109/apsipaasc63619.2025.10848752  

  11. A Study on Variable Embedding Locations of Reversible Spectral Speech Watermarking

    Xuping Huang, Akinori Ito

    2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 1-6 2024/12/03

    Publisher: IEEE

    DOI: 10.1109/apsipaasc63619.2025.10848605  

  12. Suboptimal Allocation of Defense Schedule Using Simulated Annealing Peer-reviewed

    Akinori Ito

    Journal for Academic Computing and Networking 28 106-113 2024/11

    DOI: 10.24669/jacn.28.1_106  

  13. Selection of key sentences from lecture video transcription and its application to feedback to the learner Peer-reviewed

    Miki Takeuchi, Akinori Ito, Takashi Nose

    Proceedings of the 2024 8th International Conference on Education and Multimedia Technology 218-223 2024/06/22

    Publisher: ACM

    DOI: 10.1145/3678726.3678733  

  14. Development of a Personal Guide Robot That Leads a Guest Hand-in-Hand While Keeping a Distance Peer-reviewed

    Hironobu Wakabayashi, Yutaka Hiroi, Kenzaburo Miyawaki, Akinori Ito

    Sensors 24 (7) 2345 2024/04/07

    Publisher: MDPI AG

    DOI: 10.3390/s24072345  

    eISSN: 1424-8220

    This paper proposes a novel tour guide robot, “ASAHI ReBorn”, which can lead a guest by hand one-on-one while maintaining a proper distance from the guest. The robot uses a stretchable arm interface to hold the guest’s hand and adjusts its speed according to the guest’s pace. The robot also follows a given guide path accurately using the Robot Side method, a robot navigation method that follows a pre-defined path quickly and accurately. In addition, a control method is introduced that limits the angular velocity of the robot to avoid the robot’s quick turn while guiding the guest. We evaluated the performance and usability of the proposed robot through experiments and user studies. The tour-guiding experiment revealed that the proposed method that keeps distance between the robot and the guest using the stretchable arm enables the guests to look around the exhibits compared with the condition where the robot moved at a constant velocity.

  15. Imperceptible and Reversible Acoustic Watermarking Based on Modified Integer Discrete Cosine Transform Coefficient Expansion Peer-reviewed

    Xuping Huang, Akinori Ito

    Applied Sciences 14 (7) 2757 2024/03/25

    Publisher: MDPI AG

    DOI: 10.3390/app14072757  

    eISSN: 2076-3417

    This paper aims to explore an alternative reversible digital watermarking solution to guarantee the integrity of and detect tampering with data of probative importance. Since the payload for verification is embedded in the contents, algorithms for reversible embedding and extraction, imperceptibility, payload capacity, and computational time are issues to evaluate. Thus, we propose a reversible and imperceptible audio information-hiding algorithm based on modified integer discrete cosine transform (intDCT) coefficient expansion. In this work, the original signal is segmented into fixed-length frames, and then intDCT is applied to each frame to transform signals from the time domain into integer DCT coefficients. Expansion is applied to DCT coefficients at a higher frequency to reserve hiding capacity. Objective evaluation of speech quality is conducted using listening quality objective mean opinion (MOS-LQO) and the segmental signal-to-noise ratio (segSNR). The audio quality of different frame lengths and capacities is evaluated. Averages of 4.41 for MOS-LQO and 23.314 [dB] for segSNR for 112 ITU-T test signals were obtained with a capacity of 8000 bps, which assured imperceptibility with the sufficient capacity of the proposed method. This shows comparable audio quality to conventional work based on Linear Predictive Coding (LPC) regarding MOS-LQO. However, all segSNR scores of the proposed method have comparable or better performance in the time domain. Additionally, comparing histograms of the normalized maximum absolute value of stego data shows a lower possibility of overflow than the LPC method. A computational cost, including hiding and transforming, is an average of 4.884 s to process a 10 s audio clip. Blind tampering detection without the original data is achieved by the proposed embedding and extraction method.

  16. Character Expressions in Meta-Learning for Extremely Low Resource Language Speech Recognition Peer-reviewed

    Rui Zhou, Akinori Ito, Takashi Nose

    Proceedings of the 2024 16th International Conference on Machine Learning and Computing 2024/02/02

    Publisher: ACM

    DOI: 10.1145/3651671.3651730  

  17. Evaluation of Environmental Sound Classification using Vision Transformer Peer-reviewed

    Changlong Wang, Akinori Ito, Takashi Nose, Chia-Ping Chen

    Proceedings of the 2024 16th International Conference on Machine Learning and Computing 665-669 2024/02/02

    Publisher: ACM

    DOI: 10.1145/3651671.3651733  

  18. Toward Photo-Realistic Facial Animation Generation Based on Keypoint Features Peer-reviewed

    Zikai Shu, Takashi Nose, Akinori Ito

    Proceedings of the 2024 16th International Conference on Machine Learning and Computing 39 334-339 2024/02/02

    Publisher: ACM

    DOI: 10.1145/3651671.3651731  

  19. Speaker Intimacy Estimation in Chat-Talks Based on Verbal and Non-Verbal Information Peer-reviewed

    Yuya Chiba, Akinori Ito

    IEEE Access 12 184592-184606 2024

    DOI: 10.1109/ACCESS.2024.3507945  

  20. A Replaceable Curiosity-Driven Candidate Agent Exploration Approach for Task-Oriented Dialog Policy Learning Peer-reviewed

    Xuecheng Niu, Akinori Ito, Takashi Nose

    IEEE Access 2024

    DOI: 10.1109/ACCESS.2024.3462719  

  21. Multilingual Meta-Transfer Learning for Low-Resource Speech Recognition Peer-reviewed

    Rui Zhou, Takaki Koshikawa, Akinori Ito, Takashi Nose, Chia-Ping Chen

    IEEE Access 2024

    DOI: 10.1109/ACCESS.2024.3486711  

  22. Scheduled Curiosity-Deep Dyna-Q: Efficient Exploration for Dialog Policy Learning Peer-reviewed

    Xuecheng Niu, Akinori Ito, Takashi Nose

    IEEE Access 12 46940-46952 2024

    DOI: 10.1109/ACCESS.2024.3376418  

    eISSN: 2169-3536

  23. Development of a Play-Tag Robot with Human–Robot Contact Peer-reviewed

    Yutaka Hiroi, Kenzaburo Miyawaki, Akinori Ito

    Applied Sciences 13 (23) 12909 2023/12/01

    Publisher: MDPI AG

    DOI: 10.3390/app132312909  

    eISSN: 2076-3417

    Many robots that play with humans have been developed so far, but developing a robot that physically contacts humans while playing is challenging. We have developed robots that play tag with humans, which find players, approach them, and move away from them. However, the developed algorithm for approaching a player was insufficient because it did not consider how the arms are attached to the robot. Therefore, in this paper, we assume that the arms are fixed on both sides of the robot and develop a new algorithm to approach the player and touch them with an arm. Since the algorithm aims to move along a circular orbit around a player, we call this algorithm “the go-round mode”. To investigate the effectiveness of the proposed method, we conducted two experiments. The first is a simulation experiment, which showed that the proposed method outperformed the previous one. In the second experiment, we implemented the proposed method in a real robot and conducted an experiment to chase and touch the player. As a result, the robot could touch the player in all the trials without collision.

  24. Multimodal Expressive Embodied Conversational Agent Design Peer-reviewed

    Simon Jolibois, Akinori Ito, Takashi Nose

    Communications in Computer and Information Science 244-249 2023/07/09

    Publisher: Springer Nature Switzerland

    DOI: 10.1007/978-3-031-35989-7_31  

    ISSN: 1865-0929

    eISSN: 1865-0937

  25. Spoken term detection from utterances of minority languages Invited Peer-reviewed

    Akinori Ito, Satoru Mizuochi, Takashi Nose

    Issues in Japanese Psycholinguistics from Comparative Perspectives 1 2023/07

  26. Effect of Data Size and Machine Translation on the Accuracy of Automatic Personality Classification Peer-reviewed

    Yuki Fukazawa, Akinori Ito, Takashi Nose

    Advances in Intelligent Information Hiding and Multimedia Signal Processing 405-413 2023/05/24

    Publisher: Springer Nature Singapore

    DOI: 10.1007/978-981-99-0105-0_36  

    ISSN: 2190-3018

    eISSN: 2190-3026

  27. Spoken Dialogue System Development Without Speech Recognition Towards Language Revitalization Peer-reviewed

    Akinori Ito

    Advances in Intelligent Information Hiding and Multimedia Signal Processing 393-404 2023/05/24

    Publisher: Springer Nature Singapore

    DOI: 10.1007/978-981-99-0105-0_35  

    ISSN: 2190-3018

    eISSN: 2190-3026

  28. A Robotic System for Remote Teaching of Technical Drawing Peer-reviewed

    Yutaka Hiroi, Akinori Ito

    Education Sciences 13 (4) 2023/03/28

    DOI: 10.3390/educsci13040347  

  29. Personality Analysis of Entrepreneurial Text for Entrepreneurship Education Peer-reviewed

    Akinori Ito, Kotaro Takeda, Shuichi Ishida

    2023 5th International Conference on Natural Language Processing (ICNLP) 2023/03

    Publisher: IEEE

    DOI: 10.1109/icnlp58431.2023.00047  

  30. Path Following Algorithm with Small Error for Guide Robot Peer-reviewed

    Hironobu Wakabayashi, Yutaka Hiroi, Kenzaburo Miyawaki, Akinori Ito

    Robot Intelligence Technology and Applications 7 56-67 2023/03/01

    Publisher: Springer International Publishing

    DOI: 10.1007/978-3-031-26889-2_6  

    ISSN: 2367-3370

    eISSN: 2367-3389

  31. Confidence-based Utterance Selection for a Recognizer-free Spoken Dialogue System Peer-reviewed

    Akinori Ito

    Proceedings of the 2023 15th International Conference on Machine Learning and Computing 481-484 2023/02/17

    Publisher: ACM

    DOI: 10.1145/3587716.3587796  

  32. Response Sentence Modification Using a Sentence Vector for a Flexible Response Generation of Retrieval-based Dialogue Systems Peer-reviewed

    Ryota Yahagi, Akinori Ito, Takashi Nose, Yuya Chiba

    2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2022/11/07

    Publisher: IEEE

    DOI: 10.23919/apsipaasc55919.2022.9979841  

  33. Design and Construction of Japanese Multimodal Utterance Corpus with Improved Emotion Balance and Naturalness Peer-reviewed

    Daisuke Horii, Akinori Ito, Takashi Nose

    2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2022/11/07

    Publisher: IEEE

    DOI: 10.23919/apsipaasc55919.2022.9980272  

  34. Multimodal Dialogue Response Timing Estimation Using Dialogue Context Encoder Peer-reviewed

    Ryota Yahagi, Yuya Chiba, Takashi Nose, Akinori Ito

    Lecture Notes in Electrical Engineering 133-141 2022/11/01

    Publisher: Springer Nature Singapore

    DOI: 10.1007/978-981-19-5538-9_9  

    ISSN: 1876-1100

    eISSN: 1876-1119

  35. Combination of deep-learning-based audio separation and speech enhancement for noise reduction of extracted signal from polyphonic music Peer-reviewed

    Soichiro Kobayashi, Takashi Nose, Akinori Ito

    Proceedings of the 24th International Congress of Acoustics 2022/10

  36. Successive Binary Partition K-means Method for Clustering with Less Cluster Size Bias Peer-reviewed

    Akinori Ito

    2022 7th International Conference on Signal and Image Processing (ICSIP) 2022/07/20

    Publisher: IEEE

    DOI: 10.1109/icsip55141.2022.9886452  

  37. Development of a Teleoperated Play Tag Robot with Semi-Automatic Play Peer-reviewed

    Yoshitaka Kasai, Yutaka Hiroi, Kenzaburo Miyawaki, Akinori Ito

    2022 IEEE/SICE International Symposium on System Integration (SII) 2022/01/09

    Publisher: IEEE

    DOI: 10.1109/sii52469.2022.9708883  

  38. Spoken Term Detection of Zero-Resource Language Using Posteriorgram of Multiple Languages

    Satoru Mizuochi, Takashi Nose, Akinori Ito

    Interdisciplinary Information Sciences 28 (1) 1-13 2022

    Publisher: Graduate School of Information Sciences, Tohoku University

    DOI: 10.4036/iis.2022.a.04  

    ISSN: 1340-9050

    eISSN: 1347-6157

  39. Study on the Background Music Cancellation System for Speech Privacy Peer-reviewed

    Jianning Huang, Akinori Ito

    2021 IEEE 6th International Conference on Signal and Image Processing (ICSIP) 2021/10/22

    Publisher: IEEE

    DOI: 10.1109/icsip52628.2021.9688835  

  40. Analysis of Feature Extraction by Convolutional Neural Network for Speech Emotion Recognition Peer-reviewed

    Daisuke Horii, Akinori Ito, Takashi Nose

    2021 IEEE 10th Global Conference on Consumer Electronics (GCCE) 2021/10/12

    Publisher: IEEE

    DOI: 10.1109/gcce53005.2021.9621964  

  41. Speaker Intimacy in Chat-Talks: Analysis and Recognition based on Verbal and Non-Verbal Information Peer-reviewed

    Yuya Chiba, Yoshihiro Yamazaki, Akinori Ito

    Proceedings of the 25th Workshop on the Semantics and Pragmatics of Dialogue 2021/09

  42. Effect of Training Data Selection for Speech Recognition of Emotional Speech Peer-reviewed

    Yusuke Yamada, Yuya Chiba, Takashi Nose, Akinori Ito

    International Journal of Machine Learning and Computing 11 (5) 362-366 2021/09

  43. Improvement of Automatic English Pronunciation Assessment with Small Number of Utterances Using Sentence Speakability Peer-reviewed

    Satsuki Naijo, Akinori Ito, Takashi Nose

    Interspeech 2021 2021/08/30

    Publisher: ISCA

    DOI: 10.21437/interspeech.2021-1132  

  44. Neural Spoken-Response Generation Using Prosodic and Linguistic Context for Conversational Systems Peer-reviewed

    Yoshihiro Yamazaki, Yuya Chiba, Takashi Nose, Akinori Ito

    Interspeech 2021 2021/08/30

    Publisher: ISCA

    DOI: 10.21437/interspeech.2021-381  

  45. Development of a Mobile Robot That Plays Tag with Touch-and-Away Behavior Using a Laser Range Finder Peer-reviewed

    Yoshitaka Kasai, Yutaka Hiroi, Kenzaburo Miyawaki, Akinori Ito

    Applied Sciences 11 (16) 7522 2021/08/17

    Publisher: MDPI AG

    DOI: 10.3390/app11167522  

    eISSN: 2076-3417

    The development of robots that play with humans is a challenging topic for robotics. We are developing a robot that plays tag with human players. To realize such a robot, it needs to observe the players and obstacles around it, chase a target player, and touch the player without collision. To achieve this task, we propose two methods. The first one is the player tracking method, by which the robot moves towards a virtual circle surrounding the target player. We used a laser range finder (LRF) as a sensor for player tracking. The second one is a motion control method after approaching the player. Here, the robot moves away from the player by moving towards the opposite side to the player. We conducted a simulation experiment and an experiment using a real robot. Both experiments proved that with the proposed tracking method, the robot properly chased the player and moved away from the player without collision. The contribution of this paper is the development of a robot control method to approach a human and then move away safely.

  46. SMOC corpus: A large-scale Japanese spontaneous multimodal one-on-one chat-talk corpus for dialog systems Peer-reviewed

    Yoshihiro Yamazaki, Yuya Chiba, Takashi Nose, Akinori Ito

    Acoustical Science and Technology 42 (4) 210-213 2021/07/01

    Publisher: Acoustical Society of Japan

    DOI: 10.1250/ast.42.210  

    ISSN: 1346-3969

    eISSN: 1347-5177

  47. A Light-weight Hand-waving Gesture Recognition Method Using Kinect V2 and Frequency Analysis Peer-reviewed

    Yuki Misaki, Yutaka Hiroi, Akinori Ito

    2021 IEEE/SICE International Symposium on System Integration, SII 2021 750-755 2021/01/11

    DOI: 10.1109/IEEECONF49454.2021.9382709  

  48. CycleGAN-Based High-Quality Non-Parallel Voice Conversion with Spectrogram and WaveRNN Peer-reviewed

    Aoi Kanagaki, Masaya Tanaka, Takashi Nose, Ryohei Shimizu, Akira Ito, Akinori Ito

    2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020 356-357 2020/10/13

    DOI: 10.1109/GCCE50665.2020.9291952  

  49. Incremental response generation using prefix-to-prefix model for dialogue system Peer-reviewed

    Ryota Yahagi, Yuya Chiba, Takashi Nose, Akinori Ito

    2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020 349-350 2020/10/13

    DOI: 10.1109/GCCE50665.2020.9291883  

  50. A study on minimum spectral error analysis of speech Peer-reviewed

    Takuma Hayasaka, Takashi Nose, Akinori Ito

    2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020 362-363 2020/10/13

    DOI: 10.1109/GCCE50665.2020.9291840  

  51. Filler prediction based on bidirectional LSTM for generation of natural response of spoken dialog Peer-reviewed

    Yoshihiro Yamazaki, Yuya Chiba, Takashi Nose, Akinori Ito

    2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020 360-361 2020/10/13

    DOI: 10.1109/GCCE50665.2020.9291867  

  52. Successive Japanese lyrics generation based on encoder-decoder model Peer-reviewed

    Rikiya Takahashi, Takashi Nose, Yuya Chiba, Akinori Ito

    2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020 126-127 2020/10/13

    DOI: 10.1109/GCCE50665.2020.9291718  

  53. Analysis and Estimation of Sentence Speakability for English Pronunciation Evaluation Peer-reviewed

    Satsuki Naijo, Yuya Chiba, Takashi Nose, Akinori Ito

    2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020 353-355 2020/10/13

    DOI: 10.1109/GCCE50665.2020.9292072  

  54. LJSing: large-scale singing voice corpus of single Japanese singer Peer-reviewed

    Takuto Fujimura, Takashi Nose, Akinori Ito

    2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020 364-365 2020/10/13

    DOI: 10.1109/GCCE50665.2020.9291704  

  55. Improving Pronunciation Clarity of Dysarthric Speech Using CycleGAN with Multiple Speakers Peer-reviewed

    Shuhei Imai, Takashi Nose, Aoi Kanagaki, Satoshi Watanabe, Akinori Ito

    2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020 366-367 2020/10/13

    DOI: 10.1109/GCCE50665.2020.9292041  

  56. Spoken term detection based on acoustic models trained in multiple languages for zero-resource language Peer-reviewed

    Satoru Mizuochi, Yuya Chiba, Takashi Nose, Akinori Ito

    2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020 351-352 2020/10/13

    DOI: 10.1109/GCCE50665.2020.9291761  

  57. Integration of accent sandhi and prosodic features estimation for Japanese text-to-speech synthesis Peer-reviewed

    Daisuke Fujimaki, Takashi Nose, Akinori Ito

    2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020 358-359 2020/10/13

    DOI: 10.1109/GCCE50665.2020.9291906  

  58. Language modeling in speech recognition for grammatical error detection based on neural machine translation Peer-reviewed

    Jiang Fu, Yuya Chiba, Takashi Nose, Akinori Ito

    Acoustical Science and Technology 41 (5) 788-791 2020/09/01

    Publisher: Acoustical Society of Japan

    DOI: 10.1250/ast.41.788  

    ISSN: 1346-3969

    eISSN: 1347-5177

  59. Construction and analysis of a multimodal chat-talk corpus for dialog systems considering interpersonal closeness Peer-reviewed

    Yoshihiro Yamazaki, Yuya Chiba, Takashi Nose, Akinori Ito

    LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings 443-448 2020

  60. Multi-stream attention-based BLSTM with feature segmentation for speech emotion recognition Peer-reviewed

    Yuya Chiba, Takashi Nose, Akinori Ito

    Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2020-October 3301-3305 2020

    DOI: 10.21437/Interspeech.2020-1199  

    ISSN: 2308-457X

    eISSN: 1990-9772

  61. A symbol-level melody completion based on a convolutional neural network with generative adversarial learning Peer-reviewed

    Kosuke Nakamura, Takashi Nose, Yuya Chiba, Akinori Ito

    Journal of Information Processing 28 248-257 2020

    DOI: 10.2197/ipsjjip.28.248  

    ISSN: 0387-5806

    eISSN: 1882-6652

  62. Human-machine metacommunication towards development of a human-like agent: A short review Peer-reviewed

    Akinori Ito

    Acoustical Science and Technology 41 (1) 166-169 2020

    DOI: 10.1250/ast.41.166  

    ISSN: 1346-3969

    eISSN: 1347-5177

  63. Evaluation of Person Tracking Methods for Human-Robot Physical Play Peer-reviewed

    Koyuki Ikemoto, Yutaka Hiroi, Akinori Ito

    Proceedings of the 2020 IEEE/SICE International Symposium on System Integration, SII 2020 416-421 2020/01

    DOI: 10.1109/SII46433.2020.9026275  

  64. A pedestrian avoidance method considering personal space for a guide robot Peer-reviewed

    Yutaka Hiroi, Akinori Ito

    Robotics 8 (4) 2019/12/01

    DOI: 10.3390/ROBOTICS8040097  

    eISSN: 2218-6581

  65. Realization of a robot system that plays "darumasan-ga-koronda" game with humans Peer-reviewed

    Yutaka Hiroi, Akinori Ito

    Robotics 8 (3) 2019/09/01

    DOI: 10.3390/robotics8030055  

    eISSN: 2218-6581

  66. Improving human scoring of prosody using parametric speech synthesis Peer-reviewed

    Hafiyan Prafianto, Takashi Nose, Yuya Chiba, Akinori Ito

    Speech Communication 111 14-21 2019/08

    Publisher: Elsevier BV

    DOI: 10.1016/j.specom.2019.06.001  

    ISSN: 0167-6393

  67. Effect of Mutual Self-Disclosure in Spoken Dialog System on User Impression Peer-reviewed

    Shunsuke Tada, Yuya Chiba, Takashi Nose, Akinori Ito

    2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings 806-810 2019/03/04

    DOI: 10.23919/APSIPA.2018.8659630  

  68. Latent words recurrent neural network language models for automatic speech recognition Peer-reviewed

    Ryo Masumura, Taichi Asami, Takanobu Oba, Sumitaka Sakauchi, Akinori Ito

    IEICE Transactions on Information and Systems E102D (12) 2557-2567 2019

    DOI: 10.1587/transinf.2018EDP7242  

    ISSN: 0916-8532

    eISSN: 1745-1361

  69. Preface

    Jeng Shyang Pan, Akinori Ito, Pei Wei Tsai, Lakhmi C. Jain

    Smart Innovation, Systems and Technologies 110 v-vi 2019

    DOI: 10.1109/ICB.2012.6199777  

    ISSN: 2190-3018

    eISSN: 2190-3026

  70. Multi-condition training for noise-robust speech emotion recognition Peer-reviewed

    Yuya Chiba, Takashi Nose, Akinori Ito

    Acoustical Science and Technology 40 (6) 406-409 2019

    DOI: 10.1250/ast.40.406  

    ISSN: 1346-3969

    eISSN: 1347-5177

  71. Evaluation of English speech recognition for Japanese learners using DNN-based acoustic models Peer-reviewed

    Jiang Fu, Yuya Chiba, Takashi Nose, Akinori Ito

    Smart Innovation, Systems and Technologies 110 93-100 2019

    DOI: 10.1007/978-3-030-03748-2_11  

    ISSN: 2190-3018

    eISSN: 2190-3026

  72. Comparison of speech recognition performance between Kaldi and Google Cloud Speech API Peer-reviewed

    Takashi Kimura, Takashi Nose, Shinji Hirooka, Yuya Chiba, Akinori Ito

    Smart Innovation, Systems and Technologies 110 109-115 2019

    DOI: 10.1007/978-3-030-03748-2_13  

    ISSN: 2190-3018

    eISSN: 2190-3026

  73. Segmental pitch control using speech input based on differential contexts and features for customizable neural speech synthesis Peer-reviewed

    Shinya Hanabusa, Takashi Nose, Akinori Ito

    Smart Innovation, Systems and Technologies 110 124-131 2019

    DOI: 10.1007/978-3-030-03748-2_15  

    ISSN: 2190-3018

    eISSN: 2190-3026

  74. Melody completion based on convolutional neural networks and generative adversarial learning Peer-reviewed

    Kosuke Nakamura, Takashi Nose, Yuya Chiba, Akinori Ito

    Smart Innovation, Systems and Technologies 110 116-123 2019

    DOI: 10.1007/978-3-030-03748-2_14  

    ISSN: 2190-3018

    eISSN: 2190-3026

  75. Two-stage sequence-to-sequence neural voice conversion with low-to-high definition spectrogram mapping Peer-reviewed

    Sou Miyamoto, Takashi Nose, Kazuyuki Hiroshiba, Yuri Odagiri, Akinori Ito

    Smart Innovation, Systems and Technologies 110 132-139 2019

    DOI: 10.1007/978-3-030-03748-2_16  

    ISSN: 2190-3018

    eISSN: 2190-3026

  76. DNN-based talking movie generation with face direction consideration Peer-reviewed

    Toru Ishikawa, Takashi Nose, Akinori Ito

    Smart Innovation, Systems and Technologies 110 157-164 2019

    DOI: 10.1007/978-3-030-03748-2_19  

    ISSN: 2190-3018

    eISSN: 2190-3026

  77. A study on a spoken dialogue system with cooperative emotional speech synthesis using acoustic and linguistic information Peer-reviewed

    Mai Yamanaka, Yuya Chiba, Takashi Nose, Akinori Ito

    Smart Innovation, Systems and Technologies 110 101-108 2019

    DOI: 10.1007/978-3-030-03748-2_12  

    ISSN: 2190-3018

    eISSN: 2190-3026

  78. Leveraging a small corpus by different frame shifts for training of a speech recognizer Peer-reviewed

    Akinori Ito

    Smart Innovation, Systems and Technologies 110 82-89 2019

    DOI: 10.1007/978-3-030-03748-2_10  

    ISSN: 2190-3018

    eISSN: 2190-3026

  79. Muting machine speech using audio watermarking Peer-reviewed

    Akinori Ito

    Smart Innovation, Systems and Technologies 110 74-81 2019

    DOI: 10.1007/978-3-030-03748-2_9  

    ISSN: 2190-3018

    eISSN: 2190-3026

  80. Improvement of accent sandhi rules based on Japanese accent dictionaries Peer-reviewed

    Hiroto Aoyama, Takashi Nose, Yuya Chiba, Akinori Ito

    Smart Innovation, Systems and Technologies 110 140-148 2019

    DOI: 10.1007/978-3-030-03748-2_17  

    ISSN: 2190-3018

    eISSN: 2190-3026

  81. Multiple player detection and tracking method using a laser range finder for a robot that plays with human Peer-reviewed

    Yuko Nakamori, Yutaka Hiroi, Akinori Ito

    ROBOMECH Journal 5 (1) 25 2018/12/01

    DOI: 10.1186/s40648-018-0122-x  

    eISSN: 2197-4225

  82. A study on ship type identification by use of deep neural network

    西村 竜一, 天間 克宏, 服部 聖彦, 金子 健司, 伊藤 彰則, 藤井 豊展, 木島 明博

    IEICE Technical Report 118 (234) 1-6 2018/10

    Publisher: Institute of Electronics, Information and Communication Engineers (IEICE)

    ISSN: 0913-5685

  83. An Analysis of the Effect of Emotional Speech Synthesis on Non-Task-Oriented Dialogue System Peer-reviewed

    Yuya Chiba, Takashi Nose, Taketo Kase, Mai Yamanaka, Akinori Ito

    Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, Melbourne, Australia, July 12-14, 2018 371-375 2018/07

    Publisher: Association for Computational Linguistics

  84. Improving User Impression in Spoken Dialog System with Gradual Speech Form Control Peer-reviewed

    Yukiko Kageyama, Yuya Chiba, Takashi Nose, Akinori Ito

    Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, Melbourne, Australia, July 12-14, 2018 235-240 2018/07

    Publisher: Association for Computational Linguistics

  85. Domain adaptation based on mixture of latent words language models for automatic speech recognition Peer-reviewed

    Ryo Masumura, Taichi Asami, Takanobu Oba, Hirokazu Masataki, Sumitaka Sakauchi, Akinori Ito

    IEICE Transactions on Information and Systems E101D (6) 1581-1590 2018/06

    Publisher: Institute of Electronics, Information and Communication Engineers (IEICE)

    DOI: 10.1587/transinf.2017EDP7210  

    ISSN: 0916-8532

    eISSN: 1745-1361

  86. Analyses of example sentences collected by conversation for example-based non-task-oriented dialog system Peer-reviewed

    Yukiko Kageyama, Yuya Chiba, Takashi Nose, Akinori Ito

    IAENG International Journal of Computer Science 45 (2) 285-293 2018/05/28

    ISSN: 1819-656X

    eISSN: 1819-9224

  87. Spoken term detection of zero-resource language using machine learning Peer-reviewed

    Akinori Ito, Masatoshi Koizumi

    ACM International Conference Proceeding Series 45-49 2018/02/26

    DOI: 10.1145/3193063.3193068  

  88. Analysis of efficient multimodal features for estimating user's willingness to talk: Comparison of human-machine and human-human dialog Peer-reviewed

    Yuya Chiba, Takashi Nose, Akinori Ito

    Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017 2018-February 428-431 2018/02/05

    Publisher: IEEE

    DOI: 10.1109/APSIPA.2017.8282069  

  90. Enhancement of person detection and tracking for a robot that plays with human Peer-reviewed

    Yuko Nakamori, Yutaka Hiroi, Akinori Ito

    SII 2017 - 2017 IEEE/SICE International Symposium on System Integration 2018-January 494-499 2018/02/01

    Publisher: IEEE

    DOI: 10.1109/SII.2017.8279261  

  92. Special section on enriched multimedia — Potential and possibility of multimedia contents for the future

    Akinori Ito

    IEICE Transactions on Information and Systems E101D (1) 1 2018

    DOI: 10.1587/transinf.2017MUF0001  

    ISSN: 0916-8532

    eISSN: 1745-1361

  93. Dialog-based interactive movie recommendation: Comparison of dialog strategies Peer-reviewed

    Hayato Mori, Yuya Chiba, Takashi Nose, Akinori Ito

    Smart Innovation, Systems and Technologies 82 77-83 2018

    Publisher: Springer Science and Business Media Deutschland GmbH

    DOI: 10.1007/978-3-319-63859-1_10  

    ISSN: 2190-3018

    eISSN: 2190-3026

  94. Response selection of interview-based dialog system using user focus and semantic orientation Peer-reviewed

    Shunsuke Tada, Yuya Chiba, Takashi Nose, Akinori Ito

    Smart Innovation, Systems and Technologies 82 84-90 2018

    Publisher: Springer Science and Business Media Deutschland GmbH

    DOI: 10.1007/978-3-319-63859-1_11  

    ISSN: 2190-3018

    eISSN: 2190-3026

  95. Detection of singing mistakes from singing voice Peer-reviewed

    Isao Miyagawa, Yuya Chiba, Takashi Nose, Akinori Ito

    Smart Innovation, Systems and Technologies 82 130-136 2018

    Publisher: Springer Science and Business Media Deutschland GmbH

    DOI: 10.1007/978-3-319-63859-1_17  

    ISSN: 2190-3018

    eISSN: 2190-3026

  96. Evaluation of nonlinear tempo modification methods based on sinusoidal modeling Peer-reviewed

    Kosuke Nakamura, Yuya Chiba, Takashi Nose, Akinori Ito

    Smart Innovation, Systems and Technologies 82 104-111 2018

    Publisher: Springer Science and Business Media Deutschland GmbH

    DOI: 10.1007/978-3-319-63859-1_14  

    ISSN: 2190-3018

    eISSN: 2190-3026

  97. Development and evaluation of Julius-compatible interface for Kaldi ASR Peer-reviewed

    Yusuke Yamada, Takashi Nose, Yuya Chiba, Akinori Ito, Takahiro Shinozaki

    Smart Innovation, Systems and Technologies 82 91-96 2018

    Publisher: Springer Science and Business Media Deutschland GmbH

    DOI: 10.1007/978-3-319-63859-1_12  

    ISSN: 2190-3018

    eISSN: 2190-3026

  98. Voice conversion from arbitrary speakers based on deep neural networks with adversarial learning Peer-reviewed

    Sou Miyamoto, Takashi Nose, Suzunosuke Ito, Harunori Koike, Yuya Chiba, Akinori Ito, Takahiro Shinozaki

    Smart Innovation, Systems and Technologies 82 97-103 2018

    Publisher: Springer Science and Business Media Deutschland GmbH

    DOI: 10.1007/978-3-319-63859-1_13  

    ISSN: 2190-3018

    eISSN: 2190-3026

  99. A study on 2D photo-realistic facial animation generation using 3D facial feature points and deep neural networks Peer-reviewed

    Kazuki Sato, Takashi Nose, Akira Ito, Yuya Chiba, Akinori Ito, Takahiro Shinozaki

    Smart Innovation, Systems and Technologies 82 113-118 2018

    Publisher: Springer Science and Business Media Deutschland GmbH

    DOI: 10.1007/978-3-319-63859-1_15  

    ISSN: 2190-3018

    eISSN: 2190-3026

  100. Foreword

    Akinori Ito

    IEICE Transactions on Information and Systems E101D (1) 1 2018/01

    DOI: 10.1587/transinf.2017MUF0001  

    ISSN: 0916-8532

    eISSN: 1745-1361

  101. Analyzing effect of physical expression on English proficiency for multimodal computer-assisted language learning Peer-reviewed

    Haoran Wu, Yuya Chiba, Takashi Nose, Akinori Ito

    Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2018-September 1746-1750 2018/01/01

    Publisher: ISCA

    DOI: 10.21437/Interspeech.2018-1425  

    ISSN: 2308-457X

    eISSN: 1990-9772

  102. Analysis of preferred speaking rate and pause in spoken Easy Japanese for non-native listeners Peer-reviewed

    Hafiyan Prafiyanto, Takashi Nose, Yuya Chiba, Akinori Ito

    Acoustical Science and Technology 39 (2) 92-100 2018

    Publisher: Acoustical Society of Japan

    DOI: 10.1250/ast.39.92  

    ISSN: 1346-3969

    eISSN: 1347-5177

  103. Guest editorial: Introduction to the special issue on the enrichment of sound, speech and music media

    Yôiti Suzuki, Akinori Ito, Kazuhiro Kondo

    Journal of Information Hiding and Multimedia Signal Processing 8 (6) 1323-1324 2017/11

    Publisher: Ubiquitous International

    ISSN: 2073-4212

    eISSN: 2073-4239

  104. Enrichment of audio signal using side information Peer-reviewed

    Akinori Ito

    Journal of Information Hiding and Multimedia Signal Processing 8 (6) 1325-1334 2017/11

    ISSN: 2073-4212

    eISSN: 2073-4239

  105. Manipulating vocal signal in mixed music sounds using side information based on the fundamental frequency Peer-reviewed

    Akinori Ito, Yuto Sasaki

    Journal of Information Hiding and Multimedia Signal Processing 8 (6) 1372-1381 2017/11

    ISSN: 2073-4212

    eISSN: 2073-4239

  106. HMM-Based Photo-Realistic Talking Face Synthesis Using Facial Expression Parameter Mapping with Deep Neural Networks Peer-reviewed

    Kazuki Sato, Takashi Nose, Akinori Ito

    Journal of Computer and Communications 5 (10) 55-65 2017/08

    DOI: 10.4236/jcc.2017.510006  

  107. Collection and analysis of data for automatic generation of activity records by everyday sound identification

    古谷崇拓, 千葉祐弥, 能勢隆, 伊藤彰則

    IPSJ SIG Technical Report 1-6 2017/06/17

  108. Cluster-based approach to discriminate the user’s state whether a user is embarrassed or thinking to an answer to a prompt Peer-reviewed

    Yuya Chiba, Takashi Nose, Akinori Ito

    Journal on Multimodal User Interfaces 11 (2) 185-196 2017/06/01

    DOI: 10.1007/s12193-017-0238-y  

    ISSN: 1783-7677

    eISSN: 1783-8738

  109. Construction and analysis of phonetically and prosodically balanced emotional speech database Peer-reviewed

    Emika Takeishi, Takashi Nose, Yuya Chiba, Akinori Ito

    2016 Conference of the Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques, O-COCOSDA 2016 16-21 2017/05/03

    Publisher: Institute of Electrical and Electronics Engineers Inc.

    DOI: 10.1109/ICSDA.2016.7918977  

  110. Recognition of sounds using square Cauchy mixture distribution Peer-reviewed

    Akinori Ito

    2016 IEEE International Conference on Signal and Image Processing, ICSIP 2016 726-730 2017/03/27

    DOI: 10.1109/SIPROCESS.2016.7888359  

  111. A precise evaluation method of prosodic quality of non-native speakers using average voice and prosody substitution Peer-reviewed

    Hafiyan Prafianto, Takashi Nose, Akinori Ito

    ICALIP 2016 - 2016 International Conference on Audio, Language and Image Processing - Proceedings 208-212 2017/02/07

    DOI: 10.1109/ICALIP.2016.7846620  

  112. A Compression Method for Spherical Microphone Array Recordings using Principal Component Analysis Peer-reviewed

    Hironori Sato, Arif Wicaksono, Shuichi Sakamoto, Cesar Salvador, Jorge Trevino, Yôiti Suzuki, Akinori Ito

    Proc. 2017 RISP International Workshop on Nonlinear Circuits, Communications and Signal Processing (NCSP'17) 2PM1-3-4 433-436 2017/02

  113. Special section on enriched multimedia - new technology trends in creation, utilization and protection of multimedia information

    Akinori Ito

    IEICE Transactions on Information and Systems E100D (1) 1 2017/01

    ISSN: 0916-8532

    eISSN: 1745-1361

  114. Demonstration experiment of data hiding into OOXML document for suppression of plagiarism Peer-reviewed

    Akinori Ito

    Smart Innovation, Systems and Technologies 63 3-10 2017

    DOI: 10.1007/978-3-319-50209-0_1  

    ISSN: 2190-3018

    eISSN: 2190-3026

  115. Estimation of user’s willingness to talk about the topic: Analysis of interviews between humans Peer-reviewed

    Yuya Chiba, Akinori Ito

    Lecture Notes in Electrical Engineering 999 LNEE 411-419 2017

    Publisher: Springer Verlag

    DOI: 10.1007/978-981-10-2585-3_34  

    ISSN: 1876-1100

    eISSN: 1876-1119

  116. Collection of example sentences for non-task-oriented dialog using a spoken dialog system and comparison with hand-crafted DB Peer-reviewed

    Yukiko Kageyama, Yuya Chiba, Takashi Nose, Akinori Ito

    Communications in Computer and Information Science 713 458-464 2017

    Publisher: Springer Verlag

    DOI: 10.1007/978-3-319-58750-9_63  

    ISSN: 1865-0929

  117. Synthesis of photo-realistic facial animation from text based on HMM and DNN with animation unit Peer-reviewed

    Kazuki Sato, Takashi Nose, Akinori Ito

    Smart Innovation, Systems and Technologies 64 29-36 2017

    DOI: 10.1007/978-3-319-50212-0_4  

    ISSN: 2190-3018

    eISSN: 2190-3026

  118. Development of an easy Japanese writing support system with text-to-speech function Peer-reviewed

    Takeshi Nagano, Hafiyan Prafianto, Takashi Nose, Akinori Ito

    Smart Innovation, Systems and Technologies 64 221-228 2017

    DOI: 10.1007/978-3-319-50212-0_27  

    ISSN: 2190-3018

    eISSN: 2190-3026

  119. A study on tailor-made speech synthesis based on deep neural networks Peer-reviewed

    Shuhei Yamada, Takashi Nose, Akinori Ito

    Smart Innovation, Systems and Technologies 63 159-166 2017

    DOI: 10.1007/978-3-319-50209-0_20  

    ISSN: 2190-3018

    eISSN: 2190-3026

  120. Foreword Invited

    Akinori Ito

    IEICE Transactions 100-D (1) 1 2017

    DOI: 10.1587/transinf.2016MUF0001  

  121. A Crowd Avoidance Method Using Circular Avoidance Path for Robust Person Following Peer-reviewed

    Kohei Morishita, Yutaka Hiroi, Akinori Ito

    Journal of Robotics 2017 1 2017

    Publisher: Hindawi Limited

    DOI: 10.1155/2017/3148202  

    ISSN: 1687-9600

    eISSN: 1687-9619

  122. Multiple description vector quantizer design based on redundant representation of central code Peer-reviewed

    Akinori Ito

    European Signal Processing Conference 2016-November 106-109 2016/11/28

    DOI: 10.1109/EUSIPCO.2016.7760219  

    ISSN: 2219-5491

  123. Investigation of combining various major language model technologies including data expansion and adaptation Peer-reviewed

    Ryo Masumura, Taichi Asami, Takanobu Oba, Hirokazu Masataki, Sumitaka Sakauchi, Akinori Ito

    IEICE Transactions on Information and Systems E99D (10) 2452-2461 2016/10

    DOI: 10.1587/transinf.2016SLP0013  

    ISSN: 0916-8532

    eISSN: 1745-1361

  124. Tempo Modification of Mixed Music Signal by Nonlinear Time Scaling and Sinusoidal Modeling Peer-reviewed

    Tsukasa Nishino, Takashi Nose, Akinori Ito

    Proceedings - 2015 International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2015 146-149 2016/02/19

    DOI: 10.1109/IIH-MSP.2015.86  

  125. Conversion of Speaker's Face Image Using PCA and Animation Unit for Video Chatting Peer-reviewed

    Yuki Saito, Takashi Nose, Takahiro Shinozaki, Akinori Ito

    Proceedings - 2015 International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2015 433-436 2016/02/19

    DOI: 10.1109/IIH-MSP.2015.85  

  126. Playing with a Robot: Realization of "Red Light, Green Light" Using a Laser Range Finder Peer-reviewed

    Keisuke Sakai, Yutaka Hiroi, Akinori Ito

    Proceedings - 2015 3rd International Conference on Robot, Vision and Signal Processing, RVSP 2015 1-4 2016/02/03

    DOI: 10.1109/RVSP.2015.9  

  127. Estimating the user's state before exchanging utterances using intermediate acoustic features for spoken dialog systems Peer-reviewed

    Yuya Chiba, Takashi Nose, Masashi Ito, Akinori Ito

    IAENG International Journal of Computer Science 43 (1) 1-9 2016/02/01

    ISSN: 1819-656X

    eISSN: 1819-9224

  128. A study on face image conversion based on conversion of Animation Units using DNN Peer-reviewed

    齋藤優貴, 能勢隆, 伊藤彰則

    IEICE Transactions on Information and Systems (Japanese Edition) J99-D (11) 1112-1115 2016

  129. Multiple Description Vector Quantizer Design Based on Redundant Representation of Central Code Peer-reviewed

    Akinori Ito

    2016 24th European Signal Processing Conference (EUSIPCO) 106-109 2016

    DOI: 10.1109/EUSIPCO.2016.7760219  

    ISSN: 2076-1465

  130. Influence of the height of a robot on comfortableness of verbal interaction Peer-reviewed

    Yutaka Hiroi, Akinori Ito

    IAENG International Journal of Computer Science 43 (4) 447-455 2016

    ISSN: 1819-656X

    eISSN: 1819-9224

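  131. Evaluation of a spoken dialogue system using cooperative emotional speech synthesis based on utterance state estimation Peer-reviewed

    加瀬嵩人, 能勢隆, 千葉祐弥, 伊藤彰則

    IEICE Transactions A (Japanese Edition) J99-A (1) 25-35 2016/01/01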

  132. Estimation of User's Willingness to Talk About the Topic: Analysis of Interviews Between Humans Peer-reviewed

    Yuya Chiba, Akinori Ito

    Dialogues with Social Robots - Enablements, Analyses, and Evaluation, Seventh International Workshop on Spoken Dialogue Systems, IWSDS 2016, Saariselkä, Finland, January 13-16, 2016 411-419 2016

    Publisher: Springer

    DOI: 10.1007/978-981-10-2585-3_34  

  133. Investigation of Pause Insertion Effect in Spoken Easy Japanese for Non-Native Listeners Peer-reviewed

    Hafiyan Prafianto, Takeshi Nagano, Takashi Nose, Akinori Ito

    Proceedings of 12th Western Pacific Acoustics Conference 507-511 2015/12/08

  134. Automatic Generation of Proper Noun Entries in a Speech Recognizer for Local Information Recognition Peer-reviewed

    Kenta Shiga, Takashi Nose, Akinori Ito, Ryo Masumura, Hirokazu Masataki

    Proceedings of 12th Western Pacific Acoustics Conference 2015/12/08

  135. Development of a mobile robot moving on a handrail - Control for preceding a person keeping a distance Peer-reviewed

    Yuma Fujiwara, Yutaka Hiroi, Yuki Tanaka, Akinori Ito

    Proceedings - IEEE International Workshop on Robot and Human Interactive Communication 2015-November 413-418 2015/11/20

    DOI: 10.1109/ROMAN.2015.7333579  

  136. YANSIS: An “Easy Japanese” writing support system Peer-reviewed

    Takeshi Nagano, Akinori Ito

    Proceedings of 8th International Conference ICT for Language Learning 2015/11/12

  137. A Computer-Assisted English Conversation Training System for Response-Timing-Aware Oral Conversation Exercise Peer-reviewed

    Naoto Suzuki, Yutaka Hiroi, Yuya Chiba, Takashi Nose, Akinori Ito

    IPSJ Journal 56 (11) 2177-2189 2015/11/01

    ISSN: 1882-7764

  138. Investigation of Precision of Human Perception of Pointing Gesture and a Method for Precision Improvement Peer-reviewed

    廣井 富, 伊藤 彰則

    IPSJ Journal 56 (8) 1634-1645 2015/08/15

    ISSN: 1882-7764

  139. Robot: Have I done something wrong? - Analysis of prosodic features of speech commands under the robot's unintended behavior Peer-reviewed

    Noriko Totsuka, Yuya Chiba, Takashi Nose, Akinori Ito

    ICALIP 2014 - 2014 International Conference on Audio, Language and Image Processing, Proceedings 887-890 2015/01/13

    DOI: 10.1109/ICALIP.2014.7009922  

  140. Subjective evaluation of packet loss recovery techniques for voice over IP Peer-reviewed

    Masahito Okamoto, Takashi Nose, Akinori Ito, Takeshi Nagano

    ICALIP 2014 - 2014 International Conference on Audio, Language and Image Processing, Proceedings 711-714 2015/01/13

    DOI: 10.1109/ICALIP.2014.7009887  

  141. A study on the effect of speech rate on perception of spoken easy Japanese using speech synthesis Peer-reviewed

    Hafiyan Prafianto, Takashi Nose, Yuya Chiba, Akinori Ito, Kazuyuki Sato

    ICALIP 2014 - 2014 International Conference on Audio, Language and Image Processing, Proceedings 476-479 2015/01/13

    DOI: 10.1109/ICALIP.2014.7009839  

  142. Hierarchical Latent Words Language Models for Robust Modeling to Out-Of Domain Tasks Peer-reviewed

    Ryo Masumura, Taichi Asami, Takanobu Oba, Hirokazu Masataki, Sumitaka Sakauchi, Akinori Ito

    Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal, September 17-21, 2015 1896-1901 2015

  144. On appropriateness and estimation of the emotion of synthesized response speech in a spoken dialogue system Peer-reviewed

    Taketo Kase, Takashi Nose, Akinori Ito

    Communications in Computer and Information Science 528 747-752 2015

    Publisher: Springer Verlag

    DOI: 10.1007/978-3-319-21380-4_126  

    ISSN: 1865-0929

  145. Entropy-Based Sentence Selection for Speech Synthesis Using Phonetic and Prosodic Contexts Peer-reviewed

    Takashi Nose, Yusuke Arao, Takao Kobayashi, Komei Sugiura, Yoshinori Shiga, Akinori Ito

    16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), Vols 1-5 3491-3495 2015

  146. Tempo Modification of Mixed Music Signal by Nonlinear Time Scaling and Sinusoidal Modeling Peer-reviewed

    Tsukasa Nishino, Takashi Nose, Akinori Ito

    2015 International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP) 146-149 2015

    DOI: 10.1109/IIH-MSP.2015.86  

  147. Conversion of Speaker's Face Image Using PCA and Animation Unit for Video Chatting Peer-reviewed

    Yuki Saito, Takashi Nose, Takahiro Shinozaki, Akinori Ito

    2015 International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP) 433-436 2015

    DOI: 10.1109/IIH-MSP.2015.85  

  148. On Appropriateness and Estimation of the Emotion of Synthesized Response Speech in a Spoken Dialogue System Peer-reviewed

    Taketo Kase, Takashi Nose, Akinori Ito

    HCI International 2015 - Posters' Extended Abstracts, Pt I 528 747-752 2015

    DOI: 10.1007/978-3-319-21380-4_126  

    ISSN: 1865-0929

  149. Latent words recurrent neural network language models Peer-reviewed

    Ryo Masumura, Taichi Asami, Takanobu Oba, Hirokazu Masataki, Sumitaka Sakauchi, Akinori Ito

    Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2015-January 2380-2384 2015

    ISSN: 2308-457X

    eISSN: 1990-9772

  150. Combinations of various language model technologies including data expansion and adaptation in spontaneous speech recognition Peer-reviewed

    Ryo Masumura, Taichi Asami, Takanobu Oba, Hirokazu Masataki, Sumitaka Sakauchi, Akinori Ito

    Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2015-January 463-467 2015

    ISSN: 2308-457X

    eISSN: 1990-9772

  151. Hierarchical latent words language models for robust modeling to out-of domain tasks Peer-reviewed

    Ryo Masumura, Taichi Asami, Takanobu Oba, Hirokazu Masataki, Sumitaka Sakauchi, Akinori Ito

    Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing 1896-1901 2015

    Publisher: The Association for Computational Linguistics

    DOI: 10.18653/v1/d15-1217  

  152. Entropy-based sentence selection for speech synthesis using phonetic and prosodic contexts Peer-reviewed

    Takashi Nose, Yusuke Arao, Takao Kobayashi, Komei Sugiura, Yoshinori Shiga, Akinori Ito

    Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2015-January 3491-3495 2015

    ISSN: 2308-457X

    eISSN: 1990-9772

  153. Preface Peer-reviewed

    Junzo Watada, Akinori Ito, Jeng Shyang Pan, Han Chieh Chao, Chien Ming Chen

    Proceedings - 2014 10th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2014 XXV 2014/12/24

    DOI: 10.1109/IIH-MSP.2014.5  

  154. Analysis of English pronunciation of singing voices sung by Japanese speakers Peer-reviewed

    Kazumichi Yoshida, Takashi Nose, Akinori Ito

    Proceedings - 2014 10th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2014 554-557 2014/12/24

    DOI: 10.1109/IIH-MSP.2014.143  

  155. Assessing the intended enthusiasm of singing voice using energy variance Peer-reviewed

    Akinori Ito

    Proceedings - 2014 10th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2014 558-561 2014/12/24

    DOI: 10.1109/IIH-MSP.2014.144  

  156. Teaching a robot where objects are: Specification of object location using human following and human orientation estimation Peer-reviewed

    Keisuke Sakai, Yutaka Hiroi, Akinori Ito

    World Automation Congress Proceedings 490-495 2014/10/24

    DOI: 10.1109/WAC.2014.6936012  

    ISSN: 2154-4824

    eISSN: 2154-4832

  157. Analysis of spectral enhancement using global variance in HMM-based speech synthesis Peer-reviewed

    Takashi Nose, Akinori Ito

    Proceedings of Interspeech 2014/09/18

  158. Accent type and phrase boundary estimation using acoustic and language models for automatic prosodic labeling Peer-reviewed

    Tomoki Koriyama, Hiroshi Suzuki, Takashi Nose, Takahiro Shinozaki, Akinori Ito

    Proceedings of Interspeech 2014/09/17

  159. User modeling by using bag-of-behaviors for building a dialog system sensitive to the interlocutor's internal state Peer-reviewed

    Yuya Chiba, Masashi Ito, Takashi Nose, Akinori Ito

    Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue 2014/07/18

  160. Tempo modification of music signal using sinusoidal model and LPC-based residue model Peer-reviewed

    Akinori Ito, Yuki Igarashi, Masashi Ito, Takashi Nose

    Proceedings of International Congress on Sound and Vibration 2014/07/13

  161. User Modeling by Using Bag-of-Behaviors for Building a Dialog System Sensitive to the Interlocutor’s Internal State Peer-reviewed

    Yuya Chiba, Takashi Nose, Akinori Ito, Masashi Ito

    Proceedings of 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue 74 2014/06/18

  162. Packet loss concealment of voice-over IP packet using redundant parameter transmission under severe loss conditions Peer-reviewed

    Takeshi Nagano, Akinori Ito

    Journal of Information Hiding and Multimedia Signal Processing 5 (2) 285-294 2014/04

    ISSN: 2073-4212

    eISSN: 2073-4239

  163. Modeling User's State During Dialog Turn Using HMM For Multi-modal Spoken Dialog System Peer-reviewed

    Yuya Chiba, Masashi Ito, Akinori Ito

    Proceedings of The Seventh International Conference on Advances in Computer-Human Interactions 343-346 2014/03/02

  164. A study on speech recognition on low-resource computers

    長野 雄, 伊藤 彰則, 大河 雄一

    Proceedings of the 2014 Spring Meeting of the Acoustical Society of Japan 67-70 2014/03

    ISSN: 1880-7658

  165. Automatic evaluation of singing enthusiasm for karaoke Peer-reviewed

    Ryunosuke Daido, Masashi Ito, Shozo Makino, Akinori Ito

    Computer Speech and Language 28 (2) 501-517 2014/03

    DOI: 10.1016/j.csl.2012.07.007  

    ISSN: 0885-2308

    eISSN: 1095-8363

  166. Speech recognition in a home environment using parallel decoding with GMM-based noise modeling Peer-reviewed

    Kohei Machida, Takashi Nose, Akinori Ito

    2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014 2014/02/12

    DOI: 10.1109/APSIPA.2014.7041622  

  167. Controlling Switching Pause Using an AR Agent for Interactive CALL System Peer-reviewed

    Naoto Suzuki, Takashi Nose, Akinori Ito, Yutaka Hiroi

    Communications in Computer and Information Science 435 PART II 588-593 2014

    Publisher: Springer Verlag

    DOI: 10.1007/978-3-319-07854-0_102  

    ISSN: 1865-0929

  168. Manipulation of vocal signal in mixed music signal using side information of F0 and backing spectrum Peer-reviewed

    Akinori Ito, Yuto Sasaki

    International Conference on Signal Processing Proceedings, ICSP 2015-January (October) 605-609 2014

    DOI: 10.1109/ICOSP.2014.7015075  

    ISSN: 2164-5221

  169. Analysis of spectral enhancement using global variance in HMM-based speech synthesis Peer-reviewed

    Takashi Nose, Akinori Ito

    Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2917-2921 2014

    ISSN: 2308-457X

    eISSN: 1990-9772

  170. User modeling by using bag-of-behaviors for building a dialog system sensitive to the interlocutor's internal state Peer-reviewed

    Yuya Chiba, Takashi Nose, Akinori Ito, Masashi Ito

    SIGDIAL 2014 - 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Proceedings of the Conference 74-78 2014

    DOI: 10.3115/v1/w14-4310  

  171. Tempo modification of music signal using sinusoidal model and LPC-based residue model Peer-reviewed

    Akinori Ito, Yuki Igarashi, Masashi Ito, Takashi Nose

    21st International Congress on Sound and Vibration 2014, ICSV 2014 1 928-935 2014

  172. Modeling user's state during dialog turn using HMM for multi-modal spoken dialog system Peer-reviewed

    Yuya Chiba, Akinori Ito, Masashi Ito

    ACHI 2014 - 7th International Conference on Advances in Computer-Human Interactions 343-346 2014

  173. Foreword to the special issue on the speech communication and its related technologies Peer-reviewed

    Akinori Ito

    Acoustical Science and Technology 34 (2) 63 2013

    DOI: 10.1250/ast.34.63  

    ISSN: 1346-3969

    eISSN: 1347-5177

  174. ASAHI: OK for failure a robot for supporting daily life, equipped with a robot avatar Peer-reviewed

    Yutaka Hiroi, Akinori Ito

    ACM/IEEE International Conference on Human-Robot Interaction 141-142 2013

    DOI: 10.1109/HRI.2013.6483541  

    ISSN: 2167-2148

    eISSN: 2167-2148

  175. Evaluation of robot design using virtual reality Peer-reviewed

    Yutaka Hiroi, Akinori Ito

    Transactions of the Virtual Reality Society of Japan 18 (2) 161-170 2013

    Publisher: The Virtual Reality Society of Japan

    DOI: 10.18974/tvrsj.18.2_161  

    ISSN: 1344-011X

    We can make a robot suited to users' preferences by designing its appearance and interaction through subjective evaluation. However, evaluating users' impressions with real robots requires building many robots with varying specifications such as height, which is time-consuming and costly. In this paper, we propose a robot design methodology based on augmented reality (AR). We conducted experiments to evaluate a robot's head size using both AR and real robots in an environment with a simple background, and obtained similar results from both. Next, we conducted experiments to evaluate a robot's head size using both AR and real robots in a real environment, and again obtained similar results from both. From these experiments, we conclude that CG-based robot evaluation is as effective as evaluation using real robots. In addition, AR technology enables us to evaluate the robot in a real environment, which allows a more realistic evaluation of robot design without building real robots.

  176. Estimation of User's State during a Dialog Turn with Sequential Multi-modal Features Peer-reviewed

    Yuya Chiba, Masashi Ito, Akinori Ito

    Communications in Computer and Information Science 374 (PART II) 572-576 2013

    Publisher: Springer Verlag

    DOI: 10.1007/978-3-642-39476-8_115  

    ISSN: 1865-0929

  177. Multi-modal voice activity detection by embedding image features into speech signal Peer-reviewed

    Yohei Abe, Akinori Ito

    Proceedings - 2013 9th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2013 271-274 2013

    Publisher: IEEE Computer Society

    DOI: 10.1109/IIH-MSP.2013.76  

  178. Acoustic features and auditory impressions of death growl and screaming voice Peer-reviewed

    Keizo Kato, Akinori Ito

    Proceedings - 2013 9th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2013 460-463 2013

    DOI: 10.1109/IIH-MSP.2013.120  

  179. Speech recognition under noisy environments using multiple microphones based on asynchronous and intermittent measurements Peer-reviewed

    Kohei Machida, Akinori Ito

    2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013 1-4 2013

    DOI: 10.1109/APSIPA.2013.6694362  

  180. ASAHI: OK for Failure A Robot for Supporting Daily Life, Equipped with a Robot Avatar Peer-reviewed

    Yutaka Hiroi, Akinori Ito

    PROCEEDINGS OF THE 8TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION (HRI 2013) 141-142 2013

    DOI: 10.1109/HRI.2013.6483541  

    ISSN: 2167-2121

  181. A packet loss recovery of G.729 speech using discriminative model and N-gram Peer-reviewed

    Takeshi Nagano, Akinori Ito

    Proceedings - 2013 9th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2013 267-270 2013

    DOI: 10.1109/IIH-MSP.2013.75  

  182. Evaluation of sinusoidal modeling for polyphonic music signal Peer-reviewed

    Yuki Igarashi, Masashi Ito, Akinori Ito

    Proceedings - 2013 9th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2013 464-467 2013

    DOI: 10.1109/IIH-MSP.2013.121  

  183. A Mobile Robot System With Semi-Autonomous Navigation Using Simple And Robust Person Following Behavior Peer-reviewed

    Yutaka Hiroi, Shohei Matsunaka, Akinori Ito

    Journal of Man, Machine and Technology 1 (1) 44-62 2012/12

    DOI: 10.4156/jmmt.vol1.issue1.4  

  184. Packet Loss Concealment of VoIP Under Severe Loss Conditions Peer-reviewed

    Akinori Ito, Takeshi Nagano

    International Symposium on Wireless Personal Multimedia Communication 2012/09/24

  185. Advanced Information Hiding for G.711 Telephone Speech Peer-reviewed

    Akinori Ito, Yoiti Suzuki

    Multimedia Information Hiding Technologies and Methodologies for Controlling Data 2012/09/23

  186. The Available Telecommunications Services at Serious Disaster Invited

    Sadao Shoji, Takafumi Aoki, Akinori Ito, Shinichiro Omachi, Koichi Ito

    IEICE Technical Report 112 (209) 69-70 2012/09

    NS2012-64, IN2012-62, CS2012-53

  187. Model shrinkage for discriminative language models Peer-reviewed

    Takanobu Oba, Takaaki Hori, Atsushi Nakamura, Akinori Ito

    IEICE Transactions on Information and Systems E95-D (5) 1465-1474 2012/05

    DOI: 10.1587/transinf.E95.D.1465  

    ISSN: 0916-8532

    eISSN: 1745-1361

  188. On short essays carried in the acoustical science and technology

    Ito, A.

    Acoustical Science and Technology 33 (1) 72-72 2012

    DOI: 10.1250/ast.33.72  

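  189. Analysis and synthesis of mixed acoustic signals using a sinusoidal model

    五十嵐 佑樹, 伊藤 仁, 伊藤 彰則

    Tohoku-Section Joint Convention Record of Institutes of Electrical and Information Engineers, Japan 2012 187-187 2012

    Publisher: Organizing Committee of Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers, Japan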

    DOI: 10.11528/tsjc.2012.0_187  

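  190. Data hiding of lip image information in speech signals

    阿部 洋平, 伊藤 彰則

    Tohoku-Section Joint Convention Record of Institutes of Electrical and Information Engineers, Japan 2012 188-188 2012

    Publisher: Organizing Committee of Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers, Japan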

    DOI: 10.11528/tsjc.2012.0_188  

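  191. A study of noise reduction based on fragmentary environmental measurements

    町田 晃平, 伊藤 彰則

    Tohoku-Section Joint Convention Record of Institutes of Electrical and Information Engineers, Japan 2012 184-184 2012

    Publisher: Organizing Committee of Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers, Japan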

    DOI: 10.11528/tsjc.2012.0_184  

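  192. A study of call-out control for human-symbiotic robots

    戸塚 典子, 伊藤 彰則

    Tohoku-Section Joint Convention Record of Institutes of Electrical and Information Engineers, Japan 2012 149-149 2012

    Publisher: Organizing Committee of Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers, Japan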

    DOI: 10.11528/tsjc.2012.0_149  

  193. Effect of Linguistic Contents on Human Estimation of Internal State of Dialog System Users Peer-reviewed

    Yuya Chiba, Masashi Ito, Akinori Ito

    Proceedings of The Interdisciplinary Workshop on Feedback Behavior in Dialog 11-14 2012

  194. Round-robin duel discriminative language models Peer-reviewed

    Takanobu Oba, Takaaki Hori, Atsushi Nakamura, Akinori Ito

    IEEE Transactions on Audio, Speech and Language Processing 20 (4) 1244-1255 2012

    DOI: 10.1109/TASL.2011.2174225  

    ISSN: 1558-7916

    eISSN: 1558-7924

  195. Robust Transmission of Audio Signals over the Internet: An Advanced Packet Loss Concealment for MP3-Based Audio Signals Peer-reviewed

    Akinori Ito, Kiyoshi Konno, Masashi Ito, Shozo Makino

    Interdisciplinary Information Sciences 18 (2) 99-105 2012

    Publisher: The Editorial Committee of the Interdisciplinary Information Sciences

    DOI: 10.4036/iis.2012.99  

    ISSN: 1340-9050

    This paper describes packet loss concealment methods for MP3 audio. The proposed methods are based on estimation of modified discrete cosine transform (MDCT) coefficients of the lost packets. The estimation of MDCT coefficients of lower dimensions is performed by switching two concealment methods: the sign correction method and the correlation-based method. The concealment methods are switched based on redundant side information calculated subband-by-subband for reducing MDCT prediction errors. Next, a method for improving estimation of MDCT coefficients of higher dimensions was proposed. The method estimates the absolute value and sign of an MDCT coefficient independently. The subjective evaluation experiment proved that both of the improvement methods for lower and higher dimensions effectively improved the subjective audio quality.

  196. Mobile Robot System With Semi-Autonomous Navigation Using Simple And Robust Person Following Behavior Peer-reviewed

    Yutaka Hiroi, Shohei Matsunaka, Akinori Ito

    Journal of Man, Machine and Technology 1 (1) 44-62 2012

  197. Spoken document retrieval by discriminative modeling in a high dimensional feature space Peer-reviewed

    Takanobu Oba, Takaaki Hori, Atsushi Nakamura, Akinori Ito

    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 5153-5156 2012

    DOI: 10.1109/ICASSP.2012.6289080  

    ISSN: 1520-6149

  198. Estimating a user's internal state before the first input utterance Peer-reviewed

    Yuya Chiba, Akinori Ito

    Advances in Human-Computer Interaction 2012 2012

    DOI: 10.1155/2012/865362  

    ISSN: 1687-5893

    eISSN: 1687-5907

  199. Effect of robot height on comfortableness of spoken dialog Peer-reviewed

    Yutaka Hiroi, Takayuki Nakayama, Hisanori Kuroda, Shinji Miyake, Akinori Ito

    International Conference on Human System Interaction, HSI 29-34 2012

    DOI: 10.1109/HSI.2012.14  

    ISSN: 2158-2246

    eISSN: 2158-2254

  200. Estimation of user's internal state before the user's first utterance using acoustic features and face orientation Peer-reviewed

    Yuya Chiba, Masashi Ito, Akinori Ito

    International Conference on Human System Interaction, HSI 23-28 2012

    DOI: 10.1109/HSI.2012.13  

    ISSN: 2158-2246

    eISSN: 2158-2254

  201. Recognition of utterances with grammatical mistakes based on optimization of language model towards interactive CALL systems Peer-reviewed

    Takuya Anzai, Akinori Ito

    2012 Conference Handbook - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2012 2012

  202. A Japanese lyrics writing support system for amateur songwriters Peer-reviewed

    Chihiro Abe, Akinori Ito

    2012 Conference Handbook - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2012 2012

  203. A spoken dialogue system using virtual conversational agent with augmented reality Peer-reviewed

    Shinji Miyake, Akinori Ito

    2012 Conference Handbook - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2012 2012

  204. A packet loss recovery of G.729 speech under severe packet loss condition Peer-reviewed

    Takeshi Nagano, Akinori Ito

    2012 Conference Handbook - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2012 2012

  205. Automatic assessment of easiness of Japanese for writing aid of "Easy Japanese" Peer-reviewed

    Meng Zhang, Akinori Ito, Kazuyuki Sato

    ICALIP 2012 - 2012 International Conference on Audio, Language and Image Processing, Proceedings 303-307 2012

    DOI: 10.1109/ICALIP.2012.6376630  

  206. Packet loss concealment of VoIP under severe loss conditions Peer-reviewed

    Akinori Ito, Takeshi Nagano

    International Symposium on Wireless Personal Multimedia Communications, WPMC 489-490 2012

    ISSN: 1347-6890

  207. Influence of the size factor of a mobile robot moving toward a human on subjective acceptable distance Peer-reviewed

    Hiroi, Yutaka, Ito, Akinori

    Mobile Robots-Current Trends 177-190 2011/10/26

    Publisher: IntechOpen

  208. A System for Evaluating Singing Enthusiasm for Karaoke Peer-reviewed

    Ryunosuke Daido, Seong-Jun Hahm, Masashi Ito, Shozo Makino, Akinori Ito

    Proceedings of International Society of Music Information Retrieval Conference 31-36 2011/10/24

  209. Find out what a user is doing before the first utterance: discrimination of user's internal state using non-verbal information Peer-reviewed

    Yuya Chiba, Akinori Ito

    Proceedings of Asian-Pacific Signal and Information Processing Association Annual Summit and Conference 2011/10/19

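  210. A lyrics writing support system using a statistical language model

    阿部 ちひろ, 伊藤 彰則

    Tohoku-Section Joint Convention Record of Institutes of Electrical and Information Engineers, Japan 2011 141-141 2011

    Publisher: Organizing Committee of Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers, Japan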

    DOI: 10.11528/tsjc.2011.0_141  

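  211. Robust word detection in noisy environments

    藤田 一暁, 咸 聖俊, 伊藤 彰則

    Tohoku-Section Joint Convention Record of Institutes of Electrical and Information Engineers, Japan 2011 184-184 2011

    Publisher: Organizing Committee of Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers, Japan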

    DOI: 10.11528/tsjc.2011.0_184  

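  212. A study on corpus construction methods for speech synthesis

    加藤 圭造, 伊藤 彰則

    Tohoku-Section Joint Convention Record of Institutes of Electrical and Information Engineers, Japan 2011 187-187 2011

    Publisher: Organizing Committee of Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers, Japan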

    DOI: 10.11528/tsjc.2011.0_187  

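  213. A study on a virtual dialog agent using augmented reality

    三宅 真司, 伊藤 彰則

    Tohoku-Section Joint Convention Record of Institutes of Electrical and Information Engineers, Japan 2011 77-77 2011

    Publisher: Organizing Committee of Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers, Japan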

    DOI: 10.11528/tsjc.2011.0_77  

  214. Utterance classification for combination of multiple simple dialog systems Peer-reviewed

    Seong Jun Hahm, Akinori Ito, Kentaro Awano, Masashi Ito, Shozo Makino

    Proceedings - 9th IEEE International Symposium on Parallel and Distributed Processing with Applications Workshops, ISPAW 2011 - ICASE 2011, SGH 2011, GSDP 2011 171-176 2011

    DOI: 10.1109/ISPAW.2011.74  

  215. Bit rate reduction of the MELP coder using Lempel-Ziv segment quantization Peer-reviewed

    Minoru Kohata, Motoyuki Suzuki, Akinori Ito, Shozo Makino

    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 5240-5243 2011

    DOI: 10.1109/ICASSP.2011.5947539  

    ISSN: 1520-6149

  216. Round-robin duel discriminative language models in one-pass decoding with on-the-fly error correction Peer-reviewed

    Takanobu Oba, Takaaki Hori, Akinori Ito, Atsushi Nakamura

    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 5588-5591 2011

    DOI: 10.1109/ICASSP.2011.5947626  

    ISSN: 1520-6149

  217. Evaluation of Abnormal Sound Detection using Multi-stage GMM in Various Environments Peer-reviewed

    Akinori Ito, Akihito Aiba, Masashi Ito, Shozo Makino

    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 308-+ 2011

  218. Training a language model using webdata for large vocabulary Japanese spontaneous speech recognition Peer-reviewed

    Ryo Masumura, Seongjun Hahm, Akinori Ito

    Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 1465-1468 2011

    eISSN: 1990-9772

  219. Language model expansion using webdata for spoken document retrieval Peer-reviewed

    Ryo Masumura, Seongjun Hahm, Akinori Ito

    Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2133-2136 2011

    eISSN: 1990-9772

  220. Manipulating vocal signal in mixed music sounds using small amount of side information Peer-reviewed

    Yuto Sasaki, Seong Jun Hahm, Akinori Ito

    Proceedings - 7th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIHMSP 2011 298-301 2011

    DOI: 10.1109/IIHMSP.2011.21  

  221. Evaluation of abnormal sound detection using multi-stage GMM in various environments Peer-reviewed

    Akinori Ito, Akihito Aiba, Masashi Ito, Shozo Makino

    Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 301-304 2011

    eISSN: 1990-9772

  222. Toward human-robot interaction design through human-human interaction experiment Peer-reviewed

    Yutaka Hiroi, Akinori Ito

    Lecture Notes in Electrical Engineering 133 LNEE (VOL. 2) 127-130 2011

    DOI: 10.1007/978-3-642-25992-0_18  

    ISSN: 1876-1100

    eISSN: 1876-1119

  223. Training a language model using webdata for large vocabulary Japanese spontaneous speech recognition Peer-reviewed

    Ryo Masumura, Seongjun Hahm, Akinori Ito

    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 1476-1479 2011

  224. A system for evaluating singing enthusiasm for karaoke Peer-reviewed

    Ryunosuke Daido, Seong Jun Hahm, Masashi Ito, Shozo Makino, Akinori Ito

    Proceedings of the 12th International Society for Music Information Retrieval Conference, ISMIR 2011 31-36 2011

  225. Language model expansion using webdata for spoken document retrieval Peer-reviewed

    Ryo Masumura, Seongjun Hahm, Akinori Ito

    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 2144-2147 2011

  226. Find out what a user is doing before the first utterance: Discrimination of user's internal state using non-verbal information Peer-reviewed

    Yuya Chiba, Seongjun Hahm, Akinori Ito

    APSIPA ASC 2011 - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2011 906-909 2011

  227. Multiple description coding using time domain division for MP3 coded sound signal Peer-reviewed

    Ho seok Wey, Akinori Ito, Takuma Okamoto, Yoiti Suzuki

    Journal of Information Hiding and Multimedia Signal Processing 1 (4) 269-285 2010/10

    ISSN: 2073-4212

    eISSN: 2073-4239

  228. Speech recognition under multiple noise environment based on multi-mixture HMM and weight optimization by the aspect model Peer-reviewed

    Seong Jun Hahm, Yuichi Ohkawa, Masashi Ito, Motoyuki Suzuki, Akinori Ito, Shozo Makino

    IEICE Transactions on Information and Systems E93-D (9) 2407-2416 2010/09

    DOI: 10.1587/transinf.E93.D.2407  

    ISSN: 0916-8532

    eISSN: 1745-1361

  229. Evaluation of head size of an interactive robot using augmented reality Peer-reviewed

    Yutaka Hiroi, Shuhei Hisano, Akinori Ito

    Proceedings of International Symposium on Robotics and Automation 2010/09

  230. An HMM-based segment quantizer and its application to low bit rate speech coding Peer-reviewed

    Motoyuki Suzuki, Masashi Adachi, Minoru Kohata, Akinori Ito, Shozo Makino, Fuji Ren

    Proceedings of International Congress on Acoustics 2010/08

  231. Multiple description coding for MP3 coded sound signal Peer-reviewed

    Ho-seok Wey, Akinori Ito, Takuma Okamoto, Yoiti Suzuki

    Proceedings of International Congress on Acoustics 2010/08

  232. Improved reference speaker weighting using aspect model Peer-reviewed

    Seong Jun Hahm, Yuichi Ohkawa, Masashi Ito, Motoyuki Suzuki, Akinori Ito, Shozo Makino

    IEICE Transactions on Information and Systems E93-D (7) 1927-1935 2010/07

    DOI: 10.1587/transinf.E93.D.1927  

    ISSN: 0916-8532

    eISSN: 1745-1361

  233. Information hiding for G.711 speech based on substitution of least significant bits and estimation of tolerable distortion Peer-reviewed

    Akinori Ito, Shun'Ichiro Abe, Yôiti Suzuki

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E93-A (7) 1279-1286 2010/07

    DOI: 10.1587/transfun.E93.A.1279  

    ISSN: 0916-8508

    eISSN: 1745-1337

  234. Bit Rate Reduction of Vocoder-Type Speech Coder by Reducing Temporal Redundancy Peer-reviewed

    KOHATA Minoru, SUZUKI Motoyuki, ITO Akinori, MAKINO Syouzou

    The IEICE transactions on information and systems 93 (5) 588-597 2010/05

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 1880-4535

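    The authors previously proposed Lempel-Ziv Segment Quantization (LZSQ), a new segment quantization method for compressing the temporal redundancy contained in a continuous information source. LZSQ is a modification of LZ coding, a compression method for discrete sources, that makes it applicable to continuous sources. In this paper, we apply LZSQ to vocoder-type low-bit-rate speech coding and attempt to reduce the bit rate further by compressing temporal redundancy. In vocoder-type coding, the lower limit of the bit rate needed to maintain speech quality is said to be about 2.4 kbit/s, but applying LZSQ makes it possible to lower the rate further while maintaining quality. We applied LZSQ to the six coding parameters of 2.4 kbit/s MELP coding, one of the standardized vocoder-type speech coding schemes, and attempted to reduce the bit rate as far as possible while maintaining speech quality equivalent to that of MELP coding. As a result, the total bit rate was reduced to about 1.57 kbit/s.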

  235. Packet loss concealment for MDCT-based audio codec using correlation-based side information Peer-reviewed

    Akinori Ito, Toshiyuki Sakai, Kiyoshi Konno, Shozo Makino, Motoyuki Suzuki

    International Journal of Innovative Computing, Information and Control 6 (3) 1347-1361 2010/03

    ISSN: 1349-4198

  236. Intonation evaluation of English utterances using synthesized speech for computer-assisted language learning Peer-reviewed

    Akinori Ito, Tomoaki Konno, Masashi Ito, Shozo Makino, Motoyuki Suzuki

    International Journal of Innovative Computing, Information and Control 6 (3) 1501-1514 2010/03

    ISSN: 1349-4198

  237. A Constant-bitrate Information Hiding into G.711 Speech Using ADPCM Output and Sample Magnitude Peer-reviewed

    Akinori Ito, Hironori Handa, Yoiti Suzuki

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences J93-A (2) 82-90 2010/02

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5707

  238. Source-filter separation for nonstationary voiced speech based on sinusoidal representation Peer-reviewed

    Masashi Ito, Keiji Ohara, Akinori Ito, Masafumi Yano

    Acoustical Science and Technology 31 (2) 181-184 2010

    DOI: 10.1250/ast.31.181  

    ISSN: 1346-3969

    eISSN: 1347-5177

  239. Designing side information of multiple description coding Peer-reviewed

    Akinori Ito, Shozo Makino

    Journal of Information Hiding and Multimedia Signal Processing 1 (1) 10-19 2010/01

    ISSN: 2073-4212

    eISSN: 2073-4239

  240. Aspect-model-based reference speaker weighting Peer-reviewed

    Seongjun Hahm, Yuichi Ohkawa, Masashi Ito, Motoyuki Suzuki, Akinori Ito, Shozo Makino

    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 4302-4305 2010

    DOI: 10.1109/ICASSP.2010.5495672  

    ISSN: 1520-6149

  241. Document expansion using relevant web documents for spoken document retrieval Peer-reviewed

    Ryo Masumura, Akinori Ito, Yu Uno, Masashi Ito, Shozo Makino

    Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering, NLP-KE 2010 612-619 2010

    DOI: 10.1109/NLPKE.2010.5587854  

  242. An Effect of Formant Amplitude in Vowel Perception Peer-reviewed

    Masashi Ito, Keiji Ohara, Akinori Ito, Masafumi Yano

    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4 2494-+ 2010

  243. Improvement of packet loss concealment for MP3 audio based on switching of concealment method and estimation of MDCT signs Peer-reviewed

    Akinori Ito, Kiyoshi Konno, Masashi Ito, Shozo Makino

    Proceedings - 2010 6th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIHMSP 2010 518-521 2010

    DOI: 10.1109/IIHMSP.2010.132  

  244. A query-by-humming music information retrieval from audio signals based on multiple F0 candidates Peer-reviewed

    Akinori Ito, Yu Kosugi, Shozo Makino, Masashi Ito

    ICALIP 2010 - 2010 International Conference on Audio, Language and Image Processing, Proceedings 1-5 2010

    DOI: 10.1109/ICALIP.2010.5685029  

  245. A spoken dialog system based on automatically-generated example database Peer-reviewed

    Akinori Ito, Takahiro Morimoto, Shozo Makino, Masashi Ito

    ICALIP 2010 - 2010 International Conference on Audio, Language and Image Processing, Proceedings 732-736 2010

    DOI: 10.1109/ICALIP.2010.5685069  

  246. Grammatical error detection from English utterances spoken by Japanese Peer-reviewed

    Takuya Anzai, Seongjun Hahm, Akinori Ito, Masashi Ito, Shozo Makino

    APSIPA ASC 2010 - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 482-485 2010

  247. Speech recognition based on tree-structured clustering and aspect model in multiple noise environments Peer-reviewed

    Seong Jun Hahm, Yuichi Ohkawa, Motoyuki Suzuki, Masashi Ito, Shozo Makino, Akinori Ito

    APSIPA ASC 2010 - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 454-457 2010

  248. Evaluation of head size of an interactive robot using an augmented reality Peer-reviewed

    Yutaka Hiroi, Shuhei Hisano, Akinori Ito

    2010 World Automation Congress, WAC 2010 2010

  249. An effect of formant amplitude in vowel perception Peer-reviewed

    Masashi Ito, Keiji Ohara, Akinori Ito, Masafumi Yano

    Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010 2490-2493 2010

  250. Multiple description coding for an MP3 coded sound signal Peer-reviewed

    Ho Seok Wey, Akinori Ito, Takuma Okamoto, Yôiti Suzuki

    20th International Congress on Acoustics 2010, ICA 2010 - Incorporating Proceedings of the 2010 Annual Conference of the Australian Acoustical Society 4 3081-3088 2010

  251. An HMM-based segment quantizer and its application to low bit rate speech coding Peer-reviewed

    Motoyuki Suzuki, Masashi Adachi, Minoru Kohata, Akinori Ito, Shozo Makino, Fuji Ren

    20th International Congress on Acoustics 2010, ICA 2010 - Incorporating Proceedings of the 2010 Annual Conference of the Australian Acoustical Society 5 3877-3880 2010

  252. A speaker adaptation method for non-native speech using learners' native utterances for computer-assisted language learning systems Peer-reviewed

    Yuichi Ohkawa, Motoyuki Suzuki, Hirokazu Ogasawara, Akinori Ito, Shozo Makino

    SPEECH COMMUNICATION 51 (10) 875-882 2009/10

    DOI: 10.1016/j.specom.2009.05.005  

    ISSN: 0167-6393

    eISSN: 1872-7182

  253. Multiple Description Coding of Flash Video based on Adaptive Allocation of DCT Coefficients Peer-reviewed

    Akinori Ito, Takuya Kuraishi, Masashi Ito, Shozo Makino

    Proc. 1st Asian-Pacific Signal & Info. Proc. Assoc. Annual Summit & Conf. (APSIPA ASC 2009) 2009/10

  254. Continuous speech recognition with word models using mixture-weight retraining

    大越真裕美, 鈴木基之, 大河雄一, 伊藤彰則, 牧野正三

    Proceedings of the Acoustical Society of Japan 2009 Spring Meeting, 1-P-23 2009/03

  255. Query-by-Humming based Music Information Retrieval System Based on Novel Tonal Feature and Statistical Modeling Peer-reviewed

    Motoyuki Suzuki, Takuto Ichikawa, Akinori Ito, Shozo Makino

    IPSJ Journal 50 (3) 1100-1110 2009/03

  256. Novel Tonal Feature and Statistical User Modeling for Query-by-Humming

    Suzuki Motoyuki, Ichikawa Takuto, Ito Akinori, Makino Shozo

    Information and Media Technologies 4 (2) 498-508 2009

    Publisher: Information and Media Technologies Editorial Board

    DOI: 10.11185/imt.4.498  

    This paper describes a query-by-humming (QbH) music information retrieval (MIR) system based on a novel tonal feature and statistical modeling. Most QbH-MIR systems use a pitch extraction method in order to obtain tonal features of an input humming. In these systems, pitch extraction errors inevitably occur and degrade the performance of the system. In the proposed system, a cross-correlation function between two logarithmic frequency spectra is calculated as a tonal feature instead of a difference of two successive pitch frequencies, and probabilistic models are prepared for all tone intervals existing in the database. The similarity scores between an input humming and musical pieces in a database are calculated using the probabilistic models. The advantages of this system are that it can obtain more appropriate tonal features than the pitch-based method, and it is also robust against inaccurate humming by the user thanks to its statistical approach. From experimental results, the top-1 retrieval accuracy given by the proposed method was 86.8%, which was more than 10 points higher than the conventional single pitch method. Moreover, several integration methods were applied to the proposed method with several conditions. The majority decision method showed the highest accuracy, and 5% reduction of retrieval error was obtained.

  257. Dictation of Japanese Speech Based on Kana and Kanji Character String Peer-reviewed

    Ito, Akinori, Kinno, Hiroaki, Katoh, Masaharu, Kosaka, Tetsuo, Kohda, Masaki

    International Journal of Computer Processing Of Languages 22 (01) 75-98 2009

    Publisher: World Scientific

  258. Fast and Robust Training of a Probabilistic Latent Semantic Analysis Model by the Parallel Learning and Data Segmentation Peer-reviewed

    Kato, Masaharu, Kosaka, Tetsuo, Ito, Akinori, Makino, Shozo

    Journal of Communication and Computer 6 (5) 28-35 2009

    Publisher: David Publishing Company

  259. Evaluation of Robot-Avatar-based User-Familiarity Improvement for Elderly People Peer-reviewed

    Yutaka Hiroi, Akinori Ito

    Kansei Engineering International 8 (1) 59-66 2009/01

    DOI: 10.5057/ER080218-1  

  260. Effect of the size factor on psychological threat of a mobile robot moving toward human Peer-reviewed

    Hiroi, Yutaka, Ito, Akinori

    KANSEI Engineering International 8 (1) 51-58 2009/01

    Publisher: Japan Society of Kansei Engineering

    DOI: 10.5057/ER080206-1  

  261. Bit rate reduction of mixed excitation linear prediction coder by Lempel-Ziv segment quantization Peer-reviewed

    Minoru Kohata, Motoyuki Suzuki, Akinori Ito, Shozo Makino

    Acoustical Science and Technology 30 (2) 136-138 2009

    DOI: 10.1250/ast.30.136  

    ISSN: 1346-3969

    eISSN: 1347-5177

  262. INFORMATION HIDING FOR G.711 SPEECH BASED ON SUBSTITUTION OF LEAST SIGNIFICANT BITS AND ESTIMATION OF TOLERABLE DISTORTION Peer-reviewed

    Akinori Ito, Shun'ichiro Abe, Yoiti Suzuki

    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS 1409-+ 2009

    DOI: 10.1109/ICASSP.2009.4959857  

    ISSN: 1520-6149

  263. Detection of abnormal sound using multi-stage GMM for surveillance microphone Peer-reviewed

    Akinori Ito, Akihito Aiba, Masashi Ito, Shozo Makino

    5th International Conference on Information Assurance and Security, IAS 2009 1 733-736 2009

    DOI: 10.1109/IAS.2009.160  

  264. A band extension of G.711 speech with low computational cost for data hiding application Peer-reviewed

    Akinori Ito, Hironori Handa, Yôiti Suzuki

    IIH-MSP 2009 - 2009 5th International Conference on Intelligent Information Hiding and Multimedia Signal Processing 491-494 2009

    DOI: 10.1109/IIH-MSP.2009.69  

  265. Data hiding is a better way for transmitting side information for MP3 bitstream Peer-reviewed

    Akinori Ito, Shozo Makino

    IIH-MSP 2009 - 2009 5th International Conference on Intelligent Information Hiding and Multimedia Signal Processing 495-498 2009

    DOI: 10.1109/IIH-MSP.2009.55  

  266. Relative importance of formant and whole-spectral cues for vowel perception Peer-reviewed

    Masashi Ito, Keiji Ohara, Akinori Ito, Masafumi Yano

    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5 132-+ 2009

  267. Evaluation of English Intonation based on Combination of Multiple Evaluation Scores Peer-reviewed

    Akinori Ito, Tomoaki Konno, Masashi Ito, Shozo Makino

    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5 596-599 2009

  268. Detailed description of triphone model using SSS-free algorithm Peer-reviewed

    Motoyuki Suzuki, Daisuke Honma, Akinori Ito, Shozo Makino

    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5 1403-+ 2009

  269. Relevant document retrieval using a spoken document Peer-reviewed

    Akinori Ito, Yu Uno, Ryo Masumura, Masashi Ito, Shozo Makino

    2009 9th International Symposium on Communications and Information Technology, ISCIT 2009 1483-1488 2009

    DOI: 10.1109/ISCIT.2009.5341051  

  270. Multiple description coding for wideband audio signal transmission Peer-reviewed

    Hoseok Wey, Akinori Ito, Yôiti Suzuki

    Proceedings of 2009 IEEE International Conference on Network Infrastructure and Digital Content, IEEE IC-NIDC2009 769-773 2009

    DOI: 10.1109/ICNIDC.2009.5360882  

  271. Automatic query generation and query relevance measurement for unsupervised language model adaptation of speech recognition Peer-reviewed

    Akinori Ito, Yasutomo Kajiura, Motoyuki Suzuki, Shozo Makino

    Eurasip Journal on Audio, Speech, and Music Processing 2009 2009

    DOI: 10.1155/2009/140575  

    ISSN: 1687-4714

    eISSN: 1687-4722

  272. Isolated word recognition based on mixture-weight retraining of phoneme triphones

    大越真裕美, 鈴木基之, 大河雄一, 伊藤彰則, 牧野正三

    Proceedings of the Acoustical Society of Japan 2008 Autumn Meeting 123-124 2008/09

  273. Are Bigger Robots Scary? -The Relationship between Robot Size and Psychological Threat- Peer-reviewed

    Yutaka Hiroi, Akinori Ito

    Proceedings of International Conference on Advanced Intelligent Mechatronics 540-545 2008/07

  274. Improvement of user familiarity using robot avatar Peer-reviewed

    Yutaka Hiroi, Akinori Ito, Eiji Nakano

    Journal of Japan Society of Kansei Engineering 7 (4) 797-805 2008/04

    Publisher: Japan Society of Kansei Engineering

    DOI: 10.5057/jjske2001.7.797  

    ISSN: 1346-1958

    Familiarity is one of the most important requirements for human symbiosis robots such as care service robots. Many studies have tried to make robots more familiar by improving their appearance, facial expressions, and smoothness of movement. This paper presents a new concept called a "robot avatar." A robot avatar is a small robot mounted on a main robot and equipped with the minimum functions needed to perform gestures matched to each scene of the main robot's task execution. Looking at the avatar, a user feels as if the avatar is controlling the main robot, so the avatar informs the user of the main robot's next behavior. A prototype avatar named CHIRIS was designed and installed on IRIS, an intelligent service robot developed by the authors. IRIS can execute simple tasks, such as serving beverages, in response to a user's verbal request. Using CHIRIS, psychological tests on the impression of IRIS during task execution were carried out using video. The test results showed that CHIRIS is effective in giving users a more familiar impression.

  275. Multiple description coding of an audio stream by optimum recovery transforms Peer-reviewed

    Akinori Ito, Shozo Makino

    Journal of Digital Information Management 6 (2) 189-195 2008/04

  276. Selection of optimum vocabulary and dialog strategy for noise-robust spoken dialog systems Peer-reviewed

    Akinori Ito, Takanobu Oba, Takashi Konashi, Motoyuki Suzuki, Shozo Makino

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E91D (3) 538-548 2008/03

    DOI: 10.1093/ietisy/e91-d.3.538  

    ISSN: 0916-8532

  277. Improvement of Automatic English Prosody Evaluation Based on Word Clustering Using a Decision Tree Peer-reviewed

    Akinori Ito, Tatsuki Konno, Motoyuki Suzuki, Shozo Makino

    The IEICE Transaction on Information and Systems J91-D (2) 358-366 2008/02

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 1880-4535

  278. Suppression of Internal Noise for Speech Recognition of Small Robots Peer-reviewed

    Akinori Ito, Takashi Kanayama, Motoyuki Suzuki, Shozo Makino

    Journal of Human Interface Society 10 (1) 1-10 2008/02

  279. Adaptive Multiple Description Coding of Flash Video based on Bitstream Pattern Reconstruction

    KURAISHI Takuya, ITO Masashi, ITO Akinori, MAKINO Shozo

    ITE Technical Report 32 35-40 2008

    Publisher: The Institute of Image Information and Television Engineers

    DOI: 10.11485/itetr.32.56.0_35  

    Multiple Description (MD) coding is an effective method for concealing burst packet loss. It divides the source information into multiple streams and introduces correlation between them using redundant information. Using this redundancy, the source can be recovered fairly well even if packets are lost during transmission. In this paper, we propose an MD coding method for Flash Video (FLV) based on bitstream pattern reconstruction. The effectiveness of the proposed method is examined on actual video data with packet loss simulations. The proposed method achieved almost the same quality as a related method while requiring only a little redundancy. This result indicates that the proposed method is effective for concealing burst packet loss.

  280. Automatic evaluation system of English prosody based on word importance factor Peer-reviewed

    Suzuki, Motoyuki, Konno, Tatsuki, Ito, Akinori, Makino, Shozo

    Journal of Systemics, Cybernetics and Informatics 6 (4) 83-90 2008

  281. An unsupervised language model adaptation based on keyword clustering and query availability estimation Peer-reviewed

    Akinori Ito, Yasutomo Kajiura, Shozo Makino, Motoyuki Suzuki

    2008 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING, VOLS 1 AND 2, PROCEEDINGS 1412-1418 2008

    DOI: 10.1109/ICALIP.2008.4590103  

  282. Packet loss concealment for MDCT-based audio codec using correlation-based side information Peer-reviewed

    Akinori Ito, Kiyoshi Konno, Shozo Makino, Motoyuki Suzuki

    2008 FOURTH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING, PROCEEDINGS 612-+ 2008

    DOI: 10.1109/IIH-MSP.2008.103  

  283. Discrimination of Task-Related Words for Vocabulary Design of Spoken Dialog Systems Peer-reviewed

    Akinori Ito, Toyomi Meguro, Shozo Makino, Motoyuki Suzuki

    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5 207-+ 2008

  284. A Fast Speaker Adaptation Method using Aspect Model Peer-reviewed

    Seongjun Hahm, Akinori Ito, Shozo Makino, Motoyuki Suzuki

    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5 1221-1224 2008

  285. Recognition of English Utterances with Grammatical and Lexical Mistakes for Dialogue-based CALL System Peer-reviewed

    Akinori Ito, Ryohei Tsutsui, Shozo Makino, Motoyuki Suzuki

    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5 2819-2822 2008

  286. Intonation Evaluation of English Utterances using Synthesized Speech for Computer-Assisted Language Learning Peer-reviewed

    Tomoaki Konno, Masashi Ito, Motoyuki Suzuki, Akinori Ito, Shozo Makino

    IEEE NLP-KE 2008: PROCEEDINGS OF INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING 202-+ 2008

    DOI: 10.1109/NLPKE.2008.4906807  

  287. Application of Multiple Description Scalar Quantization to LogPCM and ADPCM Peer-reviewed

    Ho-seok Wey, Ryouichi Nishimura, Akinori Ito, Maori Kobayashi, Yoiti Suzuki

    The IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences J90-A (12) 918-921 2007/12

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5707

  288. 複数パスを有する音素モデル連結のためのパス間接続確率の平滑化法の検討

    本間大輔, 大河雄一, 鈴木基之, 伊藤彰則, 牧野正三

    日本音響学会2007年秋季研究発表会講演論文集 135-136 2007/09

  289. Reduction Method of Side Information for Packet Loss Concealment Based on Spectrum Striping Coding Peer-reviewed

    Motoyuki Suzuki, Toshiyuki Sakai, Akinori Ito, Shozo Makino

    Proceedings of 19th International Congress of Acoustics 2007/09

  290. Detection and Direction Estimation of Calling Voice Peer-reviewed

    Akinori Ito, Kota Kitadate, Motoyuki Suzuki, Shozo Makino

    Proceedings of 19th International Congress of Acoustics 2007/09

  291. Packet Loss Concealment of an Audio Stream by Time Domain and Frequency Domain Multiple Description Peer-reviewed

    Akinori Ito, Toshiyuki Sakai, Motoyuki Suzuki, Shozo Makino

    Proceedings of Japan-China Joint Conference on Acoustics 2007/06

  292. Application of Multiple Description (MD) scalar quantization to speech codec Peer-reviewed

    Ho seok Wey, Ryouichi Nishimura, Akinori Ito, Maori Kobayashi, Yoiti Suzuki

    Proceedings of Japan-China Joint Conference on Acoustics 2007/06

  293. A new segment quantization using Lempel-Ziv algorithm and its application to quantization of line spectral frequencies Peer-reviewed

    Minoru Kohata, Motoyuki Suzuki, Akinori Ito, Shozo Makino

    IEEE TRANSACTIONS ON COMMUNICATIONS 55 (4) 661-664 2007/04

    DOI: 10.1109/TCOMM.2007.894090  

    ISSN: 0090-6778

  294. HMnetのパス接続確率を利用した音素認識の検討

    本間大輔, 大河雄一, 鈴木基之, 伊藤彰則, 牧野正三

    日本音響学会2007年春季研究発表会講演論文集 53-54 2007/03

  295. Music information retrieval from a singing voice using lyrics and melody information Peer-reviewed

    Motoyuki Suzuki, Toru Hosoya, Akinori Ito

    Eurasip Journal on Advances in Signal Processing 2007 2007

    DOI: 10.1155/2007/38727  

    ISSN: 1110-8657 1687-0433

  296. Automatic evaluation system of English prosody for Japanese learner's speech Peer-reviewed

    Motoyuki Suzuki, Tatsuki Konno, Akinori Ito, Shozo Makino

    IMSCI '07: INTERNATIONAL MULTI-CONFERENCE ON SOCIETY, CYBERNETICS AND INFORMATICS, VOL 1, PROCEEDINGS 48-53 2007

  297. Increasing correlation using a few bits for multiple description coding Invited Peer-reviewed

    Akinori Ito, Shozo Makino

    2007 THIRD INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING, VOL II, PROCEEDINGS 259-262 2007

    DOI: 10.1109/IIHMSP.2007.4457700  

  298. Music information retrieval from a singing voice using lyrics and melody information Peer-reviewed

    Motoyuki Suzuki, Toru Hosoya, Akinori Ito, Shozo Makino

    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING 2007

    DOI: 10.1155/2007/38727  

    ISSN: 1687-6180

  299. Pronunciation error detection for computer-assisted language learning system based on error rule clustering using a decision tree Peer-reviewed

    Akinori Ito, Yen-Ling Lim, Motoyuki Suzuki, Shozo Makino

    Acoustical Science and Technology 28 (2) 131-133 2007

    DOI: 10.1250/ast.28.131  

    ISSN: 1346-3969 1347-5177

  300. A Phoneme Duration Model Considering Speaking-rate and Linguistic Features for Speech Recognition Peer-reviewed

    大河雄一, 伊藤彰則, 鈴木基之, 牧野正三

    Journal of Information Processing Society Japan 47 (12) 3380-3391 2006/12

  301. Music Information Retrieval from a Singing Voice Based on Verification of Recognized Hypotheses Peer-reviewed

    Motoyuki Suzuki, Toru Hosoya, Akinori Ito, Shozo Makino

    Proceedings of 11th International Conference on Music Information Retrieval 168-171 2006/10

  302. 発話速度と言語的特徴の影響を考慮した持続時間モデルを用いた音声認識に関する研究

    大河雄一, 伊藤彰則, 鈴木基之, 牧野正三

    東北大学電気通信研究所 音響工学研究会 344-1 2006/08

  303. A New Segment Quantization of LSP Parameters with Lempel-Ziv Algorithm Peer-reviewed

    Minoru Kohata, Motoyuki Suzuki, Akinori Ito, Shozo Makino

    IEICE Transaction on Information and Systems J89-D (7) 1504-1513 2006/07

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 1880-4535

  304. Evaluation of multiple PLSA adaptation based on separation of topic and style words Invited Peer-reviewed

    Akinori Ito, Naoto Kuriyama, Motoyuki Suzuki, Shozo Makino

    Proceedings of 9th Western-Pacific Acoustic Conference 2006/06

  305. Packet loss concealment of audio stream based on multiple description by spectrum striping Invited Peer-reviewed

    Motoyuki Suzuki, Toshiyuki Sakai, Jie Liu, Akinori Ito, Shozo Makino

    Proceedings of 9th Western-Pacific Acoustic Conference 2006/06

  306. An effective music information retrieval method using three-dimensional continuous DP Peer-reviewed

    SP Heo, M Suzuki, A Ito, S Makino

    IEEE TRANSACTIONS ON MULTIMEDIA 8 (3) 633-639 2006/06

    DOI: 10.1109/TMM.2006.870717  

    ISSN: 1520-9210

  307. 音素持続時間予測モデルを用いたリスコアリングによる自然発話音声認識

    大河雄一, 伊藤彰則, 鈴木基之, 牧野正三

    日本音響学会2006年春季研究発表会講演論文集 1207-1208 2006/03

  308. Generating search query in unsupervised language model adaptation using www

    Kajiura, Yasutomo, Suzuki, Motoyuki, Ito, Akinori, Makino, Shozo

    The Journal of the Acoustical Society of America 120 (5) 3043-3044 2006

    Publisher: ASA

  309. A grammatical error detection method for dialogue-based CALL system

    Kweon Oh-pyo, Ito Akinori, Suzuki Motoyuki, Makino Shozo

    Information and Media Technologies 1 (1) 391-410 2006

    Publisher: Information and Media Technologies Editorial Board

    DOI: 10.11185/imt.1.391  

    This paper describes a method for detecting grammatical errors in a non-native speaker's utterances for a dialogue-based CALL (Computer Assisted Language Learning) system. Several dialogue-based CALL systems have been developed for conversation exercises. However, one problem with conventional dialogue-based CALL systems is that the learner is usually assigned a passive role. The goal of our system is to allow a learner to compose his/her own sentences freely in a role-playing situation. One of the biggest problems in realizing the proposed system is that the learner's utterance inevitably contains pronunciation, lexical and grammatical errors. In this paper, we focus on the correction of lexical and grammatical errors. To correct these errors, we propose two methods for detecting lexical/grammatical errors in an utterance. The conventional approach is to manually write a grammar that accepts the errors. The proposed methods 1 and 2 use 'error rules' that are independent of the recognition grammar. Method 1 uses only the correct system grammar and extends the recognition results using the 'error rules'. Method 2 uses a general grammar (which does not consider the relationship between the verb, the particle and each noun) to recognize the learner's utterance, then checks the acceptance of each N-best result and searches the learner's utterance. The grammatical error detection experiment showed that method 2 performs as well as the conventional method.

  310. Unsupervised language model adaptation based on automatic text collection from WWW Peer-reviewed

    Motoyuki Suzuki, Yasutomo Kajiura, Akinori Ito, Shozo Makino

    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5 2202-2205 2006

  311. A User Simulator based on VoiceXML for evaluation of spoken dialog systems Peer-reviewed

    Akinori Ito, Keisuke Shimada, Motoyuki Suzuki, Shozo Makino

    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5 1045-1048 2006

  312. Multiple description coding of an audio stream by optimum recovery transform Invited Peer-reviewed

    Akinori Ito, Shozo Makino

    IIH-MSP: 2006 INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING, PROCEEDINGS 19-+ 2006

    DOI: 10.1109/IIH-MSP.2006.265110  

  313. Automatic detection of English mispronunciation using speaker adaptation and automatic assessment of English Intonation and rhythm Peer-reviewed

    Akinori Ito, Tadao Nagasawa, Hirokazu Ogasawara, Motoyuki Suzuki, Shozo Makino

    Educational Technology Research 29 (1) 13-23 2006

    Publisher: Japan Society for Educational Technology

    DOI: 10.15077/etr.KJ00004963297  

    ISSN: 0387-7434

    This paper describes methods for evaluating English utterances by Japanese speakers. The proposed approach consists of two methods: a pronunciation evaluation method and a prosody evaluation method. The pronunciation evaluation method detects phoneme-level mispronunciations, and the prosody evaluation method treats the intonation and rhythm of the speech. The pronunciation evaluation method exploits the VFS speaker adaptation technique to improve the precision of phoneme labeling. For the adaptation, we developed a new scheme that uses Japanese utterances to adapt English acoustic models. This enables speaker adaptation for speakers who are not good at English pronunciation. The prosody evaluation method compares the pitch pattern of native speakers' utterances with that of a learner's utterance and returns a score that reflects the utterance's naturalness. Besides intonation, the method compares the rhythm of native speakers' speech with that of the learner's speech. Evaluation experiments were carried out to compare native speakers' evaluation scores with the system's scores for Japanese speakers' utterances, and we obtained a significant correlation between the two evaluations.

  314. Pronunciation Error Detection Method Based on Error Rule Clustering Using a Decision Tree Peer-reviewed

    Akinori Ito, Yenling Lim, Motoyuki Suzuki, Shozo Makino

    Proceeding of European Conference on Speech Communication and Technology 173-176 2005/09

  315. Construction Method of Acoustic Models Dealing with Various Background Noises Based on Combination of HMMs Peer-reviewed

    Motoyuki Suzuki, Yusuke Kato, Akinori Ito, Shozo Makino

    Proceeding of European Conference on Speech Communication and Technology 973-976 2005/09

  316. Internal Noise Suppression for Speech Recognition by Small Robots Peer-reviewed

    Akinori Ito, Takashi Kanayama, Motoyuki Suzuki, Shozo Makino

    Proceeding of European Conference on Speech Communication and Technology 2685-2688 2005/09

  317. Lyrics Recognition From A Singing Voice Based On Finite State Automaton For Music Information Retrieval Peer-reviewed

    Toru Hosoya, Motoyuki Suzuki, Akinori Ito, Shozo Makino

    Proceedings of the 6th International Conference on Music Information Retrieval 532-535 2005/09

  318. A Grammatical Error Detection Method for Dialogue-based CALL system Peer-reviewed

    Oh-Pyo Kweon, Akinori Ito, Motoyuki Suzuki, Shozo Makino

    Journal of Natural Language Processing 12 (4) 137-156 2005/08

    DOI: 10.5715/jnlp.12.4_137  

    ISSN: 1340-7619

  319. Fast optimization of language model weight and insertion penalty from n-best candidates Peer-reviewed

    Akinori Ito, Masaki Kohda, Shozo Makino

    Acoustical Science and Technology 26 (4) 384-387 2005/07

    DOI: 10.1250/ast.26.384  

    ISSN: 1346-3969

  320. A new design concept of robotic interface for the improvement of user familiarity Peer-reviewed

    Y Hiroi, E Nakano, T Takahashi, A Ito, K Kotani, N Takatsu

    ICMIT 2005: CONTROL SYSTEMS AND ROBOTICS, PTS 1 AND 2 6042 (604230) 1-4 2005

    DOI: 10.1117/12.664685  

    ISSN: 0277-786X

  321. Smile and laughter recognition using speech processing and face recognition from conversation video Peer-reviewed

    A Ito, XY Wang, M Suzuki, S Makino

    2005 INTERNATIONAL CONFERENCE ON CYBERWORLDS, PROCEEDINGS 437-444 2005

    DOI: 10.1109/CW.2005.82  

  322. Noise Adaptive Spoken Dialog System based on Selection of Multiple Dialog Strategies Peer-reviewed

    Akinori Ito, Takanobu Oba, Takashi Konashi, Motoyuki Suzuki, Shozo Makino

    Proceedings of International Conference on Spoken Language Processing 1 193-196 2004/10

  323. A Japanese dialogue-based CALL system with mispronunciation and grammar error detection Peer-reviewed

    Oh Pyo Kweon, Akinori Ito, Motoyuki Suzuki, Shozo Makino

    Proceedings of International Conference on Spoken Language Processing 3 1833-1836 2004/10

  324. Speaker Adaptation Method for CALL Systems Using Bilingual Speakers' Utterances Peer-reviewed

    Motoyuki Suzuki, Hirokazu Ogasawara, Akinori Ito, Yuichi Ohkawa, Shozo Makino

    Proceedings of International Conference on Spoken Language Processing 4 2929-2932 2004/10

  325. Comparison of Features for DP-matching based Query-by-humming System Peer-reviewed

    Akinori Ito, Sung-Phil Heo, Motoyuki Suzuki, Shozo Makino

    Proceedings of the 5th International Conference on Music Information Retrieval 297-302 2004/10

  326. A spoken dialog system based on automatic grammar generation and template-based weighting for autonomous mobile robots Peer-reviewed

    Takashi KONASHI, Motoyuki SUZUKI, Akinori ITO, Shozo MAKINO

    Proceedings of International Conference on Spoken Language Processing 1 189-192 2004/10

  327. 再学習とモデル選択の反復によるマルチパス音響モデルの最適化

    大河雄一, 伊藤彰則, 鈴木基之, 牧野正三

    日本音響学会2004年秋季研究発表会講演論文集 I 77-78 2004/09

  328. A dialogue-based CALL system for Japanese conversation Peer-reviewed

    Oh-Pyo Kweon, Akinori Ito, Motoyuki Suzuki, Shozo Makino

    Proceedings of the 18th International Congress on Acoustics 3 2015-2018 2004/04

  329. Language modeling using stochastic switching N-gram Peer-reviewed

    NAGANO, Takeshi, SUZUKI, Motoyuki, ITO, Akinori, MAKINO, Shozo

    training 5 (3years) 1991-1993 2004/04

  330. Language Modeling by an Ergodic HMM based on an N-gram Peer-reviewed

    Takeshi Nagano, Motoyuki Suzuki, Akinori Ito, Shozo Makino, Masaharu Katoh, Masaki Kohda

    Proceedings of the 18th International Congress on Acoustics 5 3701-3704 2004/04

  331. オールスターモデル選択法による自然発話音声音響モデル学習の検討

    大河雄一, 伊藤彰則, 鈴木基之, 牧野正三

    日本音響学会2004年春季研究発表会講演論文集 I 101-102 2004/03

  332. SATを用いた二言語混合音響モデルの話者適応

    小笠原洋一, 伊藤彰則, 鈴木基之, 牧野正三, 大河雄一

    日本音響学会2004年春季研究発表会講演論文集 I 179-180 2004/03

  333. An evaluation method of Japanese pronunciation for Korean native speakers Peer-reviewed

    Oh Pyo Kweon, Motoyuki Suzuki, Akinori Ito, Shozo Makino

    Educational Technology Research 27 (1) 1-8 2004/01

    Publisher: Japan Society for Educational Technology

    DOI: 10.15077/etr.KJ00003899214  

    ISSN: 0387-7434

    This paper describes an analysis of pronunciation problems in Japanese utterances by Korean speakers, and evaluation methods for a CALL (Computer Assisted Language Learning) system for teaching Japanese pronunciation to Korean speakers. To develop a CALL system, the pronunciation problems of Korean speakers must be understood. First, Japanese utterances by adult Korean speakers were evaluated by native Japanese speakers. Then, the Japanese pronunciation problems of Korean speakers were analyzed. Finally, evaluation methods were developed. Speech recognition technology was used to compare a learner's Japanese utterances with those of a native speaker. With the proposed methods, intelligibility scores that indicate the similarity between the learner's speech and the native Japanese speaker's speech are calculated automatically.

  334. A Patient Care Service Robot System Based on a State Transition Architecture Peer-reviewed

    Yutaka Hiroi, Eiji Nakano, Takayuki Takahashi, Shozo Makino, Akinori Ito, Koji Kotani, Nobuo Takatsu, Tadahiro Ohmi

    Proceedings of the 2nd International Conference on Mechatronics and Information Technology 231-236 2003/12

  335. 自然発話音声認識のための高精度な音響モデル学習法の検討

    大河雄一, 鈴木基之, 伊藤彰則, 牧野正三

    東北大学電気通信研究所 音響工学研究会 327-1 2003/11

  336. Three dimensional continuous DP algorithm for multiple pitch candidates in music information retrieval system Peer-reviewed

    Heo, Sungphil, Suzuki, Motoyuki, Ito, Akinori, Makino, Shozo

    Proceedings of 4th International Symposium on Music Information Retrieval 235-236 2003/10

    Publisher: Johns Hopkins University

  337. 学習話者の異なる複数言語の音響モデルの話者適応の検討

    小笠原洋一, 鈴木基之, 伊藤彰則, 牧野正三, 大河雄一

    日本音響学会 2003年秋季研究発表会講演論文集 I 109-110 2003/09

  338. Multiple pitch candidates based music information retrieval method for query-by-humming Peer-reviewed

    Heo, Sung-Phil, Suzuki, M., Ito, A., Makino, S., Chung, HY

    Proc. AMR 189-200 2003/09

  339. マルチパス音響モデルによる自然発話音声の認識に関する研究

    大河雄一, 吉田明弘, 鈴木基之, 伊藤彰則, 牧野正三

    東北大学電気通信研究所 音響工学研究会 325-1 2003/07

  340. Analysis of pronunciation errors in Japanese speech uttered by Korean towards development of Japanese CALL system Peer-reviewed

    KWEON, OH

    Proc. of O-COCOSDA 2003 185-192 2003/06

  341. A Portable spoken dialog system for autonomous robots Peer-reviewed

    Takashi Konashi, Motoyuki Suzuki, Akinori Ito, Shozo Makino

    Proceeding of 1st International Workshop on Language Understanding and Agents for Real-world Interaction 79-84 2003/05

  342. Construction and evaluation of language models based on stochastic context-free grammar for speech recognition

    Chiori Hori, Masaharu Katoh, Akinori Ito, Masaki Kohda

    Systems and Computers in Japan 33 (13) 48-59 2002/11/30

    DOI: 10.1002/scj.1172  

    ISSN: 0882-1666

  343. 適応学習における話者適応法の比較

    大河雄一, 鈴木基之, 伊藤彰則, 牧野正三

    日本音響学会 2002年秋季研究発表会講演論文集 I 113-114 2002/09

  344. A Metric based on Likelihood Difference for N-gram Language Model Evaluation Peer-reviewed

    Akinori Ito, Masaki Kohda

    IPSJ Journal 43 (7) 2055-2064 2002/07

  345. Construction and evaluation of language models based on stochastic context-free grammar for speech recognition Peer-reviewed

    Chiori Hori, Masaharu Katoh, Akinori Ito, Masaki Kohda

    IEICE Trans.(D-II) J83-D-II (11) 2407-2417 2000/11

  346. Evaluation of Task Adaptation Using N-gram Count Mixture Peer-reviewed

    Akinori Ito, Masaki Kohda

    IEICE Trans.(D-II) J83-D-II (11) 2418-2427 2000/11

  347. Language modeling by stochastic dependency grammar for Japanese speech recognition Peer-reviewed

    Akinori Ito, Chiori Hori, Masaharu Katoh, Masaki Kohda

    Proceeding of International Conference on Spoken Language Processing 2000/10

  348. Free Software Toolkit for Japanese large vocabulary continuous speech recognition Peer-reviewed

    Tatsuya Kawahara, Akinobu Lee, Tetsunori Kobayashi, Kazuya Takeda, Nobuaki Minematsu, Shigeki Sagayama, Katsunobu Itoh, Akinori Ito, Mikio Yamamoto, Atsushi Yamada, Takehito Utsuro, Kiyohiro Shikano

    Proceeding of International Conference on Spoken Language Processing 476-479 2000/10

  349. Overview of Japanese Dictation Toolkit

    Kawahara, Tatsuya, Lee, Akinobu, Kobayashi, Tetsunori, Takeda, Kazuya, Minematsu, Nobuaki, Sagayama, Shigeki, Itou, Katsunobu, Ito, Akinori, Yamamoto, Mikio, Yamada, Atsushi

    2000

  350. A new metric for stochastic language model evaluation Peer-reviewed

    Akinori Ito, Masaki Kohda

    Proceeding of European Conference on Speech Communication and Technology 4 1591-1594 1999/09

  351. A Study on a Phoneme-graph-based Hypothesis Restriction for Large Vocabulary Continuous Speech Recognition Peer-reviewed

    Takaaki Hori, Masaharu Katoh, Akinori Ito, Masaki Kohda

    IPSJ Journal 40 (4) 1365-1373 1999/04

  352. A Study on a State Clustering-Based Topology Design Method for HM-Nets Peer-reviewed

    Takaaki Hori, Masaharu Katoh, Akinori Ito, Masaki Kohda

    IEICE Trans.(D-II) J81-D-II (10) 2239-2248 1998/10

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0915-1923

  353. Evaluation of Japanese Dictation ToolKit-1997 version-

    Kawahara,Tatsuya, Lee,Akinobu, Kobayashi,Tetsunori, Takeda,Kazuya, Minematsu,Nobuaki, Ito,Katsunobu, Ito,Akinori, Yamamoto,Mikio, Yamada,Atsushi, Utsuro,Takehito, Shikano,Kiyohiro

    IPSJ SIG Notes 98 (49) 91-96 1998/05

    Publisher: 一般社団法人情報処理学会

    ISSN: 0919-6072

  354. A study on HM-Nets using decision tree-based successive state splitting Peer-reviewed

    Takaaki Hori, Masaharu Katoh, Akinori Ito, Masaki Kohda

    Proceeding of IEEE International Conference on Speech Processing 1 383-387 1998/05

  355. Common Platform of Japanese Large Vocabulary Continuous Speech Recognizer Assessment -- Proposal and Initial Results -- Peer-reviewed

    T.Kawahara, A.Lee, T.Kobayashi, K.Takeda, N.Minematsu, K.Itou, A.Ito, M.Yamamoto, A.Yamada, T.Utsuro, K.Shikano

    Proc. Oriental-COCOSDA Workshop 117-122 1998

  356. A Study on HM-Nets Using Phonetic Decision Tree-Based Successive State Splitting Peer-reviewed

    Takaaki Hori, Masaharu Katoh, Akinori Ito, Masaki Kohda

    IEICE Trans.(D-II) J80-D-II (10) 2645-2654 1997/10

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0915-1923

  357. N-gram language model adaptation using small corpus for spoken dialog recognition Peer-reviewed

    Akinori Ito, Hideyuki Saitoh, Masaharu Katoh, Masaki Kohda

    Proceeding of European Conference on Speech Processing 2735-2738 1997/09

  358. Language Modeling by Kana and Kanji String N-gram Peer-reviewed

    Akinori Ito, Masaki Kohda

    IEICE Trans.(D-II) J79-D-II (12) 2062-2069 1996/12

  359. The performance prediction on sentence recognition using a finite state word automaton Peer-reviewed

    T Otsuki, A Ito, S Makino, T Ohtomo

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E79D (1) 47-53 1996/01

    ISSN: 0916-8532

  360. Language modeling by string pattern N-gram for Japanese speech recognition Peer-reviewed

    A Ito, M Kohda

    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4 490-493 1996

  361. A NEW HMNET CONSTRUCTION ALGORITHM REQUIRING NO CONTEXTUAL FACTORS Peer-reviewed

    M SUZUKI, S MAKINO, A ITO, H ASO, H SHIMODAIRA

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E78D (6) 662-668 1995/06

    ISSN: 0916-8532

  362. Word Pre-Selection Using Extended Redundant Hash Addressing Method for Continuous Speech Recognition Peer-reviewed

    Akinori Ito, Shozo Makino

    IEICE Trans.(D-II) J78-D-II (3) 400-408 1995/03

  363. Performance Prediction of Word Recognition Using the Probability of Word Occurrence Peer-reviewed

    Takashi Otsuki, Akinori Ito, Shozo Makino, Teruhiko Otomo

    IEICE Trans.(A) J77-A (2) 274-281 1994/02

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5707

  364. A continuous speech recognition system using a modified LVQ2 method and a dependency grammar with semantic constraints Peer-reviewed

    Shozo Makino, Akinori Ito, Mitsuru Endo, Ken'iti Kido

    J. Pattern Recognition and Artificial Intelligence 8 (1) 197-213 1994/01

    DOI: 10.1142/S0218001494000097  

  365. THE PERFORMANCE PREDICTION METHOD ON SENTENCE RECOGNITION SYSTEM USING A FINITE STATE AUTOMATON Peer-reviewed

    T OTSUKI, A ITO, S MAKINO, T OTOMO

    ICASSP-94 - PROCEEDINGS, VOL 1 397-400 1994

  366. A Fast Word Pre-Selection Based on Speech Fragments for Continuous Speech Recognition

    Akinori Ito, Shozo Makino

    Proceeding of International Workshop on Speech Processing 107-112 1993/11

  367. Performance Prediction of Word Recognition Using the Transition Information between Phonemes or between Characters Peer-reviewed

    Takashi Otsuki, Akinori Ito, Shozo Makino, Toshio Sone

    IEICE Trans.(D-II) J76-D-II (6) 1090-1096 1993/06

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0915-1923

  368. Speech to Text Conversion System Based on Phoneme Recognition Peer-reviewed

    Shozo Makino, Akinori Ito, Mitsuru Endo, Ken'ichi Kido

    The Annals of Applied Information Sciences 18 (1-2) 51-66 1993/03

  369. A NEW WORD PRESELECTION METHOD BASED ON AN EXTENDED REDUNDANT HASH ADDRESSING FOR CONTINUOUS SPEECH RECOGNITION Peer-reviewed

    A ITO, S MAKINO

    ICASSP-93 : 1993 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5 B299-B302 1993

  370. Word pre-selection using a redundant hash addressing method for continuous speech recognition Peer-reviewed

    Akinori Ito, Shozo Makino

    Proceeding of the International Conference on Spoken Language Processing 309-312 1992/10

  371. A Functional Word Prediction CYK Method for Parsing Spoken Japanese Sentences Peer-reviewed

    Akinori Ito, Shozo Makino, Ken'iti Kido

    IEICE Trans.(D-II) J74-D-II (9) 1147-1155 1991/09

    ISSN: 0915-1923

  372. A JAPANESE TEXT DICTATION SYSTEM BASED ON PHONEME RECOGNITION AND A DEPENDENCY GRAMMAR Peer-reviewed

    S MAKINO, A ITO, M ENDO, K KIDO

    IEICE TRANSACTIONS ON COMMUNICATIONS ELECTRONICS INFORMATION AND SYSTEMS 74 (7) 1773-1782 1991/07

    ISSN: 0917-1673

  373. Parsing of spoken Japanese sentences using the functional word prediction CYK algorithm Peer-reviewed

    Akinori Ito, Shozo Makino, Ken'iti Kido

    Proc. Korea-Japan Joint Symposium on Acoustics 218-221 1991/07

  374. A JAPANESE TEXT DICTATION SYSTEM BASED ON PHONEME RECOGNITION AND A DEPENDENCY GRAMMAR Peer-reviewed

    S MAKINO, A ITO, M ENDO, K KIDO

    ICASSP 91, VOLS 1-5 273-276 1991

  375. A Japanese Text Dictation System Based on Phoneme Recognition Using a Modified LVQ2 Method Peer-reviewed

    Shozo Makino, Akinori Ito, Mitsuru Endo, Ken'iti Kido

    Proceeding of the International Conference on Spoken Language Processing 241-244 1990/11

  376. Computerized System for Recording and Analysis of the Circadian Biological Activity Peer-reviewed

    Kunio Isono, Yoshiharu Oda, Akinori Ito, Satoshi Hongo, Masao Miyauchi, Atsushi Harada, Shouichi Musashi, Yasuo Tsukahara

    The Annals of Applied Information Sciences 15 (1) 155-166 1990/03

  377. Linguistic Processing in Japanese Dictation System Peer-reviewed

    Shozo Makino, Akinori Ito, Mitsuru Endo, Ken'iti Kido

    Preprints of The Third Symposium on Advanced Man-Machine Interface Through Spoken Language 25-1-25-10 1989/12

  378. Bunsetsu-spotting Based Linguistic Processing for a Japanese Dictation System Peer-reviewed

    Shozo Makino, Akinori Ito, Yoichi Ogawa, Michio Okada, Ken'iti Kido

    Preprints of The Second Symposium on Advanced Man-Machine Interface Through Spoken Language 29-1-29-10 1988/11

  379. 'Bunsetsu' Spotting-based Japanese Continuous Speech Recognition Peer-reviewed

    Michio Okada, Hiroshi Matsuo, Akinori Ito, Yoiti Ogawa, Shozo Makino, Ken'iti Kido

    Trans. IEEJ(C) 108-C (10) 826-833 1988/10

    DOI: 10.1541/ieejeiss1987.108.10_826  

  380. Japanese Conjugate Word Spotting in Continuous Speech Using a Syntactic Driven Continuous DP Matching Algorithm Peer-reviewed

    Michio Okada, Akinori Ito, Shozo Makino, Ken'iti Kido

    IEICE Trans.(D) 70 (12) 2479-2490 1987/12

    ISSN: 0913-5731

Misc. 351

  1. Fundamental investigation of a human-following robot system that moves side-by-side with a person

    廣井富, 朝倉大裕, 中田海地, 伊藤彰則

    日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM) 2020 2020

    ISSN: 2424-3124

  2. 人追従時における追従対象者と非追従対象者の切り分け手法の実装

    中田海地, 朝倉大裕, 廣井富, 伊藤彰則

    計測自動制御学会システムインテグレーション部門講演会(CD-ROM) 20th 2019

  3. 2台のLRFを用いた人追跡手法の提案-鬼ごっこロボットの開発-

    池本瑚幸, 廣井富, 伊藤彰則

    計測自動制御学会システムインテグレーション部門講演会(CD-ROM) 20th 2019

  4. テレプレゼンスロボットのための操作者の顔提示機能の開発

    野阪百穂, 廣井富, 伊藤彰則

    日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM) 2019 2019

    ISSN: 2424-3124

  5. 人追従時における追従対象者と非追従対象者の切り分けに関する基礎的検討

    中田海地, 廣井富, 伊藤彰則

    日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM) 2019 2019

    ISSN: 2424-3124

  6. Preface

    Jeng Shyang Pan, Akinori Ito, Pei Wei Tsai, Lakhmi C. Jain

    Smart Innovation, Systems and Technologies 109 V-VI 2019

    ISSN: 2190-3018

    eISSN: 2190-3026

  7. デモンストレーションを指向したロボットの原点復帰の提案-「だるまさんが転んだ」を行うロボットの開発-

    中森裕子, 廣井富, 伊藤彰則

    日本ロボット学会学術講演会予稿集(CD-ROM) 36th 2018

  8. 操作者の顔を再現するテレプレゼンスロボットの提案

    野阪百穂, 廣井富, 伊藤彰則

    計測自動制御学会システムインテグレーション部門講演会(CD-ROM) 19th 2018

  9. 「だるまさんが転んだ」の鬼役ロボットのためのタッチ機能の開発

    中森裕子, 廣井富, 田中翔吾, 伊藤彰則

    計測自動制御学会システムインテグレーション部門講演会(CD-ROM) 19th 2018

  10. RGB-DカメラとLaser Range Finderを用いた障害物回避に関する基礎的検討

    宮内雄大, 廣井富, 伊藤彰則

    計測自動制御学会システムインテグレーション部門講演会(CD-ROM) 19th 2018

  11. 正面から接近する歩行者に対するロボットの事前回避手法の開発

    廣井富, 宮内雄大, 伊藤彰則

    日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM) 2018 2018

    ISSN: 2424-3124

  12. OpenPoseを用いた人の振り返り検出手法の開発-「だるまさんが転んだ」を行うロボットの開発-

    廣井富, 小田垣成伸, 伊藤彰則

    日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM) 2018 2018

    ISSN: 2424-3124

  13. Poster Presentation : A Study on Singer-Independent Singing Voice Conversion Using Read Speech Based on Neural Network

    116 (414) 17-22 2017/01/21

    Publisher: 電子情報通信学会

    ISSN: 0913-5685

  14. OpenPoseとLRFを用いた群衆回避手法の試み

    森下康平, 廣井富, 宮内雄大, 伊藤彰則

    計測自動制御学会システムインテグレーション部門講演会(CD-ROM) 18th 2017

  15. RGB-Dカメラを用いた床面上の小物体回避に関する基礎的検討

    宮内雄大, 廣井富, 今西天希, 伊藤彰則

    計測自動制御学会システムインテグレーション部門講演会(CD-ROM) 18th 2017

  16. LRFとビジョンの併用による群衆通り抜け時における人追跡手法の開発

    宮内雄大, 廣井富, 西口敏司, 伊藤彰則

    日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM) 2017 2017

    ISSN: 2424-3124

  17. LRFを用いた「だるまさんが転んだ」における「幅判定手法」の効果

    中森裕子, 廣井富, 伊藤彰則

    日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM) 2017 2017

    ISSN: 2424-3124

  18. Improvement of Accent Sandhi Rules Based on Accent Dictionary for Japanese Text-to-Speech Systems

    116 (378) 31-36 2016/12/20

    Publisher: 電子情報通信学会

    ISSN: 0913-5685

  19. Poster Presentation : Development of the Julius-compatible interface for the speech recognition engine of Kaldi toolkit

    116 (378) 49-51 2016/12/20

    Publisher: 電子情報通信学会

    ISSN: 0913-5685

  20. Poster Presentation : F0 control by modeling differential features in DNN-based speech synthesis

    116 (378) 37-42 2016/12/20

    Publisher: 電子情報通信学会

    ISSN: 0913-5685

  21. Discrimination of Level of Willingness to Talk and Analysis of Features by Using Dialog Collected on WOZ basis

    78 7-12 2016/10/05

    Publisher: 人工知能学会

    ISSN: 0918-5682

  22. A Study on Colorization in Photo-Realistic Facial Animation Synthesis from Text Based on HMM and DNN with Animation Unit

    116 (220) 67-72 2016/09/15

    Publisher: 電子情報通信学会

    ISSN: 0913-5685

  23. A Study on Colorization in Photo-Realistic Facial Animation Synthesis from Text Based on HMM and DNN with Animation Unit

    40 (31) 67-72 2016/09

    Publisher: 映像情報メディア学会

    ISSN: 1342-6893

  24. Study of Photo-realistic Face Moving Image Generation from the Text Using the Facial Feature

    116 (33) 43-48 2016/05/19

    Publisher: 電子情報通信学会

    ISSN: 0913-5685

  25. 円形回避領域を用いた群衆回避手法の提案

    森下康平, 廣井富, 伊藤彰則

    日本ロボット学会学術講演会予稿集(CD-ROM) 34th 2016

  26. RGB-Dセンサを用いた指差し認識に関する研究-位置誤差に関する一考察-

    津田剛志, 廣井富, 伊藤彰則

    日本ロボット学会学術講演会予稿集(CD-ROM) 34th 2016

  27. 複数台の道案内ロボットのための人位置情報の引き継ぎ手法の提案

    田中佑季, 廣井富, 伊藤彰則

    日本ロボット学会学術講演会予稿集(CD-ROM) 34th 2016

  28. 複数台の手すりを移動する道案内ロボットによる人位置情報の引き継ぎ手法の実装

    田中佑季, 廣井富, 伊藤彰則

    日本感性工学会大会予稿集(CD-ROM) 18th 2016

  29. 子どもと外遊びを行うテレプレゼンスロボットの提案

    廣井富, 中森裕子, 森下康平, 伊藤彰則

    計測自動制御学会システムインテグレーション部門講演会(CD-ROM) 17th 2016

  30. 移動ロボット接近時における動作予告を用いた恐怖感低減に関する検討

    廣井富, 前田彰大, 田中佑季, 松丸隆文, 伊藤彰則

    日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM) 2016 2016

    ISSN: 2424-3124

  31. 拡張現実感を用いた恐怖感低減手法に関する検討

    廣井富, 前田彰大, 田中佑季, 伊藤彰則

    日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM) 2016 2016

    ISSN: 2424-3124

  32. Analyzing the human-human dialog and examining to build WOZ system for estimating the user's willingness to talk

    115 (346) 117-122 2015/12/02

    Publisher: 電子情報通信学会

    ISSN: 0913-5685

  33. A study on quick model training in HMM-based speech synthesis

    115 (253) 27-32 2015/10/15

    Publisher: 電子情報通信学会

    ISSN: 0913-5685

  34. Multiple Description Vector Quantizer Based on Bit-Error-Tolerant Vector Quantizer Design

    115 (219) 33-38 2015/09/10

    Publisher: 電子情報通信学会

    ISSN: 0913-5685

  35. Multiple Description Vector Quantizer Based on Bit-Error-Tolerant Vector Quantizer Design

    39 (32) 33-38 2015/09

    Publisher: 映像情報メディア学会

    ISSN: 1342-6893

  36. Automatic generation of abbreviated named entities for localized speech recognition

    115 (184) 7-12 2015/08/21

    Publisher: 電子情報通信学会

    ISSN: 0913-5685

  37. HMM音声合成におけるアクセントラベリング基準が合成音声に与える影響の分析

    高橋 遼太, 能勢 隆, 伊藤 彰則

    情報処理学会研究報告. SLP, 音声言語情報処理 2015 (1) 1-6 2015/05/18

    Publisher: 一般社団法人情報処理学会

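    This paper examines accent labeling criteria, which have been ambiguous in conventional HMM-based speech synthesis, and investigates their effect on synthetic speech. Specifically, we examine the representation of accent types and the criteria for accent phrase boundaries. For accent types, we focus on the fact that the odaka (tail-high) type can be represented in two ways, as type 0 or as the mora-length type, and objectively evaluate how the F0 of synthetic speech is affected by each representation. We also verify the effect of using two-stage clustering. For accent phrase boundaries, some accent phrases may be represented either as two accent phrases of type 0 and type 1 or as a single combined accent phrase, and we investigate how this difference affects synthetic speech. In these evaluations, we introduce errors in the high/low pattern of Japanese accent as an objective measure and analyze its effectiveness.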

  38. 日本人のための音声対話による英会話学習システム

    伊藤 彰則

    情報処理学会研究報告. SLP, 音声言語情報処理 2015 (12) 1-6 2015/05/18

    Publisher: 一般社団法人情報処理学会

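    This paper describes technologies for spoken-dialog-based CALL systems for English conversation that the author's group has studied. Most current CALL systems that use speech recognition score elements contained in a single utterance, such as pronunciation and intonation. Although that is important, learning English conversation also requires repeatedly practicing expressions that are actually used. Based on this idea, the author's group has studied dialog-based CALL systems. This paper describes prosody evaluation from dialog speech, grammatical error detection, and a system for practicing response timing control.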

  39. シナリオ対話における感情音声合成を用いた対話システムの評価と感情付与方法の検討

    加瀬 嵩人, 能勢 隆, 千葉 祐弥, 伊藤 彰則

    情報処理学会研究報告. SLP, 音声言語情報処理 2015 (9) 1-7 2015/05/18

    Publisher: 一般社団法人情報処理学会

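    In recent years, demand for non-task-oriented spoken dialog systems has been growing, and various studies have been conducted. Most of them aim to generate appropriate responses from a linguistic point of view. In human-human conversation, on the other hand, effective use of paralinguistic information such as emotional expression and speaking style is thought to help a dialog proceed smoothly. We therefore focus not on the content of the system's response but on the manner of responding, and attempt to use emotional speech synthesis in a dialog system. In this study, we first create multiple scenarios and evaluate, using subjective criteria, whether the quality of the dialog system actually improves when appropriate emotions are assigned manually. Next, to automate emotion assignment, we examine two methods: assignment according to the system utterance and assignment in coordination with the user utterance. The evaluation results show that automatic emotion assignment improves users' subjective evaluation scores for the dialog, and that emotion assignment coordinated with the user utterance is more effective.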

  40. ユーザの対話意欲自動推定を目標とした対話データの分析と音声画像特徴量の検討

    千葉 祐弥, 能勢 隆, 伊藤 彰則

    研究報告音声言語情報処理(SLP) 2015 (10) 1-6 2015/02/20

    Publisher: 一般社団法人情報処理学会

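    For a dialog system to adapt to the user when providing topics or recommending information, it is desirable that it can acquire information about the user efficiently. In this study, we assume an interview-style spoken dialog system that actively asks the user questions. In dialog with such a system, more detailed information may be obtained on topics the user wants to talk about, whereas useful information is unlikely to be obtained on topics the user does not want to talk about, so the system needs to select questions and topics in consideration of the user's willingness to talk. In this paper, as an initial study toward automatic estimation of the user's willingness to talk, we analyzed human-human interview dialogs and performed automatic discrimination. The analysis suggested that, when the speakers themselves are aware of whether their willingness to talk is high or low, third-party evaluators can judge the willingness to talk with an accuracy of about 70 to 80%. It was also shown that, by using the multimodal information mentioned in the evaluators' questionnaires, automatic discrimination can reach an accuracy comparable to that of humans.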

  41. Waveletを用いた特徴量抽出法とその高精度化手法の評価

    松井 清彰, 能勢 隆, 伊藤 彰則

    研究報告音声言語情報処理(SLP) 2015 (5) 1-6 2015/02/20

    Publisher: 一般社団法人情報処理学会

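    Wider adoption of speech recognition requires less expensive speech recognition systems. Although various prior studies have addressed reducing the computational cost of speech recognition, feature extraction has not been studied sufficiently. We have therefore proposed a new low-computation feature extraction method using the wavelet transform, together with a method for improving its accuracy. In this paper, we extracted features using two types of wavelets, the Haar wavelet and the Daubechies wavelet, and compared their performance with MFCC. As a result, the accuracy improvement method yielded a slight improvement in recognition rate. In addition, the recognition rate was further improved by using Δ features, which are dynamic features between frames, and by truncating the higher-order DCT outputs as in MFCC. Regarding computation time, we found that using the simplest wavelet achieves a computation speed more than five times that of MFCC.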

  42. 英会話学習システムの複数回使用時における学習者の交替潜時の変化に関する検討

    鈴木直人, 廣井富, 藤原祐磨, 千葉祐弥, 能勢隆, 伊藤彰則

    日本音響学会研究発表会講演論文集(CD-ROM) 2015 2015

    ISSN: 1880-7658

  43. 英会話学習システムにおける応答タイミング練習方法の有効性の検証

    鈴木直人, 廣井富, 藤原祐磨, 千葉祐弥, 能勢隆, 伊藤彰則

    情報処理学会研究報告(Web) 2015 (SLP-105) 2015

  44. 空き缶を拾うロボット-物体の傾き推定に関する一手法-

    二上啓大, 廣井富, 西口敏司, 伊藤彰則

    日本ロボット学会学術講演会予稿集(CD-ROM) 33rd 2015

  45. 荷物の運搬支援のための台車の開発-台車の自走を可能にする着脱式駆動ユニット-

    坂井奎亮, 廣井富, 伊藤克明, 伊藤彰則

    日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM) 2015 2015

    ISSN: 2424-3124

  46. ロボットとの「だるまさんがころんだ」の提案

    廣井富, 坂井奎亮, 立田裕記, 伊藤彰則

    日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM) 2015 2015

    ISSN: 2424-3124

  47. 拡張現実感を用いた生活支援ロボットの恐怖感低減手法の評価-ロボットサイズに関する実験-

    廣井富, 森奨平, 藤原祐磨, 伊藤彰則

    日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM) 2015 2015

    ISSN: 2424-3124

  48. 人の少し前を移動するコミュニケーションロボットの評価-手すり上を移動するコミュニケーションロボットの開発-

    田中佑季, 廣井富, 藤原祐磨, 伊藤彰則

    日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM) 2015 2015

    ISSN: 2424-3124

  49. 拡張現実感を用いた生活支援ロボットの恐怖感低減手法の評価-ロボットの色に関する実験-

    廣井富, 森奨平, 藤原祐磨, 伊藤彰則

    日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM) 2015 2015

    ISSN: 2424-3124

  50. Drawing the current and future figures of ASJ from viewpoint of number of members

    Ito Akinori

    The Journal of the Acoustical Society of Japan 71 (1) 5-6 2014/12/25

    Publisher: The Acoustical Society of Japan (ASJ)

    ISSN: 0369-4232

  51. Bit-error-tolerant Quantizer Based on Self-Organizing Map

    ITO Akinori

    Technical report of IEICE. EA 114 (315) 19-24 2014/11/20

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    Bit errors cannot be avoided when communicating over a digital channel. Packet-based communication discards packets that contain bit errors. However, in multimedia communication such as speech or image transmission, a small number of bit errors is not fatal. With this kind of multimedia communication in mind, the effect of bit errors on the quality of multimedia data was investigated. The results suggested that vector quantization is more fragile than scalar quantization with respect to bit errors. A new vector quantization method that is robust against bit errors is then proposed. The proposed method is based on the self-organizing map (SOM), and the codebook is designed so that the Hamming distance between two codes and the Euclidean distance between the corresponding centroids are correlated. The results of simulation experiments showed that the proposed method was less affected by bit errors than the conventional k-means method.

  52. 日本人による英語歌唱音声の発音評価手法の検討

    吉田一道, 能勢隆, 伊藤彰則

    研究報告音楽情報科学(MUS) 2014 (9) 1-6 2014/11/13

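    We aim at automatic evaluation of English pronunciation in English singing voices of Japanese speakers. In this study, we built a database of read English lyrics and singing voices by Japanese speakers and conducted subjective evaluations by native English speakers and native Japanese speakers. We also compared the evaluations of read English lyrics and English singing voices by native English and native Japanese speakers, and the results suggested that pronunciation errors are more likely to occur in sustained phrases in singing than in spoken speech. Furthermore, we examined an HMM-based automatic pronunciation evaluation method for English singing and conducted a simple pronunciation error detection experiment using HMMs trained on speech uttered by native speakers of the two languages, Japanese and English. Based on the results, we argue that detection accuracy could be further improved by examining the likelihood-difference threshold used for pronunciation error detection and the pronunciation errors in phrases sustained during singing.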

  53. 日本人による英語歌唱音声の発音評価手法の検討

    吉田一道, 能勢隆, 伊藤彰則

    研究報告デジタルコンテンツクリエーション(DCC) 2014 (9) 1-6 2014/11/13

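    We aim at automatic evaluation of English pronunciation in English singing voices of Japanese speakers. In this study, we built a database of read English lyrics and singing voices by Japanese speakers and conducted subjective evaluations by native English speakers and native Japanese speakers. We also compared the evaluations of read English lyrics and English singing voices by native English and native Japanese speakers, and the results suggested that pronunciation errors are more likely to occur in sustained phrases in singing than in spoken speech. Furthermore, we examined an HMM-based automatic pronunciation evaluation method for English singing and conducted a simple pronunciation error detection experiment using HMMs trained on speech uttered by native speakers of the two languages, Japanese and English. Based on the results, we argue that detection accuracy could be further improved by examining the likelihood-difference threshold used for pronunciation error detection and the pronunciation errors in phrases sustained during singing.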

  54. A Study on Intuitive Control of Emotional Expressions and Speaking Styles Using Facial Features by Kinect

    BI Yu, NOSE Takashi, ITO Akinori

    IEICE technical report. Speech 114 (303) 25-30 2014/11/13

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper proposes a technique for controlling the style of synthetic speech based on the multiple regression HSMM (MRHSMM) using facial features. In the proposed technique, styles and their intensities are represented by Animation Unit (AU) parameters and are modeled under the assumption that the mean parameters of the acoustic models are given as multiple regressions of the AU parameters. Since correlation among AU parameters is problematic in the modeling, we apply orthogonalization and dimensionality reduction in advance. When synthesizing speech, we can generate synthetic speech with an intended style by inputting the corresponding facial expression. In this study, we examine the appropriate number of AU parameters and discuss how performance differs among users.

  55. Analysis of interview dialog for building user-profiling dialog system considering motivation of conversation

    CHIBA Yuya, ITO Akinori

    Technical report of IEICE. HCS 114 (273) 43-48 2014/10/23

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    A dialog system has to obtain the user's profile appropriately in order to provide dialog topics or recommend information adapted to the user. In this research, we assume an interview-based user profiling system that actively asks the user about his/her personal information. Such a system can obtain detailed information if the user wants to talk about the provided topic, but it cannot obtain useful information if the user is not interested in talking about that topic. This paper analyzes interview dialogs between humans as an initial study toward estimating the user's motivation for conversation. As a result, the evaluators could judge the user's motivation for conversation in relatively long dialogs on a single topic with an accuracy of 70% to 80%. In addition, when they evaluated subdivided dialog data, we observed some correlation between the evaluators' judgments, although the concordance of the evaluations decreased. Finally, the results indicated that several kinds of multimodal information are effective for estimating the user's motivation for conversation, such as prosodic information, linguistic information, gesture, and the user's gaze activity.

  56. コンピュータが声を聴く : 機械による音声の認識 (特集 きく)

    伊藤 彰則

    高翔 : 自動車技術会関東支部報 (62) 16-19 2014/07

    Publisher: 自動車技術会関東支部

  57. 20 years of SIG-SLP ―Review by successive chairs―

    Tsuneo Nitta, Tetsunori Kobayashi, Satoshi Nakamura, Kazuya Takeda, Tatsuya Kawahara, Akinori Ito

    IPSJ SIG Notes 2014 (5) 1-6 2014/01/24

    Publisher: Information Processing Society of Japan (IPSJ)

    This report reviews the research presented in 20 years of SIG-SLP meetings and surveys the trends in spoken language processing research. First, facts about the papers presented at SIG-SLP are described. Then we present chair-by-chair trends in spoken language research, and finally we make suggestions for promoting spoken language research in the next decade.

  58. Subjective evaluation of latency and speech degradation of VoIP communication with packet loss concealment under severe packet loss

    389-392 2014

    Publisher: 日本音響学会

    ISSN: 1880-7658

  59. A sinusoidal model for voiced speech based on a complex analysis window

    319-322 2014

    Publisher: 日本音響学会

    ISSN: 1880-7658

  60. Applying singing voice analysis to entertainment : from music information retrieval to karaoke

    1033-1036 2014

    Publisher: 日本音響学会

    ISSN: 1880-7658

  61. LRFによる人追従を考慮した障害物回避手法の提案

    坂井奎亮, 廣井富, 伊藤彰則

    日本ロボット学会学術講演会予稿集(CD-ROM) 32nd 2014

  62. 手すり上を移動するコミュニケーションロボットの開発-伸びる手を用いた道案内の評価-

    藤原祐磨, 廣井富, 鈴木直人, 伊藤彰則

    日本ロボット学会学術講演会予稿集(CD-ROM) 32nd 2014

  63. 英会話学習システムにおけるCGキャラクタの効果と学習者の発話タイミング制御のための付加表現に関する検討

    鈴木直人, 廣井富, 藤原祐磨, 千葉祐弥, 能勢隆, 伊藤彰則

    日本音響学会研究発表会講演論文集(CD-ROM) 2014 2014

    ISSN: 1880-7658

  64. ARキャラクタとの英会話練習時における交替潜時のタイムプレッシャーによる制御

    鈴木直人, 廣井富, 藤原祐磨, 黒田尚孝, 戸塚典子, 千葉祐弥, 能勢隆, 伊藤彰則

    日本音響学会研究発表会講演論文集(CD-ROM) 2014 2014

    ISSN: 1880-7658

  65. 指差しと音声対話併用による床面上の物体回収手法の提案

    二上啓大, 廣井富, 黒田尚孝, 鈴木直人, 伊藤彰則

    日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM) 2014 2014

    ISSN: 2424-3124

  66. LRFを用いた人追従時の移動軌跡の記録と軌道追従に関する基礎的検討

    坂井奎亮, 廣井富, 伊藤彰則

    日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM) 2014 2014

    ISSN: 2424-3124

  67. 手すり上を移動するコミュニケーションロボットの開発-伸びる手を用いた道案内の提案-

    藤原祐磨, 廣井富, 川崎成人, 黒田尚孝, 鈴木直人, 伊藤彰則

    日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM) 2014 2014

    ISSN: 2424-3124

  68. 日常生活支援移動ロボットASAHI2013の開発

    廣井富, 坂井奎亮, 二上啓大, 藤原祐磨, 伊藤彰則

    日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM) 2014 2014

    ISSN: 2424-3124

  69. 音声操作ロボットの意図せぬ動作に対するユーザ発話のパラ言語的特徴に関する分析(音声対話,第15回音声言語シンポジウム)

    戸塚 典子, 伊藤 彰則

    電子情報通信学会技術研究報告. SP, 音声 113 (366) 59-64 2013/12/12

    Publisher: 一般社団法人電子情報通信学会

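    When a voice control interface is installed in a device with a movement mechanism, such as a robot, the user may operate the robot by voice in real time. In such cases, however, the robot may behave in a way the user did not intend because of the user's slips of the tongue or the system's misrecognition. As a way to correct such behavior quickly, we focus on the paralinguistic features of user utterances made when unintended robot behavior occurs, and propose applying them to robot control. In this study, we collected speech of users actually operating a robot in an experiment with participants, and analyzed whether speaking rate, fundamental frequency (F0), and intensity change between when the robot behaves as the user intends and when it does not.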

  70. 音声操作ロボットの意図せぬ動作に対するユーザ発話のパラ言語的特徴に関する分析

    戸塚典子, 伊藤彰則

    研究報告音声言語情報処理(SLP) 2013 (10) 1-6 2013/12/12

    Publisher: 一般社団法人情報処理学会

    ISSN: 0913-5685

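    When a voice control interface is installed in a device with a movement mechanism, such as a robot, the user may operate the robot by voice in real time. In such cases, however, the robot may behave in a way the user did not intend because of the user's slips of the tongue or the system's misrecognition. As a way to correct such behavior quickly, we focus on the paralinguistic features of user utterances made when unintended robot behavior occurs, and propose applying them to robot control. In this study, we collected speech of users actually operating a robot in an experiment with participants, and analyzed whether speaking rate, fundamental frequency (F0), and intensity change between when the robot behaves as the user intends and when it does not.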

  71. ARキャラクタとの英会話練習時における交替潜時のタイムプレッシャーによる制御

    鈴木直人, 廣井富, 藤原祐磨, 黒田尚孝, 戸塚典子, 千葉祐弥, 伊藤彰則

    研究報告音声言語情報処理(SLP) 2013 (9) 1-6 2013/12/12

    Publisher: 一般社団法人情報処理学会

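    Practicing English conversation requires a conversation partner, and learners need practice that enables them to converse with the partner at a good tempo. At present, CALL (Computer-Assisted Language Learning) systems have no framework for improving the timing of learners' responses. In English conversation practice, the switching pause (turn-taking latency) tends to become long because of the double cognitive load of recalling what to say and expressing it in English; to shorten the switching pause consciously from the beginning of a dialog, an explicit method should be presented to the learner. In this study, we therefore set up an AR (Augmented Reality) character as the conversation partner and examined experimentally whether applying a time-pressure expression is effective as practice for response timing. Participants carried out dialogs under two conditions, with and without time pressure, and finally answered a subjective evaluation questionnaire. This paper reports the results and a discussion based on the subjective evaluation.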

  72. A study of the user's state's estimation by using multi-modal information of the local segment

    CHIBA Yuya, ITO Masashi, ITO Akinori

    IEICE technical report. Speech 113 (220) 27-32 2013/09/18

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    Most conventional research on spoken dialog systems has focused on natural language processing, because the dialog system decides its response by processing the speech recognition result of the user's utterance. However, in actual environments the user is sometimes confused by the system's interface and cannot make any input utterance. To help these users appropriately, the system should consider the user's state before his/her input utterance, which conventional research has ignored. To address this problem, we have defined two user states and studied methods for estimating them. Our previous experimental analysis of human evaluation suggested that these internal states can be estimated by observing the user's non-verbal behavior. Based on these results, this report proposes an estimation method using multimodal features. The proposed method clusters the feature sequences and uses them as a Bag-of-Words. We confirmed that the proposed method obtains over 70.0% accuracy.

  73. An acoustical analysis for mixed speech signals using a complex window function

    43 (6) 473-478 2013/08/09

    Publisher: 日本音響学会

    ISSN: 1346-1109

  74. An acoustical analysis for mixed speech signals using a complex window function

    ITO Masashi, ITO Akinori

    Technical report of IEICE. EA 113 (177) 1-6 2013/08/09

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    A sinusoidal representation of voiced speech is one of the promising methods for speech analysis and synthesis; it approximates the input signal as a sum of sinusoidal components whose frequencies and amplitudes vary continuously with time. The difficulties in estimating sinusoidal parameters from the input can be classified into two types: one is spectral distortion induced by non-stationarity of the signal, and the other is interference among neighboring components in the spectrum. To overcome these difficulties, a new analysis method is proposed that integrates the local vector transform and a complex analysis window. The results of an experiment in which sinusoidal parameters were estimated for a single speech signal or musical instrument tone supported the effectiveness of the proposed method. Further, the method could provide an important basis for analyzing mixtures of these signals.

  75. Noise reduction based on fragmentary measurement of environmental noise

    MACHIDA Kohei, ITO Akinori

    IEICE technical report. Speech 113 (161) 1-6 2013/07/25

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    We propose a speech recognition method for noisy environments that uses multiple microphones and is based on asynchronous and intermittent observations. In this method, microphones placed at various locations in the room occasionally observe sounds, and clustering by GMM is performed to model the noise in the environment. Each of the clustered noise spectra is subtracted from the input signal, and the noise-reduced signals are then decoded in parallel. The final recognition result is determined by integrating all of the recognition results.

  76. Analysis of Acoustic Feature of Command Speech Towards a Mobile Robot Under the Robot's Unintended Behavior

    TOTSUKA Noriko, ITO Akinori

    IEICE technical report. Speech 113 (161) 57-62 2013/07/25

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    In recent years, many devices with a speech-based interface have been developed. Speech commands may be used to operate a device that moves in real time, such as a mobile robot. In this case, the robot might behave in an unintended way because of misrecognitions or wrong commands. When the robot moves against the user's intention, the behavior should be corrected quickly. But how does the robot know that its behavior does not conform to the operator's intention? We are investigating the possibility of using acoustic features of the user's utterances to estimate whether the robot's behavior complies with the user's intention. To this end, we collected utterances of operators controlling a robot by voice and analyzed the acoustic features. In this paper, we show the results of analyzing four features: speaking rate, fundamental frequency (F0), intensity, and speaking interval. We found that speaking rate tended to be faster and speaking interval tended to be shorter when the robot was behaving against the user's intention, but we did not find any differences in F0 or intensity.

  77. A Task Development Experiment for the Multi-task Spoken Dialog System based on QA Database

    MIYAKE Shinji, ITO Akinori

    IEICE technical report. Speech 113 (161) 31-36 2013/07/25

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    We are developing a spoken dialog system for daily life support tasks, such as those of a smart home or a human-symbiotic robot. This system uses a dialog control strategy based on a Q-A database, which makes it easier for developers to write a new task description. Even a novice developer can create a task, because the task creation procedure consists simply of listing the expected user utterances and the responses to those utterances. Moreover, the system can handle multiple tasks simply by merging multiple task descriptions in parallel. In this paper, we conduct an experiment to confirm whether novice developers can really create task descriptions that are as good as those written by an experienced developer. As a result, the task descriptions created by novice developers were similar to that of the experienced developer, and the impressions of the system users were similar for both task descriptions. However, the task completion rate and the discrimination rate for ambiguous utterances were higher for the task description by the experienced developer than for those by the novice developers.

  78. Consideration of the relation between auditory impression and acoustic features of death growl and scream singing voice

    KATO Keizo, ITO Akinori

    IEICE technical report. Speech 112 (422) 43-48 2013/01/30

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    In the contemporary music scene, death growl and scream singing styles are often used in extreme metal and have become indispensable singing styles. In this study, we attempt to clarify the essential acoustic features of death growl and scream singing voices by considering the relationship between auditory impression and acoustic features.

  79. Multi-modal Information Processing by Embedding Image Features into Speech Signal

    ABE Yohei, ITO Akinori

    112 (420) 1-5 2013/01/29

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    Lip movement is closely related to speech because the lips move when we talk. The idea of this work is to extract lip movement features from facial video and embed them into the speech signal using a data hiding technique. In this paper, we present the basic framework of the method and apply the proposed method to multi-modal voice activity detection (VAD). In a detection experiment using SVM, we obtained higher accuracy than audio-only VAD in a noisy environment. In addition, we investigated the effects of embedding data into the speech signal on sound quality and detection accuracy.

  80. 対話中のユーザ状態逐次推定のための多段階識別手法に関する検討

    千葉祐弥, 伊藤仁, 伊藤彰則

    研究報告ヒューマンコンピュータインタラクション(HCI) 2013 (21) 1-6 2013/01/25

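    Conventional spoken dialog systems operate on the speech recognition result of the user's input utterance, so they cannot estimate the user's state while waiting for the user's input. In real environments, however, users are often confused by the system's prompt, for example, and cannot make an input. To respond appropriately to such users, it is necessary to consider the user's state before the utterance, which conventional spoken dialog systems have ignored. We have defined two types of user state before the utterance and have studied methods for estimating them. From the analysis so far, we concluded that the target user states can be estimated to some extent by using multimodal information. Based on this result, this report examines a method that integrates information obtained from video and speech to estimate the user's state sequentially.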

  81. 対話中のユーザ状態逐次推定のための多段階識別手法に関する検討

    千葉祐弥, 伊藤仁, 伊藤彰則

    研究報告音声言語情報処理(SLP) 2013 (21) 1-6 2013/01/25

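    Conventional spoken dialog systems operate on the speech recognition result of the user's input utterance, so they cannot estimate the user's state while waiting for the user's input. In real environments, however, users are often confused by the system's prompt, for example, and cannot make an input. To respond appropriately to such users, it is necessary to consider the user's state before the utterance, which conventional spoken dialog systems have ignored. We have defined two types of user state before the utterance and have studied methods for estimating them. From the analysis so far, we concluded that the target user states can be estimated to some extent by using multimodal information. Based on this result, this report examines a method that integrates information obtained from video and speech to estimate the user's state sequentially.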

  82. 手すりを移動するコミュニケーションロボット-道案内方法の比較-

    廣井富, 黒田尚孝, 藤原祐磨, 戸塚典子, 伊藤彰則

    日本ロボット学会学術講演会予稿集(CD-ROM) 31st 2013

  83. ロボットアバタを用いた指差し行為の実装-人間による指差し認識の調査-

    黒田尚孝, 廣井富, 伊藤彰則

    日本ロボット学会学術講演会予稿集(CD-ROM) 31st 2013

  84. ARキャラクタを用いた音声対話による英会話学習システムの検討-タイムプレッシャー導入の効果-

    鈴木直人, 廣井富, 藤原祐磨, 黒田尚孝, 戸塚典子, 千葉祐弥, 伊藤彰則

    日本バーチャルリアリティ学会大会論文集(CD-ROM) 18th 2013

    ISSN: 1349-5062

  85. ARキャラクタとの英会話練習時における交替潜時のタイムプレッシャーによる制御

    鈴木直人, 廣井富, 藤原祐磨, 黒田尚孝, 戸塚典子, 千葉祐弥, 伊藤彰則

    電子情報通信学会技術研究報告 113 (366(SP2013 82-95)) 2013

    ISSN: 0913-5685

  86. 対話ターン中のユーザ状態の推定に有用なモダリティの分析 (音声・第14回音声言語シンポジウム)

    千葉 祐弥, 伊藤 仁, 伊藤 彰則

    電子情報通信学会技術研究報告 : 信学技報 112 (369) 35-40 2012/12/20

    Publisher: 一般社団法人電子情報通信学会

    ISSN: 0913-5685

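    Conventional spoken dialog systems decide their processing based on the utterance input by the user, so they cannot estimate the user's state while waiting for input. In real environments, however, users are often confused by the system's prompt and cannot make an input. In such cases, it is common to present a prompt with the same content at regular intervals, but this assistance is very annoying for users who are thinking about what to input. To respond appropriately to these users, the system must be able to estimate the user's state before the utterance. In our previous study, we attempted automatic estimation without an analysis that separated the various influences, so it was unclear which information is necessary for estimating the user's state. In this paper, we therefore collected data anew, conducted an evaluation experiment with human subjects, and carried out a more detailed analysis.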

  87. トピック関連語推定とSTDによる未知語推定の評価 (音声・第14回音声言語シンポジウム)

    佐藤 壮一, 伊藤 彰則

    電子情報通信学会技術研究報告 : 信学技報 112 (369) 143-147 2012/12/20

    Publisher: 一般社団法人電子情報通信学会

    ISSN: 0913-5685

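    In this paper, we estimated unknown (out-of-vocabulary) words in speech recognition using topic-related word estimation, which estimates related words from speech recognition results, and spoken term detection (STD), which checks whether a given word is contained in an utterance. We compared three cases: using only topic-related word estimation, using only STD, and using both. The results showed that the best recall was obtained by using both when the number of estimated words is large, and by using only topic-related word estimation when the number of estimated words is small. We also found that using STD when the recall of topic-related word estimation is high yields higher precision than using topic-related word estimation alone.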

  88. 対話ターン中のユーザ状態の推定に有用なモダリティの分析

    千葉祐弥, 伊藤仁, 伊藤彰則

    研究報告音声言語情報処理(SLP) 2012 (7) 1-6 2012/12/13

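    Conventional spoken dialog systems decide their processing based on the utterance input by the user, so they cannot estimate the user's state while waiting for input. In real environments, however, users are often confused by the system's prompt and cannot make an input. In such cases, it is common to present a prompt with the same content at regular intervals, but this assistance is very annoying for users who are thinking about what to input. To respond appropriately to these users, the system must be able to estimate the user's state before the utterance. In our previous study, we attempted automatic estimation without an analysis that separated the various influences, so it was unclear which information is necessary for estimating the user's state. In this paper, we therefore collected data anew, conducted an evaluation experiment with human subjects, and carried out a more detailed analysis.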

  89. Enrichment of audio signal using side information

    ITO Akinori

    Technical report of IEICE. EA 112 (292) 87-92 2012/11/09

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper describes methods that add value to audio signals using side information. Many acoustic signal processing methods have been proposed for estimating lost information from the original signal. Using appropriate side information, we can enhance the estimation easily. In this paper, the principle of audio signal processing using side information is described first, and then three applications are described: packet loss concealment of audio signals, manipulation of mixed music signals, and frequency band extension of telephone speech.

  90. The Available Telecommunications Services at Serious Disaster

    SHOJI Sadao, AOKI Takafumi, ITO Akinori, OMACHI Shinichiro, ITO Koichi

    IEICE technical report 112 (208) 71-72 2012/09/13

    Publisher: The Institute of Electronics, Information and Communication Engineers

    Hitachi East Japan Solutions, Ltd. and Tohoku University are studying available telecommunications services, security, and information sharing when mobile communications networks are overcrowded during a serious disaster.

  91. Estimation of a User's Internal State before the First Input Utterance Using HMM with Non-verbal Information

    CHIBA Yuya, ITO Masashi, ITO Akinori

    Technical report of IEICE. PRMU 111 (430) 7-12 2012/02/02

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper describes a method for estimating the internal state of a user of a spoken dialog system before his/her input utterance. In practical use of a dialog-based system, the user is often perplexed by the prompt. An ordinary system provides more detailed information to a user who is taking time to make an input, but such help is intrusive for a user who is considering the answer to the prompt. To respond appropriately, the spoken dialog system has to be able to consider the user's internal state before the user's input. Conventional research on user modeling has focused on the linguistic information of the utterance. One problem with these approaches is that they cannot estimate the user's state until the end of the user's first utterance. Therefore, our study focuses on the user's non-verbal output, such as fillers, silence, or head movement, before the user's input utterance occurs. This paper describes a user modeling method based on HMM. We conducted a discrimination experiment and obtained an accuracy of 79.6%.

  92. The SEES; Singing Enthusiasm Evaluation System for Amateur Singing Entertainment

    Ryunosuke Daido, Masashi Ito, Shozo Makino, Akinori Ito

    IPSJ SIG Notes 2012 (2) 1-7 2012/01/27

    Publisher: Information Processing Society of Japan (IPSJ)

    The goal of our research is to develop a system for evaluating singing enthusiasm. As karaoke scoring systems exemplify, many researchers have worked on automatic evaluation of the singing voice to add value to amateur singing entertainment. However, most of this research tries to evaluate only singing skill. In our research, the point of interest is not singing skill but singing enthusiasm. In this paper, we describe our attempt to develop an automatic evaluation system for singing enthusiasm through analyses of the principles of how humans perceive it. Moreover, we propose a new style of amateur singing entertainment using our system.

  93. Acoustic analysis towards extreme voice synthesis of death growl and scream singing voices

    Keizo Kato, Akinori Ito

    IPSJ SIG Notes 2012 (14) 1-6 2012/01/27

    Publisher: Information Processing Society of Japan (IPSJ)

    In this study, we analyzed acoustic features of growl and scream singing voices used in extreme metal music, such as death metal and metalcore. We observed the sub-harmonics and macro pulse structures that have been reported as acoustic features of rough voice. We also measured jitter, shimmer, and HNR values.

  94. patissier-A Lyrics Writing Support System for Amateur Lyricists-

    Chihiro Abe, Akinori Ito

    IPSJ SIG Notes 2012 (17) 1-6 2012/01/27

    Publisher: Information Processing Society of Japan (IPSJ)

    In this paper, we propose a lyrics writing support system focused on the number of syllables, rhyme, and word accent. The system generates candidate sentences that satisfy user-specified conditions based on an N-gram model and presents them. Users can use the system like a dictionary and write lyrics by choosing among the presented sentences. In our subjective evaluations, we investigated how the system is actually used for writing lyrics. The usage logs and the questionnaires showed that users want the system to present words that fit the images they have in mind, and that they used the presented words as keywords for their lyrics rather than verbatim.

  95. 手すりを移動するコミュニケーションロボット-全体コンセプト-

    廣井富, 内田裕二, 西村駿宏, 中山貴之, 黒田尚孝, 三宅真司, 戸塚典子, 伊藤彰則

    ヒューマンインタフェースシンポジウム論文集(CD-ROM) 2012 2012

    ISSN: 1345-0794

  96. ロボットアバタを用いた指差し行為の実現-ロボットアバタへの実装-

    黒田尚孝, 廣井富, 三宅真司, 伊藤彰則

    日本感性工学会大会予稿集(CD-ROM) 14th 2012

  97. ロボットアバタを用いた指差し行為の移動ロボットへの実装

    黒田尚孝, 廣井富, 三宅真司, 伊藤彰則

    日本ロボット学会学術講演会予稿集(CD-ROM) 30th 2012

  98. Detection of Utterances that Need Clarification Using a Question and Answer Database

    三宅真司, 廣井富, 伊藤彰則

    情報処理学会研究報告(CD-ROM) 2012 (2) 2012

    ISSN: 2186-2583

  99. Controlling the start time of human utterance by behavior of the robot

    中山貴之, 廣井富, 黒田尚孝, 三宅真司, 伊藤彰則

    情報処理学会研究報告(CD-ROM) 2012 (2) 2012

    ISSN: 2186-2583

  100. 日常生活支援移動ロボットASAHIの開発-全体構想とハードウェア構成-

    廣井富, 黒田尚孝, 内藤圭祐, 高田晶太, 松井一馬, 井上駿, 林和孝, 中山貴之, 松中翔平, 伊藤彰則

    日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM) 2012 2012

    ISSN: 2424-3124

  101. 一つのLRFを用いた人追跡に関する一考察

    松中翔平, 廣井富, 内藤圭祐, 井上駿, 伊藤彰則

    日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM) 2012 2012

    ISSN: 2424-3124

  102. ロボットアバタを用いた指差し行為の実現-基本コンセプトと予備実験-

    黒田尚孝, 廣井富, 松井一馬, 三宅真司, 伊藤彰則

    日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM) 2012 2012

    ISSN: 2424-3124

  103. A Comparison of Side Information Expressions Incorporating Background Music Signals for Manipulating Mixed Music Sounds

    SASAKI Yuto, HAHM Seong-Jun, ITO Akinori

    111 (287) 47-52 2011/11/14

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    In this paper, we propose a method for manipulating the vocal sound in mixed music signals using side information. In the proposed method, the fundamental frequency (pitch) of the vocal signal and information about the backing sound are used as side information. After the mixed music signal is received, the vocal sound is manipulated using a comb filter whose harmonic structure is derived from the pitch information. Performance was evaluated using the signal-to-noise ratio (SNR). We designed three filters using different backing sound information and compared them.

  104. Crisis Responses to the Great East Japan Earthquake : 12. Emergency Activity for Information Systems of the Graduate School of Engineering, Tohoku University, under the Great East Japan Earthquake

    ITO A.

    IPSJ MAGAZINE 52 (9) 1084-1085 2011/08/15

  105. A Lyrics Writing Support System Using a Statistical Language Model

    2011 (9) 1-6 2011/07/20

  106. Discrimination of User's Internal State using Non-verbal Information before the First User Utterance

    CHIBA Yuya, HAHM Seongjun, ITO Akinori

    IEICE technical report 111 (153) 23-28 2011/07/14

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    A dialogue system is expected to respond flexibly to various user behaviors. Because a speech-based interface can be used hands-free and without training, this requirement is crucial. Although there has been much research on adapting responses based on the linguistic information in the user's input, there have been few attempts to decide the system's dialogue strategy before the user makes the first input utterance. In this research, we focus on the user's non-verbal information in order to build a system that can help users before the input utterance. Here, we investigate the length of non-linguistic events such as fillers and silence, and three angles of face orientation. Finally, we conducted a discrimination experiment using an SVM.

  107. 移動ロボット減速時におけるロボットアバタを用いた動作予告法の実装と評価

    中山貴之, 廣井富, 伊藤彰則

    日本ロボット学会学術講演会予稿集(CD-ROM) 29th 2011

  108. 10日間で作るロボット音声対話システム

    三宅真司, 廣井富, 伊藤彰則

    ヒューマンインタフェースシンポジウム論文集(CD-ROM) 2011 2011

    ISSN: 1345-0794

  109. Subjective evaluation of a robot: a real body or augmented reality?

    廣井富, 伊藤彰則

    電子情報通信学会技術研究報告 110 (459(HCS2010 56-69)) 2011

    ISSN: 0913-5685

  110. ロボットアバタを用いた日常生活支援ロボットの親しみ感の向上-非ヒューマノイド型ロボットへの適用-

    廣井富, 伊藤彰則

    日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM) 2011 2011

    ISSN: 2424-3124

  111. 日常生活支援移動ロボットGoyaneの開発-高さ変更可能な機構の提案-

    廣井富, 篠原達也, 兼次一喜, 岩本昂, 中山貴之, 伊藤彰則

    日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM) 2011 2011

    ISSN: 2424-3124

  112. Modeling expansion using Web for spoken document retrieval based on probabilistic language model

    IEICE technical report 110 (357) 109-114 2010/12/20

    Publisher: 電子情報通信学会

    ISSN: 0913-5685

  113. Modeling Expansion using Web for Spoken Document Retrieval based on Probabilistic Language Model

    MASUMURA RYO, HAHM SEONGJUN, ITO AKINORI

    2010 (20) 1-6 2010/12/13

    Publisher: 情報処理学会

    ISSN: 0919-6072

  114. An abnormal sound detection method using multi-stage GMM for surveillance microphone

    ITO Akinori, AIBA Akihito, ITO Masashi, MAKINO Shozo

    IEICE technical report 110 (220) 1-6 2010/10/01

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    We have been developing a method for detecting abnormal sound events in audio signals recorded in real environments. The method uses a multi-stage Gaussian Mixture Model (GMM), which learns rare sounds using multiple GMMs. In this paper, we investigate the relationship between the sound environment and detection performance, and we found that performance deteriorates in noisy environments. The performance depended largely on the signal-to-noise ratio of the abnormal sounds. Next, we investigated methods for determining the hyperparameters of the multi-stage GMM, namely the intermediate thresholds, the numbers of mixture components of the GMMs, and the detection threshold. In the experimental results, the combination of percentile-based threshold determination and Bayesian information criterion (BIC)-based mixture determination was the most effective. However, when the automatically determined parameters were used, detection performance deteriorated by around 20%.

  115. Sinusoidal Modeling for Voiced Speech Signals Based on Local Vector Transform and Time-Warping

    ITO Masashi, ITO Akinori

    The IEICE transactions on information and systems 93 (9) 1745-1754 2010/09/01

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 1880-4535

  116. Topic Expression of Words using Web Documents for Unsupervised Language Model Adaptation

    2010 (18) 1-6 2010/07/15

    Publisher: 情報処理学会

    ISSN: 1884-0930

  117. Lecture Speech Recognition Based on Word Graph Combination by Using Quinphone HM-Net

    KATO Masaharu, KOSAKA Tetsuo, ITO Akinori, MAKINO Shozo

    IEICE technical report 110 (81) 37-42 2010/06/10

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    While high recognition performance has been achieved for read speech, rather poor performance has been reported for spontaneous speech because of problems such as hesitations, filled pauses, and unclear pronunciation. In particular, acoustic variation caused by coarticulation has become a serious problem. To address it, context-dependent models such as triphones or quinphones are used for recognition. However, the strength of the coarticulatory effect varies widely in spontaneous speech. In this study, we attempt to improve recognition performance by using word graph combination, in which various acoustic models are combined.

  118. Measuring "enthusiasm" of singing voice

    DAIDO RYUNOSUKE, ITO MASASHI, ITO AKINORI, MAKINO SHOZO

    2010 (10) 1-6 2010/05/20

    Publisher: 情報処理学会

    ISSN: 0919-6072

  119. Towards development of practical life-support robots

    廣井富, 伊藤彰則

    電子情報通信学会技術研究報告 109 (457(HCS2009 64-88)) 2010

    ISSN: 0913-5685

  120. 拡張現実感を用いた日常生活支援移動ロボットへの位置の指示方法の提案

    去来川勇樹, 廣井富, 榊洋祐, 二神龍平, 中山貴之, 伊藤彰則

    バイオメカニズム学術講演会予稿集 31st 2010

  121. 日常生活支援移動ロボットGoyaneの開発

    廣井富, 後藤基允, 山本祐三, 山根佑介, 稲田遥一, 大原達哉, 木村昭太, 久野修平, 伊藤彰則

    日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM) 2010 2010

    ISSN: 2424-3124

  122. 日常生活支援移動ロボットのためのロボットアバタを用いた動作予告法の比較

    廣井富, 大原達哉, 木村昭太, 久野修平, 伊藤彰則

    日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM) 2010 2010

    ISSN: 2424-3124

  123. 音声認識における言語モデル

    Akinori Ito

    The Journal of the Acoustical Society of Japan 66 (1) 32-35 2010/01

    DOI: 10.20697/jasj.66.1_32  

  124. Utterance discrimination for dialog control on multi-task spoken dialog system

    Awano Kentaro, Ito Masashi, Ito Akinori, Makino Shozo

    IEICE technical report 109 (355) 37-42 2009/12/21

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

  125. Evaluation of unsupervised language model adaptation based on topic-related word estimation using WWW

    Masumura Ryo, Ito Masashi, Ito Akinori, Makino Shozo

    IEICE technical report 109 (355) 183-188 2009/12/21

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

  126. Evaluation of Unsupervised Language Model Adaptation based on Topic-related Word Estimation using WWW

    MASUMURA RYO, ITO MASASHI, ITO AKINORI, MAKINO SHOZO

    2009 (32) 1-6 2009/12/14

    Publisher: 情報処理学会

    ISSN: 0919-6072

  127. Utterance Discrimination for dialog control on Multi-task Spoken Dialog System

    AWANO KENTARO, ITO MASASHI, ITO AKINORI, MAKINO SHOZO

    2009 (7) 1-6 2009/12/14

    Publisher: 情報処理学会

    ISSN: 0919-6072

  128. Bit Rate Reduction of Vocoder-Type Speech Coder by Reducing Temporal Redundancy

    KOHATA Minoru, SUZUKI Motoyuki, ITO Akinori, MAKINO Syouzou

    IEICE technical report 109 (308) 7-12 2009/11/19

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    We previously proposed a segment quantization method named LZSQ, a modified version of LZ coding that can be applied to a continuous information source. In this paper, LZSQ is applied to a vocoder-type speech coder to reduce its bit rate by removing temporal redundancy in the coding parameters while preserving the quality of the coded speech. Specifically, LZSQ is applied to six coding parameters of the MELP coder, one of the standardized vocoder-type speech coders operating at 2.4 kbit/s, to reduce its bit rate as far as possible. As a result, the total bit rate was reduced to about 1.57 kbit/s.

  129. この曲、何だっけ? 歌で音楽を探す「歌声検索」

    伊藤彰則, 鈴木基之, 牧野正三

    DTM Magazine 16 (11) 100-101 2009/11

    Publisher: 寺島情報企画

  130. An algorithm for fast calculation of back-off n-gram probabilities with unigram rescaling

    Kato, M., Kosaka, T., Ito, A., Makino, S.

    IAENG International Journal of Computer Science 36 (4) 2009/11/01

    ISSN: 1819-656X

  131. RE-005 Sinusoidal Modeling for Voiced Speech Based on a Local Vector Transform

    Ito Masashi, Ito Akinori

    8 (2) 43-48 2009/08/20

    Publisher: Forum on Information Technology

  132. Detection of abnormal sound using multi-stage GMM and segment model

    39 (5) 401-405 2009/08/03

    Publisher: 日本音響学会聴覚研究委員会

    ISSN: 1346-1109

  133. A study on objective evaluation of MP3 packet loss concealment

    39 (5) 367-372 2009/08/03

    Publisher: 日本音響学会聴覚研究委員会

    ISSN: 1346-1109

  134. A Study on Objective Evaluation of MP3 Packet Loss Concealment

    KONNO Kiyoshi, ITO Masashi, ITO Akinori, MAKINO Shozo

    IEICE technical report 109 (166) 37-42 2009/07/27

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    In this paper, we analyze objective evaluation of MP3 audio with packet loss concealment. For objective evaluation of wideband audio, PEAQ is recommended in ITU-R BS.1387. However, PEAQ is not designed to evaluate audio with packet losses, and its accuracy is not sufficient. We therefore applied multiple linear regression analysis using PEAQ's Model Output Variables. In addition, we improved the correlation by taking into account the variance of the subband SNR, which may reflect degradation in a specific frequency band. In cross-validation, the mean correlation was about 0.84.

  135. Detection of Abnormal Sound Using Multi-stage GMM and Segment Model

    AIBA Akihito, ITO Masashi, ITO Akinori, MAKINO Shozo

    IEICE technical report 109 (166) 71-75 2009/07/27

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    We propose an abnormal sound detection system for surveillance microphones. The system uses models of normal sounds actually produced at the monitored location and detects sounds it has not learned as abnormal. Therefore, the system does not limit detection targets to particular events and can cope with any abnormal event. The detection performance of the proposed system was examined on actual environmental sounds. Performance was improved by a multi-stage GMM that models normal sounds that rarely occur. Furthermore, we examined incorporating the dynamic variation of acoustic features by means of segment features.

  136. Panel Discussion Featuring Newly Honored Doctors (III) "Research for me, research for new values"

    ITO Akinori, ANDO Daichi, LE ROUX Jonathan, NAKANO Tomoyasu, YOSHII Kazuyoshi

    2009 (7) 1-5 2009/07/22

    Publisher: 情報処理学会

    ISSN: 0919-6072

  137. Utterance Discrimination for using Multiple Spoken Dialog Systems

    AWANO KENTARO, ITO MASASHI, ITO AKINORI, MAKINO SHOZO

    2009 (15) 1-6 2009/05/14

    Publisher: 情報処理学会

    ISSN: 0919-6072

  138. Utterance Discrimination for using Multiple Spoken Dialog Systems

    2009 (15) 1-6 2009/05/14

    Publisher: 情報処理学会

    ISSN: 1884-0930

  139. Composition Search Query for Language Model Adaptation using WWW

    MASUMURA RYO, ITO MASASHI, ITO AKINORI, MAKINO SHOZO

    2009 (10) 1-8 2009/05/14

    Publisher: 情報処理学会

    ISSN: 0919-6072

  140. Music Information Retrieval using database with multiple F0 candidates

    KOSUGI YU, ITO MASASHI, ITO AKINORI, MAKINO SHOZO

    2009 (6) 1-6 2009/05/14

    Publisher: 情報処理学会

    ISSN: 0919-6072

  141. Adaptive Multiple Description Coding for Flash Video based on Bitstream Pattern Reconstruction

    KURAISHI Takuya, ITO Masashi, ITO Akinori, MAKINO Shozo

    71 275-276 2009/03/10

  142. Database generation from acoustic signal for music infomation retrieval system with Query-by-Humming

    KOSUGI Yu, ITO Masashi, ITO Akinori, MAKINO Shozo

    71 237-238 2009/03/10

  143. DS-3-8 Bit-rate control of payload by information hiding based on ADPCM

    HANDA Hironori, ITO Akinori, SUZUKI Yoiti

    Proceedings of the IEICE General Conference 2009 (2) "S-33"-"S-34" 2009/03/04

    Publisher: The Institute of Electronics, Information and Communication Engineers

  144. Implementation of preliminary-announcement for a life-support mobile robot using a robot avatar

    廣井富, 後藤基允, 山本祐三, 大原達哉, 木村昭太, 伊藤彰則

    日本ロボット学会学術講演会予稿集(CD-ROM) 27th 2009

  145. Novel tonal feature and statistical user modeling for query-by-humming

    Motoyuki Suzuki, Takuto Ichikawa, Akinori Ito, Shozo Makino

    Journal of Information Processing 17 95-105 2009

    Publisher: Information Processing Society of Japan

    DOI: 10.2197/ipsjjip.17.95  

    ISSN: 1882-6652 0387-5806

  146. Evaluation of English Intonation based on Combination of Multiple Evaluation Scores

    Akinori Ito, Tomoaki Konno, Masashi Ito, Shozo Makino

    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5 596-599 2009

  147. Relative importance of formant and whole-spectral cues for vowel perception

    Masashi Ito, Keiji Ohara, Akinori Ito, Masafumi Yano

    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5 132-+ 2009

  148. Detailed description of triphone model using SSS-free algorithm

    Motoyuki Suzuki, Daisuke Honma, Akinori Ito, Shozo Makino

    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5 1403-+ 2009

  149. Multiple description coding of flash video based on adaptive allocation of DCT coefficients Peer-reviewed

    Akinori Ito, Takuya Kuraishi, Masashi Ito, Shozo Makino

    APSIPA ASC 2009 - Asia-Pacific Signal and Information Processing Association 2009 Annual Summit and Conference 453-456 2009

  150. Evaluation of annealing schadule for PLSA language model adaptaion

    KATO Masaharu, KOSAKA Tetsuo, ITO Akinori, MAKINO Shozo

    IPSJ SIG Notes 2008 (123) 49-53 2008/12/02

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    Probabilistic Latent Semantic Analysis (PLSA) is a powerful statistical language model. However, PLSA suffers from the local maxima problem. To overcome this problem, the EM annealing algorithm has been proposed. In this paper, we designed the annealing schedule β using several continuous functions. As a result, we found that increasing functions and square root functions are the best for the annealing schedule. In the experiment, we obtained a 28.7% perplexity reduction and a 5.3% word error rate reduction.

  151. Estimation of Spoken Dialog System using Automatically-generated question-and-answer database

    MORIMOTO Takahiro, ITO Masashi, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    IEICE technical report 108 (337) 267-272 2008/12/02

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    A question-and-answer style spoken dialog system based on example-based answer generation is known to be robust against variation in user utterances. However, it is costly to create a QA database for a new task. In this paper, we propose a method that reduces the cost of preparing the database by generating it automatically from templates. As a result, we obtained almost the same performance with the automatically generated QA database as with the manually prepared database. In addition, we propose a new scoring method that chooses an answer based on the F-measure, which improved the accuracy of answer selection.

  152. Evaluation of annealing schadule for PLSA language model adaptaion

    KATO Masaharu, KOSAKA Tetsuo, ITO Akinori, MAKINO Shozo

    IEICE technical report 108 (337) 49-53 2008/12/02

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    Probabilistic Latent Semantic Analysis (PLSA) is a powerful statistical language model. However, PLSA suffers from the local maxima problem. To overcome this problem, the EM annealing algorithm has been proposed. In this paper, we designed the annealing schedule β using several continuous functions. As a result, we found that increasing functions and square root functions are the best for the annealing schedule. In the experiment, we obtained a 28.7% perplexity reduction and a 5.3% word error rate reduction.

  153. Estimation of Spoken Dialog System using Automatically-generated question-and-answer database

    MORIMOTO Takahiro, ITO Masashi, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    IPSJ SIG Notes 2008 (123) 267-272 2008/12/02

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    A question-and-answer style spoken dialog system based on example-based answer generation is known to be robust against variation in user utterances. However, it is costly to create a QA database for a new task. In this paper, we propose a method that reduces the cost of preparing the database by generating it automatically from templates. As a result, we obtained almost the same performance with the automatically generated QA database as with the manually prepared database. In addition, we propose a new scoring method that chooses an answer based on the F-measure, which improved the accuracy of answer selection.

  154. Multiple description coding of an audio stream by optimum recovery transforms

    Ito, A., Makino, S.

    Journal of Digital Information Management 6 (2) 189-195 2008/12/01

    ISSN: 0972-7272

  155. I-021 動き情報を用いたビットストリームパターン推定によるFlash VideoのMultiple Description符号化(グラフィクス・画像,一般論文)

    倉石 卓也, 伊藤 仁, 伊藤 彰則, 牧野 正三, 鈴木 基之

    情報科学技術フォーラム講演論文集 7 (3) 241-242 2008/08/20

    Publisher: FIT(電子情報通信学会・情報処理学会)運営委員会

  156. A study of high-quality speech modification based on sinusoidal representation

    38 (5) 513-518 2008/08/04

    Publisher: 日本音響学会聴覚研究委員会

    ISSN: 1346-1109

  157. A study of high-quality speech modification based on sinusoidal representation

    ITO Masashi, OHARA Keiji, ITO Akinori, YANO Masafumi

    IEICE technical report 108 (179) 41-46 2008/07/28

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    One of the crucial problems in speech analysis is to separate the acoustical characteristics caused by the source signal and by the vocal-tract filter in an input speech signal. To overcome this problem, a method is proposed for estimating the fundamental frequency and the vocal-tract filter response on the basis of a sinusoidal representation of speech. Three psycho-acoustical experiments were carried out to evaluate the accuracy of the estimation for natural utterances. The results indicated that the proposed algorithm could estimate the sinusoidal parameters and the fundamental frequency with high accuracy. However, they also indicated that non-negligible errors remained in interpolating the vocal-tract filter response.

  158. Intonation Evaluation by Combination of Multiple Evaluation Scores using Synthesized Voice

    KONNO Tomoaki, ITO Masashi, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    IEICE technical report 108 (142) 37-42 2008/07/12

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    In this paper, we describe a system for a CALL system that evaluates the intonation of English utterances by Japanese learners using synthesized speech. To evaluate the intonation of a learner's utterance, we need reference utterances, for which utterances by native English speakers should be used. However, it is costly to gather native speakers' utterances for all sentences in the system. Therefore, we examined an intonation evaluation method using synthesized speech. The intonation evaluation system calculates scores between a learner's utterance and the corresponding utterances by the teachers. We investigated a method of combining multiple scores. In addition, we incorporated a feature for rhythm evaluation into the intonation evaluation. As a result, we improved the correlation between the scores given by human evaluators and those given by the system. Furthermore, we analyzed the tendencies of the system's intonation evaluation by limiting the evaluated utterances, in order to find out what degrades system performance.

  159. Statistical Language Modeling and Its Problems

    ITO Akinori

    IPSJ SIG Notes 2008 (68) 43-46 2008/07/11

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    Statistical language models are widely used as language models for large vocabulary continuous speech recognition. Above all, the back-off n-gram is a de facto standard language model for speech recognition. A number of models have been proposed so far to surpass the back-off n-gram, but none of them has achieved a large improvement over the back-off trigram. In this paper, various language models are briefly reviewed, and I give some suggestions about what is needed for current language models and discuss possibilities for improving them.

  160. Packet Loss Concealment for Flash Video Streaming Using Multiple Description Coding

    KURAISHI Takuya, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    全国大会講演論文集 70 (0) 107-108 2008/03/13

  161. DS-4-3 A New Lower Bits Substitution Method for log-PCM using ADPCM

    ABE Shun-ichiro, ITO Akinori, SUZUKI Yoiti

    Proceedings of the IEICE General Conference 2008 (2) "S-23"-"S-24" 2008/03/05

    Publisher: The Institute of Electronics, Information and Communication Engineers

  162. Improvement of a Query-by-Humming Music Information Retrieval System using Multiple Musical Interval Features

    ICHIKAWA Takuto, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    IPSJ SIG Notes 2008 (12) 7-12 2008/02/08

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    This paper describes a query-by-humming (QbH) music information retrieval (MIR) system without F0 extraction. In a system based on F0 extraction, F0 extraction errors inevitably occur and degrade the performance of the system. Furthermore, pitch errors in the sung data also degrade performance. To address these problems, we have proposed an MIR system that uses a musical interval feature and probabilistic models. The performance of the proposed system exceeded that of the system based on F0 extraction. In this paper, we use the peak interval of the cross-correlation function as a tonal feature to improve the performance of the system. In addition, we integrated multiple retrieval results to obtain a better retrieval result. In the experimental results, the top retrieval accuracy of the proposed method exceeded that of the system based on F0 extraction by 13.2%.

  163. 正弦波モデルに基づく高品質音声変調の検討

    伊藤仁, 小原桂二, 伊藤彰則, 矢野雅文

    信学技報 EA2008-52 (15067) 2008

  164. 正弦波モデルに基づく非定常音声の分析と変調

    伊藤仁, 小原桂二, 伊藤彰則, 矢野雅文

    日本音響学会秋季研究発表会講演論文集 3-4-5. 2008

  165. Are Bigger Robots Scary? - The Relationship Between Robot Size and Psychological Threat -

    Yutaka Hiroi, Akinori Ito

    2008 IEEE/ASME INTERNATIONAL CONFERENCE ON ADVANCED INTELLIGENT MECHATRONICS, VOLS 1-3 546-551 2008

    DOI: 10.1109/AIM.2008.4601719  

    ISSN: 2159-6255

  166. A Fast Speaker Adaptation Method using Aspect Model

    Seongjun Hahm, Akinori Ito, Shozo Makino, Motoyuki Suzuki

    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5 1221-1224 2008

  167. Recognition of English Utterances with Grammatical and Lexical Mistakes for Dialogue-based CALL System

    Akinori Ito, Ryohei Tsutsui, Shozo Makino, Motoyuki Suzuki

    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5 2819-2822 2008

  168. Discrimination of Task-Related Words for Vocabulary Design of Spoken Dialog Systems

    Akinori Ito, Toyomi Meguro, Shozo Makino, Motoyuki Suzuki

    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5 207-+ 2008

  169. Automatic Clustering of Part-of-speech for Vocabulary Divided PLSA Language Model

    Motoyuki Suzuki, Naoto Kuriyama, Akinori Ito, Shozo Makino

    IEEE NLP-KE 2008: PROCEEDINGS OF INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING 289-+ 2008

    DOI: 10.1109/NLPKE.2008.4906747  

  170. Examination of judgment method of utterance outside task in voice conversation system

    MEGURO Toyomi, SUZUKI Motoyuki, ITO Akinori, MAKINO Syozo

    IPSJ SIG Notes 2007 (129) 283-287 2007/12/21

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    In a small task, to allow more flexible processing, utterances related to the task are recognized with a written grammar, and utterances unrelated to the task are recognized with large vocabulary speech recognition. This paper examines a technique for distinguishing sentences unrelated to the task from sentences related to it by using the semantic distance between nouns.

  171. A Study on the Environment and Speaker Adaptation System using Aspect model

    HAHM Seongjun, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    IPSJ SIG Notes 2007 (129) 115-118 2007/12/20

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    One of the key issues for adaptation algorithms is to modify a large number of parameters with only a small amount of adaptation data. Speaker adaptation techniques try to obtain near speaker-dependent (SD) performance with only small amounts of speaker-specific data and are often based on initial speaker-independent (SI) recognition systems. In this paper, we introduce an aspect model into an acoustic model for rapid speaker and environment adaptation. A formulation of probabilistic latent semantic analysis (PLSA) is extended to continuous density HMMs. We carried out an isolated word recognition experiment, and the results were compared with those of MAP and MLLR.

  172. Speech recognition of English spoken by Japanese native speekers using N-gram trained from generated text

    TSUTSUI Ryohei, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    IPSJ SIG Notes 2007 (129) 125-130 2007/12/20

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    Our goal is to develop a voice-interactive CALL system that enables language learners to practice words, phrases, and grammar interactively. To develop such a system, it is necessary to recognize learners' utterances correctly. We found that 4- or 5-state HMMs work better than 3-state HMMs for recognizing English spoken by native Japanese speakers. An N-gram language model trained on generated text achieves higher speech recognition accuracy than a finite-state automaton (FSA) language model.

  173. Phoneme Recognition with SSS-free HMnet using, Cutting number of paths Method and Smoothing Method

    HONMA Daisuke, OHKAWA YUICHI, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    IPSJ SIG Notes 2007 (129) 131-135 2007/12/20

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    When phoneme recognition is carried out using the path connection probabilities of an SSS-free HMnet, the probabilities become overly specialized to the training data, so phoneme accuracy does not improve. In this paper, we propose a smoothing method and a method that cuts the number of paths. In speaker-specific phoneme recognition, both methods prevent the connection probabilities from over-specializing, and phoneme accuracy improves over the conventional method.

  174. Phoneme Recognition with SSS-free HMnet using, Cutting number of paths Method and Smoothing Method

    HONMA Daisuke, OHKAWA YUICHI, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    IEICE technical report 107 (406) 131-135 2007/12/13

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    When phoneme recognition is carried out using the path connection probabilities of an SSS-free HMnet, the probabilities become overly specialized to the training data, so phoneme accuracy does not improve. In this paper, we propose a smoothing method and a method that cuts the number of paths. In speaker-specific phoneme recognition, both methods prevent the connection probabilities from over-specializing, and phoneme accuracy improves over the conventional method.

  175. A Study on the Environment and Speaker Adaptation System using Aspect model

    HAHM Seongjun, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    IEICE technical report 107 (406) 115-118 2007/12/13

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    One of the key issues for adaptation algorithms is to modify a large number of parameters with only a small amount of adaptation data. Speaker adaptation techniques try to obtain near speaker-dependent (SD) performance with only small amounts of specific data and are often based on initial speaker-independent (SI) recognition systems. In this paper, we introduce an aspect model into an acoustic model for rapid speaker and environment adaptation. A formulation of probabilistic latent semantic analysis (PLSA) is extended to continuous-density HMMs. We carried out an isolated word recognition experiment and compared the results with those of MAP and MLLR.

  176. Examination of judgment method of utterance outside task in voice conversation system

    MEGURO Toyomi, SUZUKI Motoyuki, ITO Akinori, MAKINO Syozo

    IEICE technical report 107 (406) 283-287 2007/12/13

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    In a small task, to allow more flexible processing, utterances related to the task are recognized with a written grammar and utterances unrelated to the task are recognized by large-vocabulary speech recognition. This paper examines a technique for distinguishing sentences unrelated to the task from sentences related to the task by using the semantic distance between nouns.

  177. Speech recognition of English spoken by Japanese native speakers using N-gram trained from generated text

    TSUTSUI Ryohei, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    IEICE technical report 107 (406) 125-130 2007/12/13

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    Our goal is to develop a voice-interactive CALL system that enables language learners to practice words, phrases, and grammar interactively. To develop such a system, it is necessary to recognize learners' utterances correctly. We found that 4- or 5-state HMMs work better than 3-state HMMs for recognizing English spoken by native Japanese speakers. An N-gram language model trained on generated text achieves higher speech recognition accuracy than an FSA (finite-state automaton) language model.

  178. The fun of "strange languages" (Short essay, Coffee Break)

    伊藤 彰則

    日本音響学会誌 63 (11) 696-696 2007/11/01

    Publisher: 一般社団法人日本音響学会

    ISSN: 0369-4232

  179. Increasing correlation in one or two bits

    37 (7) 509-514 2007/08/09

    Publisher: 日本音響学会聴覚研究委員会

    ISSN: 1346-1109

  180. Increasing correlation in one or two bits

    ITO Akinori, MAKINO Shozo

    IEICE technical report 107 (186) 1-6 2007/08/02

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    In this paper we investigated methods that increase the correlation between two values using one or two bits of extra information. For methods that use one bit, we investigated the '1-bit quantization,' 'sign correction,' and 'difference quantization' methods. For those that use two bits, we investigated the '2-bit quantization' and 'sign correction + difference quantization' methods. Theoretical analysis and numerical experiments showed that the quantization-based method is best when the correlation of the original data is weak, while 'difference quantization' or its combination with sign correction is better when the original data are strongly correlated. We then applied the methods to multiple description coding of speech signals.
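
    As a rough illustration of how a single bit of side information can raise the correlation between two values, the sketch below implements one plausible reading of 'sign correction': the bit carries the sign of x, and the receiver forces y to take that sign. This reading, the helper names, and the synthetic Gaussian data are all assumptions made for illustration; the report's actual definitions may differ.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def sign_bits(xs):
    """One side-information bit per sample: 1 if x is non-negative, else 0."""
    return [1 if x >= 0 else 0 for x in xs]


def sign_correct(ys, bits):
    """Force each y to carry the sign signalled by its bit (assumed scheme)."""
    return [abs(y) if b else -abs(y) for y, b in zip(ys, bits)]


def make_pair(n, rho, seed=0):
    """Synthetic unit-variance Gaussian pairs with target correlation rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(rho * x + math.sqrt(1.0 - rho * rho) * e)
    return xs, ys


xs, ys = make_pair(10000, 0.3)
before = pearson(xs, ys)
after = pearson(xs, sign_correct(ys, sign_bits(xs)))
print(f"correlation before: {before:.3f}, after sign correction: {after:.3f}")
```

    With weakly correlated inputs, matching the signs makes every product x*y non-negative, which is why the measured correlation rises in this toy setting.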

  181. Query-by-Humming Music Information Retrieval System using Probabilistic Distribution for Tone Interval Features

    ICHIKAWA Takuto, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    IPSJ SIG Notes 2007 (81) 33-38 2007/08/01

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    This paper describes a query-by-humming (QbH) music information retrieval (MIR) system without pitch extraction. In systems based on pitch extraction, pitch extraction errors inevitably occur and degrade performance. In this system, a cross-correlation function between two logarithmic frequency spectra is extracted as a tonal feature instead of deltaPitch, and probabilistic models are prepared for all tone intervals assumed to exist in the music. When two signals corresponding to two contiguous notes are given, likelihoods are calculated for all possible tone intervals. The advantage of this system is that fatal errors such as pitch extraction errors are unlikely to occur because the extracted features are modeled stochastically. In the experimental results, the top retrieval accuracy of the proposed method exceeded that of the pitch-extraction-based system by 4.9%.

  182. Automatic detection and estimation of the direction of calling speech under noisy environment

    SUZUKI Motoyuki, KITADATE Kota, ITO Akinori, MAKINO Shozo

    IEICE technical report 107 (116) 67-72 2007/06/21

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    If a robot understands a user's calling voice, it can approach the user to hear the user's commands. In this paper we developed a method to detect a user's calling voice and estimate the direction of arrival (DoA) of the voice in a real environment. Many methods have been proposed for DoA estimation, but most of them do not assume more than one voice source. Our method detects a pre-registered voice even when other voices and heavy noise are present. The method combines two distinct technologies. The first is multi-channel spectrum subtraction (MSS). Using MSS, we record sound from every five degrees. The second is word spotting using continuous DP (CDP) matching. We perform CDP for all 72 directions in parallel. When a registered word is detected, the word is verified using the frame relation matrix, which expresses word-internal similarities. Finally, the CDP scores are combined with the power of each direction to determine the DoA. We carried out experiments and obtained 95% accuracy under SNR conditions from 0 to 20 dB.

  183. The evaluation of vocabulary divided PLSA language model using information criterion

    栗山 直人, 鈴木 基之, 伊藤 彰則

    Proceedings of the Spoken Document Processing Workshop 1 103-108 2007/02/26

    Publisher: [豊橋技術科学大学メディア科学リサーチセンター]

  184. Unsupervised iterative language model adaptation using WWW

    梶浦 泰智, 鈴木 基之, 伊藤 彰則

    Proceedings of the Spoken Document Processing Workshop 1 109-114 2007/02/26

    Publisher: [豊橋技術科学大学メディア科学リサーチセンター]

  185. B-6-82 Enhanced secret and high quality audio communication system by disjoint path routing

    ENOMOTO Nobuyuki, KITAMURA Tsuyoshi, IWATA Atsushi, TANI Hideaki, ABE Shunichiro, NISHIMURA Ryouichi, SUZUKI Yoiti, SAKAI Toshiyuki, ITO Akinori, MAKINO Shozo

    Proceedings of the IEICE General Conference 82-82 2007

    Publisher: The Institute of Electronics, Information and Communication Engineers

  186. A basic study on applying MD quantization to speech coding

    WEY H., 西村竜一, 伊藤彰則, 小林まおり, 鈴木陽一

    日本音響学会研究発表会講演論文集(CD-ROM) 2007 2007

    ISSN: 1880-7658

  187. Automatic evaluation system of English prosody for Japanese learner's speech

    Motoyuki Suzuki, Tatsuki Konno, Akinori Ito, Shozo Makino

    IMSCI '07: INTERNATIONAL MULTI-CONFERENCE ON SOCIETY, CYBERNETICS AND INFORMATICS, VOL 1, PROCEEDINGS 1 48-53 2007

  188. Analysis of cell wall polysaccharides during storage of a local melon accession 'Wasada-uri' compared to the melon cultivar 'Prince'

    T. Nishizawa, A. Ito

    Journal of Horticultural Science and Biotechnology 82 (2) 227-234 2007

    Publisher: Headley Brothers Ltd

    DOI: 10.1080/14620316.2007.11512224  

    ISSN: 1462-0316

  189. Topic and style adaptation using vocabulary divided PLSA language model by criterion of information

    KURIYAMA Naoto, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    IPSJ SIG Notes 2006 (136) 233-238 2006/12/22

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    PLSA (Probabilistic Latent Semantic Analysis) is one of the promising language model adaptation methods. We propose a new way to combine PLSA and N-gram models by separating the vocabulary into three classes: 'topic'-related, 'style'-related, and 'general'-related words. This method trains a topic-vocabulary PLSA model, a style-vocabulary PLSA model, and a general-vocabulary unigram model independently, and combines the three models. We also propose a method for automatically constructing the vocabulary division criterion, using the pattern of word-class occurrence in newspaper text and CSJ. The experimental results showed that the proposed method achieves a 15.48% perplexity reduction relative to the conventional PLSA model on a test set whose topic and style features do not occur together in the training data.

  190. Deciding Search Query for Unsupervised Language Model Adaptation using WWW

    KAJIURA Yasutomo, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    IPSJ SIG Notes 2006 (136) 131-135 2006/12/21

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    To improve the accuracy of an LVCSR system, it is effective to gather text data related to the topic of the input speech and adapt the language model using that text. However, collecting topic-related text manually requires much effort. To automate the text collection, we have proposed a method that creates an adapted language model by collecting topic-related text from the World Wide Web. In this paper, we propose a method that decides which search queries are effective, using similarities between words, and calculates each query's effectiveness from a small amount of WWW text. This method reaches the same performance as queries selected by a human.

  191. Topic and style adaptation using vocabulary divided PLSA language model by criterion of information

    KURIYAMA Naoto, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    IEICE technical report 106 (444) 55-60 2006/12/15

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    PLSA (Probabilistic Latent Semantic Analysis) is one of the promising language model adaptation methods. We propose a new way to combine PLSA and N-gram models by separating the vocabulary into three classes: 'topic'-related, 'style'-related, and 'general'-related words. This method trains a topic-vocabulary PLSA model, a style-vocabulary PLSA model, and a general-vocabulary unigram model independently, and combines the three models. We also propose a method for automatically constructing the vocabulary division criterion, using the pattern of word-class occurrence in newspaper text and CSJ. The experimental results showed that the proposed method achieves a 15.48% perplexity reduction relative to the conventional PLSA model on a test set whose topic and style features do not occur together in the training data.

  192. Deciding Search Query for Unsupervised Language Model Adaptation using WWW

    KAJIURA Yasutomo, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    IEICE technical report 106 (443) 131-135 2006/12/14

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    To improve the accuracy of an LVCSR system, it is effective to gather text data related to the topic of the input speech and adapt the language model using that text. However, collecting topic-related text manually requires much effort. To automate the text collection, we have proposed a method that creates an adapted language model by collecting topic-related text from the World Wide Web. In this paper, we propose a method that decides which search queries are effective, using similarities between words, and calculates each query's effectiveness from a small amount of WWW text. This method reaches the same performance as queries selected by a human.

  193. Music information retrieval from a singing voice based on verification of recognized hypotheses

    Motoyuki Suzuki, Toru Hosoya, Akinori Ito, Shozo Makino

    ISMIR 2006 - 7th International Conference on Music Information Retrieval 168-171 2006/12/01

  194. A new construction method of a context-dependent HMnet considering phonetic variations

    SUZUKI Motoyuki, SAKAMOTO Hajime, ITO Akinori, MAKINO Shozo

    IEICE technical report 106 (123) 37-41 2006/06/16

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    Most speech recognition systems use a context-dependent HMM (such as a triphone) as the acoustic model. It can represent phonetic variations that depend on phoneme context; however, other factors such as speaker and speaking rate cannot be considered. In this paper, a new construction algorithm for HMnets is proposed. It can construct an HMnet that considers various phonetic variations by combining the SSS and SSS-free algorithms. In the experimental results, the proposed algorithm gives higher recognition accuracy than conventional SSS and SSS-free.

  195. Unsupervised Language Model Adaptation using Web Text

    KAJIURA Yasutomo, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    IEICE technical report 106 (123) 43-47 2006/06/16

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    To improve the accuracy of an LVCSR system, it is effective to gather text data related to the topic of the input speech and adapt the language model using that text. However, collecting topic-related text manually requires much effort. To automate the text collection, we have proposed a method that creates an adapted language model by collecting topic-related text from the World Wide Web. In this paper, we propose two new methods. The first composes the search query using multiple words extracted from the preliminary recognition result. This method achieved 2.2 points higher accuracy than the previous method when 1000 documents were gathered. The other method excludes misrecognized words from the query words. Using the proposed method, the ratio of misrecognized words among all words was reduced to only 4%.

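  196. "Why Do People Treat Computers as Humans? The Psychology of 'The Media Equation'", by Byron Reeves and Clifford Nass, translated by Hiromichi Hosoma, Shoeisha, 2001 (A Book I Recommend, Coffee Break)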

    伊藤 彰則

    日本音響学会誌 62 (6) 473-474 2006/06/01

    Publisher: 一般社団法人日本音響学会

    ISSN: 0369-4232

  197. A-19-15 An Interpolation Method of the Feature Vector for Finger Character Recognition

    Osato Muneyuki, Suzuki Motoyuki, Ito Akinori, Makino Shozo

    Proceedings of the IEICE General Conference 2006 333-333 2006/03/08

    Publisher: The Institute of Electronics, Information and Communication Engineers

  198. Training optimization and vocabulary division of PLSA language model

    KURIYAMA Naoto, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    IPSJ SIG Notes 2006 (12) 37-42 2006/02/04

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    PLSA is a method of composing a language model that can reflect the global characteristics of the linguistic context as a "topic". We propose further extensions of the PLSA language model. First, we compare the conventional learning methods for the PLSA language model and examine the optimization of the EM annealing schedule. As a result, we found that the best method is to reduce β from 1.0 to some specific value. Next, we compose a PLSA language model whose vocabulary set is divided into content words and function words. Training and adaptation to topic or style are then performed separately. In the experiment, we achieved 82.23% perplexity reduction against the conventional way's 83.90%.

  199. Item 2: Acoustic Engineering Study Group (Section 3: Engineering Study Groups, Chapter 5: International Conferences, Symposia, etc.)

    鈴木 陽一, 坂本 修一, 伊藤 彰則

    東北大学電気通信研究所研究活動報告 13 278-278 2006/01/01

  200. Evaluation by elderly users of a method for improving user affinity using a robot avatar

    廣井富, 伊藤彰則, 高津宣夫, 中野栄二

    情報科学技術フォーラム FIT 2006 2006

  201. Unsupervised language model adaptation based on automatic text collection from WWW

    Motoyuki Suzuki, Yasutomo Kajiura, Akinori Ito, Shozo Makino

    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5 5 2202-2205 2006

  202. A User Simulator based on VoiceXML for evaluation of spoken dialog systems

    Akinori Ito, Keisuke Shimada, Motoyuki Suzuki, Shozo Makino

    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5 2 1045-1048 2006

  203. Lyrics recognition from a singing voice based on finite state automaton for music information retrieval

    Toru Hosoya, Motoyuki Suzuki, Akinori Ito, Shozo Makino

    ISMIR 2005 - 6th International Conference on Music Information Retrieval 532-535 2005/12/01

  204. Construction method of acoustic models dealing with various background noises based on combination of HMMs

    Motoyuki Suzuki, Yusuke Kato, Akinori Ito, Shozo Makino

    9th European Conference on Speech Communication and Technology 973-976 2005/12/01

  205. Pronunciation error detection method based on error rule clustering using a decision tree

    Akinori Ito, Yen Ling Lim, Motoyuki Suzuki, Shozo Makino

    9th European Conference on Speech Communication and Technology 173-176 2005/12/01

  206. Internal noise suppression for speech recognition by small robots

    Akinori Ito, Takashi Kanayama, Motoyuki Suzuki, Shozo Makino

    9th European Conference on Speech Communication and Technology 2685-2688 2005/12/01

  207. Feature Value Combination for Finger Character Recognition Using a Color Glove

    OSATO Muneyuki, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    IEICE technical report 105 (375) 73-78 2005/10/28

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    Several finger character recognition systems have been developed to support communication between hearing-impaired people and hearing people. In systems that utilize color information from the hand image, various feature values are employed. In this paper, several effective feature values for finger character recognition are examined through comparison experiments. In addition, we try to recover from errors caused by a single feature value by combining multiple feature values. Using feature value combination and combination by posterior probability, an 8% improvement in recognition rate was obtained.

  208. Performance evaluation for the multi-mixture HMMs in various kinds of noise with various SNRs conditions

    SUZUKI Motoyuki, KATO Yusuke, ITO Akinori, MAKINO Shozo

    IEICE technical report. Speech 105 (133) 25-30 2005/06/17

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    Background noise is one of the biggest problems for speech recognition systems in real environments. To achieve high recognition performance for corrupted speech, we proposed a new method of constructing HMMs that deal with various kinds of background noise. First, an HMM dealing with a single noise is trained for each background noise, and then all Gaussian components of those HMMs are combined into a "multi-mixture HMM". In the experimental results, the multi-mixture HMM gave the highest recognition performance for every kind of noise and every SNR. Although the multi-mixture HMM performs well, it has a huge number of Gaussian components, which makes speech recognition slower. To solve this problem, we also proposed a method for reducing the number of Gaussian components. It can decrease the number of Gaussian components with only slight deterioration in recognition performance.

  209. Internal noise suppression for speech recognition by small robots based on the noise spectrum prediction

    ITO Akinori, KANAYAMA Takashi, SUZUKI Motoyuki, MAKINO Shozo

    IEICE technical report. Speech 105 (133) 43-48 2005/06/17

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    Speech recognition by a small robot is difficult because the robot itself makes noise. In this paper, two new methods are proposed that suppress the internal noise of small robots. These methods are based on spectral subtraction (SS). The proposed methods differ from the original SS in that they use an estimated noise spectrum that depends on the motion of the robot. One method, called MDSS, prepares noise spectra for all motions. The other method, called NPSS, predicts the noise spectrum from the angular velocities of all joints of the robot using a neural network. In a comparison with the original SS, the proposed methods outperformed it. The MDSS method gave good results when the noise within one motion was stable, while NPSS worked well even when the noise of the motion was unstable.
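
    The motion-dependent idea can be sketched as follows, assuming textbook magnitude-domain spectral subtraction with a spectral floor and a lookup table of one stored noise spectrum per motion. The motion names, spectra, and parameter values are hypothetical and are not taken from the report; the NPSS variant, which predicts the noise spectrum with a neural network, is not shown.

```python
def spectral_subtract(noisy_mag, noise_mag, alpha=1.0, floor=0.01):
    """Magnitude-domain spectral subtraction for one frame.

    noisy_mag, noise_mag: per-bin magnitude spectra of equal length.
    alpha: over-subtraction factor (assumed value).
    floor: fraction of the noisy magnitude kept as a lower bound, so that
           no bin becomes negative after subtraction.
    """
    return [max(y - alpha * n, floor * y) for y, n in zip(noisy_mag, noise_mag)]


# Hypothetical stored noise magnitude spectra, one per robot motion.
NOISE_BY_MOTION = {
    "walk": [0.30, 0.60, 0.10],
    "turn_head": [0.05, 0.10, 0.40],
}


def motion_dependent_ss(noisy_mag, motion):
    """Subtract the noise spectrum stored for the robot's current motion."""
    return spectral_subtract(noisy_mag, NOISE_BY_MOTION[motion])


frame = [1.0, 0.5, 0.2]
cleaned = motion_dependent_ss(frame, "walk")
print(cleaned)
```

    A fixed table like this only helps when the noise within a motion is stable, which matches the condition under which the abstract reports MDSS working well.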

  210. HMnet training method combining SSS and SSS-free

    SAKAMOTO Hajime, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    日本音響学会研究発表会講演論文集 2005 (1) 31-32 2005/03/08

    Publisher: 日本音響学会

    ISSN: 1340-3168

  211. A construction of a dialogue agent for evaluation of spoken dialogue system

    SHIMADA Keisuke, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    日本音響学会研究発表会講演論文集 2005 (1) 69-70 2005/03/08

    ISSN: 1340-3168

  212. An HMM robust to multiple noise conditions and change in Signal-Noise ratio by combining multiple noise-adapted HMMs

    KATO Yusuke, ITO Akinori, SUZUKI Motoyuki, MAKINO Shozo

    日本音響学会研究発表会講演論文集 2005 (1) 83-84 2005/03/08

    ISSN: 1340-3168

  213. Improvement of audio signal dimension compression using KL expansion

    HARADA Shoji, ITO Akinori, SUZUKI Motoyuki, KOHATA Minoru, MAKINO Shozo

    日本音響学会研究発表会講演論文集 2005 (1) 199-200 2005/03/08

    ISSN: 1340-3168

  214. Laughter Recognition from Natural Conversation Video Using Facial Expression Recognition

    WANG Xinyue, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    日本音響学会研究発表会講演論文集 2005 (1) 217-218 2005/03/08

    ISSN: 1340-3168

  215. A grammar error detection method for interactive CALL system

    KWEON O.-P, ITO A, SUZUKI M, MAKINO S

    日本音響学会研究発表会講演論文集 2005 (1) 303-304 2005/03/08

    ISSN: 1340-3168

  216. Lyrics recognition based on Deterministic Finite State Automaton for song retrieval system

    HOSOYA Toru, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    日本音響学会研究発表会講演論文集 2005 (1) 603-604 2005/03/08

    ISSN: 1340-3168

  217. Speech recognition robust to the internal noise for small robots

    KANAYAMA Takashi, ITO Akinori, SUZUKI Motoyuki, MAKINO Shozo

    日本音響学会研究発表会講演論文集 2005 (1) 659-660 2005/03/08

    ISSN: 1340-3168

  218. A-19-13 An Examination of Finger Character Recognition Using Color Information

    Osato Muneyuki, Suzuki Motoyuki, Ito Akinori, Makino Shozo

    Proceedings of the IEICE General Conference 342-342 2005

    Publisher: The Institute of Electronics, Information and Communication Engineers

  219. Frame-Based Spoken Dialog System for Autonomous Robots

    MAKINO Shozo, KONASHI Takashi, ITO Akinori, SUZUKI Motoyuki

    IPSJ SIG Notes 2004 (108) 141-146 2004/11/05

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    We have been developing a spoken dialog system. Conventional spoken dialog systems need grammar descriptions and dialog scripts, which are difficult to develop. The system proposed in this paper is based on semantic frames, and it generates the recognition grammar from the frames automatically. As the system requires only a frame-based description of a dialog task, it can easily be applied to different kinds of tasks. Moreover, recognition accuracy is improved by sentence weighting based on phrase class templates. We evaluated the system experimentally. The system reached the goal with 2.44 user utterances on average.

  220. Frame-Based Spoken Dialog System for Autonomous Robots

    MAKINO Shozo, KONASHI Takashi, ITO Akinori, SUZUKI Motoyuki

    IEICE technical report. Natural language understanding and models of communication 104 (417) 65-70 2004/10/29

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    We have been developing a spoken dialog system. Conventional spoken dialog systems need grammar descriptions and dialog scripts, which are difficult to develop. The system proposed in this paper is based on semantic frames, and it generates the recognition grammar from the frames automatically. As the system requires only a frame-based description of a dialog task, it can easily be applied to different kinds of tasks. Moreover, recognition accuracy is improved by sentence weighting based on phrase class templates. We evaluated the system experimentally. The system reached the goal with 2.44 user utterances on average.

  221. I-069 Smile and Laugh Recognition from Natural Conversation Video

    Xinyue Wang, Suzuki Motoyuki, Ito Akinori, Makino Shozo

    3 (3) 163-164 2004/08/20

    Publisher: Forum on Information Technology

  222. G-014 Comparison of features for Query-by-Humming MIR

    Ito Akinori, Heo Sung-Phil, Suzuki Motoyuki, Makino Shozo

    情報科学技術フォーラム一般講演論文集 3 (2) 373-374 2004/08/20

    Publisher: Forum on Information Technology

  223. I-009 Environmental Map Generation by Omnidirectional Stereo

    Goto Nozomu, Suzuki Motoyuki, Ito Akinori, Makino Shozo

    情報科学技術フォーラム一般講演論文集 3 (3) 19-20 2004/08/20

    Publisher: Forum on Information Technology

  224. An HMM robust to multiple noise conditions by combining multiple noise-adapted HMMs

    KATO Yusuke, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    IPSJ SIG Notes 2004 (57) 1-6 2004/05/27

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    This paper describes methods to compose an HMM that is robust under multiple noise conditions. The methods are based on combining several HMMs trained under different noise conditions. We propose two combination methods. The first combines multiple HMMs into a multi-path HMM. The second combines the corresponding states of each HMM into one state by mixing the output probability distributions into one mixture distribution. The recognition experiment revealed that HMMs composed by the proposed methods show similar or better results than the conventional multi-condition model. One drawback of a model composed by the proposed methods is that it has a large number of distributions. To reduce the number of distributions, we examined several methods for unifying distributions.
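
    The second combination method can be pictured with the minimal sketch below, which pools the Gaussian components of corresponding states into one mixture. It assumes each source HMM receives an equal prior of 1/K and that components are plain (weight, mean, variance) tuples; both choices are illustrative assumptions, not details confirmed by the report.

```python
def merge_states(state_mixtures):
    """Merge the output mixtures of corresponding states into one mixture.

    state_mixtures: one entry per source HMM; each entry is a list of
        (weight, mean, variance) components whose weights sum to 1.
    Returns a single list of components whose weights again sum to 1,
    giving each source HMM an equal prior of 1/K (assumption).
    """
    k = len(state_mixtures)
    merged = []
    for mixture in state_mixtures:
        for weight, mean, var in mixture:
            merged.append((weight / k, mean, var))
    return merged


# Hypothetical one-dimensional state mixtures from two noise-specific HMMs.
car_noise_state = [(0.6, 0.0, 1.0), (0.4, 2.0, 0.5)]
babble_state = [(1.0, -1.0, 2.0)]

combined = merge_states([car_noise_state, babble_state])
print(combined)
print(sum(w for w, _, _ in combined))
```

    The merged state holds every component from every source model, which illustrates why the abstract notes a large number of distributions as a drawback.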

  225. An HMM robust to multiple noise conditions by combining multiple noise-adapted HMMs

    KATO Yusuke, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    IEICE technical report. Speech 104 (86) 1-6 2004/05/20

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper describes methods to compose an HMM that is robust under multiple noise conditions. The methods are based on combining several HMMs trained under different noise conditions. We propose two combination methods. The first combines multiple HMMs into a multi-path HMM. The second combines the corresponding states of each HMM into one state by mixing the output probability distributions into one mixture distribution. The recognition experiment revealed that HMMs composed by the proposed methods show similar or better results than the conventional multi-condition model. One drawback of a model composed by the proposed methods is that it has a large number of distributions. To reduce the number of distributions, we examined several methods for unifying distributions.

  226. Recent Topics on Speech Recognition

    Akinori Ito

    IEICE Information and Systems Society Journal 9 (1) 14-21 2004/05/01

    Publisher: The Institute of Electronics, Information and Communication Engineers

  227. A study on dialogue-based CALL system

    KWEON Oh-pyo, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    IEICE technical report. Speech 103 (633) 19-24 2004/01/23

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper describes a dialogue-based CALL (Computer Assisted Language Learning) system. One of the major problems in CALL systems is that learners are usually assigned a passive role and get no practice in composing their own utterances. The other major problem is that most proposed CALL systems are pronunciation exercise systems, such as those based on minimal pairs. Pronunciation exercises are an unrealistic task if the learner's goal is to acquire the ability to participate actively in a conversation. We proposed a dialogue-based CALL system with which learners can practice holding a conversation and actively composing utterances. The path of the conversation changes depending on the learner's utterances. The system also checks for pronunciation and grammatical errors and returns the proper expression. Learners can therefore acquire the ability to participate actively in a conversation.

  228. Noise adaptive spoken dialog system based on selection of multiple dialog strategies

    Akinori Ito, Takanobu Oba, Takashi Konashi, Motoyuki Suzuki, Shozo Makino

    8th International Conference on Spoken Language Processing, ICSLP 2004 193-196 2004/01/01

  229. A Japanese dialogue-based CALL system with mispronunciation and grammar error detection

    Oh Pyo Kweon, Akinori Ito, Motoyuki Suzuki, Shozo Makino

    8th International Conference on Spoken Language Processing, ICSLP 2004 1833-1836 2004/01/01

  230. A spoken dialog system based on automatic grammar generation and template-based weighting for autonomous mobile robots

    Takashi Konashi, Motoyuki Suzuki, Akinori Ito, Shozo Makino

    8th International Conference on Spoken Language Processing, ICSLP 2004 189-192 2004/01/01

  231. Speaker adaptation method for CALL systems using bilingual speakers' utterances

    Motoyuki Suzuki, Hirokazu Ogasawara, Akinori Ito, Yuichi Ohkawa, Shozo Makino

    8th International Conference on Spoken Language Processing, ICSLP 2004 2929-2932 2004/01/01

  232. Error tolerant melody matching method in music information retrieval

    SP Heo, M Suzuki, A Ito, S Makino, HY Chung

    ADAPTIVE MULTIMEDIA RETRIEVAL 3094 212-227 2004

    ISSN: 0302-9743

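  233. Analysis of the relationship between grammar and recognition accuracy in spoken dialog under various noise environments (5th Spoken Language Symposium)

    OBA Takanobu, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    IEICE technical report 103 (517) 133-138 2003/12/18

    Publisher: The Institute of Electronics, Information and Communication Engineers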

    ISSN: 0913-5685

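    Improving recognition accuracy under noise is one of the important problems in speech recognition. Many studies address it by improving acoustic models or noise reduction methods; in this paper, we try to improve accuracy from the standpoint of dialog. Specifically, as the noise environment becomes less favorable for speech recognition, the system switches to a grammar with a reduced vocabulary and fewer candidates. This keeps a framework that can recognize free user utterances when the effect of noise is small, while maintaining a certain level of recognition accuracy for dialog under noise. Two issues must be solved to realize this. First, reducing the vocabulary and the number of candidates increases the number of user utterances that contain vocabulary or grammar the recognizer does not cover, so a countermeasure is needed. Second, to change the recognition grammar according to the environment, the system must estimate the recognition accuracy that will be obtained when a dialog is held under a given noise condition. For the former, we devise the way the system presents its questions; for the latter, we examine whether the relationship among noise, grammar, and recognition accuracy can be estimated by neural network training.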

  234. Speaker Adaptation of Bilingual Phone Models using Bilingual Speakers' Speech

    OGASAWARA Hirokazu, OHKAWA Yuichi, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    IEICE technical report. Natural language understanding and models of communication 103 (517) 85-90 2003/12/18

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    In this paper, we investigate a method of speaker adaptation of bilingual phone models to improve the precision of a non-native speech recognition system. Non-native speakers tend to substitute phones of their native language for non-native phones, so the recognition system must use bilingual phone models consisting of all phones in the non-native and native languages. Speaker adaptation generally uses utterances in the same language as the phone model. However, non-native speakers cannot speak the target language well enough for their utterances to be used for speaker adaptation. In order to adapt bilingual phone models, we propose a speaker adaptation method for bilingual phone models using a native speaker's utterances. To improve the bilingual phone models, we propose a method using bilingual speakers' speech. Experiments showed that the bilingual phone models adapted by the proposed method outperformed the models adapted by conventional methods.

  235. Speaker Adaptation of Bilingual Phone Models using Bilingual Speakers' Speech

    OGASAWARA Hirokazu, OHKAWA Yuichi, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    IPSJ SIG Notes 2003 (124) 85-90 2003/12/18

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    In this paper, we investigate a method of speaker adaptation of bilingual phone models to improve the precision of a non-native speech recognition system. Non-native speakers tend to substitute phones of their native language for non-native phones, so the recognition system must use bilingual phone models consisting of all phones in the non-native and native languages. Speaker adaptation generally uses utterances in the same language as the phone model. However, non-native speakers cannot speak the target language well enough for their utterances to be used for speaker adaptation. In order to adapt bilingual phone models, we propose a speaker adaptation method for bilingual phone models using a native speaker's utterances. To improve the bilingual phone models, we propose a method using bilingual speakers' speech. Experiments showed that the bilingual phone models adapted by the proposed method outperformed the models adapted by conventional methods.

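  236. Analysis of the relationship between grammar and recognition accuracy in spoken dialog under various noise environments

    OBA Takanobu, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    IPSJ SIG Notes (Spoken Language Processing, SLP) 2003 (124) 133-138 2003/12/18

    Publisher: Information Processing Society of Japan (IPSJ)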

    ISSN: 0919-6072

    Speech recognition in noisy environments is one of the hottest topics in speech recognition research. Noise-tolerant acoustic models or noise reduction techniques are often used to improve recognition accuracy. In this paper, we propose a method to improve the accuracy of a spoken dialog system from a dialog strategy point of view. In the proposed method, the dialog system automatically changes its dialog strategy according to the estimated recognition accuracy in a noisy environment in order to keep the performance of the system constant. In a noise-free environment, the system accepts any utterance from a user. In a noisy environment, on the other hand, the system restricts its grammar and vocabulary. To realize this strategy, we investigated a method to avoid out-of-grammar user utterances through an instruction given by the system to the user. Furthermore, we developed a method to estimate recognition accuracy from features extracted from the noise signal, and examined whether the relationship among noise, grammar, and recognition accuracy can be estimated by neural network training.

  237. Face Detection for Gesture Recognition System

    ONODERA Mieko, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    IEICE technical report. PRMU, Pattern recognition and media understanding 103 (453) 25-30 2003/11/21

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    In this paper, we investigate a face detection method for a gesture recognition system. In a gesture recognition system, when the distance from the person to the camera is large, a face in an image may be so small that its parts (eyes, mouth, and so on) cannot be identified and its outline is unclear. To detect such small faces, we focus on a method based on the HMM (Hidden Markov Model), a statistical model used to characterize the statistical properties of a signal. We examine an HMM-based face detection method and investigate the effect of the feature vectors and the HMM topology on the detection of small faces. We also investigate the effect of the size difference between the training faces and the faces in the evaluation data.

  238. Product Software of Continuous Speech Recognition Consortium -2002 version-

    KAWAHARA T, SUMIYOSHI T, LEE A, BANNO H, TAKEDA K, MIMURA M, ITOU K, ITO A, SHIKANO K

    IPSJ SIG Notes 2003 (104) 1-6 2003/10/17

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    The Continuous Speech Recognition Consortium (CSRC) was founded under IPSJ SIG-SLP to further enhance the Japanese Dictation Toolkit that had been developed by the IPA project. This report gives an overview of the software developed in the third year (Oct. 2002-Sep. 2003). The LVCSR (large vocabulary continuous speech recognition) engine Julius has been improved in both functionality and stability, and ported to Windows in compliance with SAPI (Speech API). A variety of acoustic and language models have been set up to realize wider coverage of input speech. The software package is currently available by contacting the address below.

  239. Examination of the method of learning HSn-gram

    NAGANO Takeshi, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    IPSJ SIG Notes 2003 (104) 35-40 2003/10/17

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    HSn-gram is a language model that extends an N-gram to an ergodic HMM. It regards an N-gram as a deterministic finite-state automaton (FSA) and extends it into a non-deterministic finite-state automaton by dividing each state into two or more states. A problem in learning an HSn-gram is that estimating the model is difficult, because the numbers of states and state transitions become large. In this paper, we propose a learning method for the HSn-gram that uses a set of parameters obtained from an SSn-gram (another HMM-based language model) as the initial parameter set. This method reduces the number of parameters in order to cope with this problem. As a result, the perplexity is reduced by 5% compared to that of a conventionally trained HSn-gram.

  240. Speech Recognition in Unstable Noise Environment using Multipath HMM

    ITO Akinori, KISHIMA Tomonori, SUZUKI Motoyuki, MAKINO Shozo

    IEICE technical report. Speech 103 (93) 1-6 2003/05/29

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper describes a multi-path HMM for speech recognition in unstable noise environments (multi-noise-path HMM). The method connects multiple HMMs in parallel, each trained on speech data from a different noise environment. During decoding, the decoder chooses the most likely path among the possible paths in the HMM. The multi-noise-path HMM can recognize speech in an unstable noise environment, in which the noise changes within one utterance. In the experiment, we used white-noise-based unstable noises, and a multi-noise-path HMM trained on several sets of white-noise-added speech was used for recognition. The experimental results revealed that the performance of the multi-noise-path HMM was almost equivalent to that of the matched model in a stable noise environment, while the proposed model gave better results than the other single-path models in an unstable noise environment.

  241. Validating significance of decoder parameters

    ITO A., MAKINO S.

    2003 (1) 147-148 2003/03/18

    ISSN: 1340-3168

  242. An investigation on multi-path HMM with duration control

    OHKAWA Yuichi, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    Proceedings of the Meeting of the Acoustical Society of Japan 2003 (1) 1-2 2003/03/18

    ISSN: 1340-3168

  243. Evaluation and Analysis of Japanese Pronunciation uttered by Korean

    KWEON Oh-Pyo, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    Proceedings of the Meeting of the Acoustical Society of Japan 2003 (1) 361-362 2003/03/18

    ISSN: 1340-3168

  244. Performance Evaluation of the Music Retrieval System using Plural Pitch Candidates

    HEO S-P, SUZUKI M, ITO A, MAKINO S

    Proceedings of the Meeting of the Acoustical Society of Japan 2003 (1) 847-848 2003/03/18

    ISSN: 1340-3168

  245. Construction of the Music Retrieval System using the Multiple Pitch Candidates

    HEO Sungphil, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    IPSJ SIG Notes 2003 (16) 85-90 2003/02/21

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    Users do not sing accurately, especially if they are inexperienced or unaccompanied; even skilled musicians have difficulty maintaining the correct pitch of a song. Moreover, errors may occur when a music retrieval system extracts pitch from humming. Considering these problems, we propose extracting multiple pitch candidates. The method shows that multiple pitch candidates are important features in determining melodic similarity, and that reliability information obtained from power is important as well. In the experiment, we compared the search efficiency of the proposed method with that of a similar system. The proposed method gave good retrieval results compared with the similar system.

  246. A Study on Japanese Pronunciation Learning System for Korean Using Speech Recognition

    KWEON Oh-pyo, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    IEICE technical report. Speech 102 (618) 19-24 2003/01/23

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper describes a CALL (Computer Assisted Language Learning) system for teaching Japanese pronunciation to Korean speakers. First, Japanese sentences uttered by adult Korean speakers were evaluated by native Japanese speakers. Then, a Japanese learning system was developed based on the evaluation results. Our CALL system asks the learner to read a sentence that includes minimal pairs. Speech recognition technology is used to compare the Japanese speech uttered by the learner with the utterance of a native speaker, and the system automatically calculates intelligibility scores that indicate the similarity between the learner's speech and standard native Japanese speech. Furthermore, when the learner makes a pronunciation mistake, the learner can confirm his/her mispronunciation. The system also gives proper instruction on the pronunciation.

  247. Development of the Interactive and Robust Intelligent Patient Care System

    HIROI Yutaka, SHOJI Michihiko, JEONG Seong Hee, KUDO Masaya, TAKAHASHI Ryosuke, KONASHI Takashi, TAJIMA Makoto, OBA Takanobu, CHEN Qiu, NAKANO Eiji, TAKAHASHI Takayuki, MAKINO Shozo, ITO Akinori, OHMI Tadahiro, KOTANI Koji, TAKATSU Nobuo, SUZUKI Motoyuki

    The proceedings of the JSME annual meeting 2003 (0) 231-232 2003

    Publisher: The Japan Society of Mechanical Engineers

    DOI: 10.1299/jsmemecjo.2003.5.0_231  

    An intelligent service robot named IRIS (Interactive, Robust and Intelligent Patient Care System) has been developed, mainly for use in hospital sickrooms. IRIS is composed of a speaker direction identification system, a dialog system for talking with the patient, a face recognition system, a safety manipulator, and an omni-directional vehicle (ODV). It is able to recognize the patient's face, hold a dialog with someone, and safely execute some simple tasks on request, such as serving a drink. This paper mainly presents the hardware system of IRIS.

  248. An optimized multi-duration HMM for spontaneous speech recognition

    Yuichi Ohkawa, Akihiro Yoshida, Motoyuki Suzuki, Akinori Ito, Shozo Makino

    EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology 485-488 2003/01/01

  249. A study on language model based on kana and kanji string

    KINNO Hiroaki, KATOH Masaharu, KOSAKA Tetsuo, KOHDA Masaki, ITO Akinori

    IPSJ SIG Notes 2002 (121) 165-170 2002/12/16

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    This paper describes a character-based n-gram language model. The proposed model is based on Kanji and Kana characters instead of words or morphemes determined by morphological analysis. To exploit stronger constraints, character strings are used in addition to single characters as basic units of the model. We examined two methods of choosing character strings. One method is based on frequency in the training corpus, and the other is based on mutual information as well as frequency. We carried out experiments to compare perplexities and character error rates (CER) between the proposed model and conventional (word-based or character-based) n-gram models. The results showed that the mutual-information-based method gave the better performance. Although the proposed model was not superior to the word-based model, it was better than the character-based one. The vocabulary size of the proposed model was about 50% smaller than that of the word-based model.

  250. A study on language model based on kana and kanji string

    KINNO Hiroaki, KATOH Masaharu, KOSAKA Tetsuo, KOHDA Masaki, ITO Akinori

    IEICE technical report. Natural language understanding and models of communication 102 (528) 1-6 2002/12/13

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper describes a character-based n-gram language model. The proposed model is based on Kanji and Kana characters instead of words or morphemes determined by morphological analysis. To exploit stronger constraints, character strings are used in addition to single characters as basic units of the model. We examined two methods of choosing character strings. One method is based on frequency in the training corpus, and the other is based on mutual information as well as frequency. We carried out experiments to compare perplexities and character error rates (CER) between the proposed model and conventional (word-based or character-based) n-gram models. The results showed that the mutual-information-based method gave the better performance. Although the proposed model was not superior to the word-based model, it was better than the character-based one. The vocabulary size of the proposed model was about 50% smaller than that of the word-based model.

  251. Product Software of Continuous Speech Recognition Consortium -2001 version-

    KAWAHARA T, SUMIYOSHI T, LEE A, BANNO H, TAKEDA K, MIMURA M, YAMADA T, NISHIURA T, ITOU K, ITO A, SHIKANO K

    IPSJ SIG Notes 2002 (98) 13-18 2002/10/25

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    The Continuous Speech Recognition Consortium (CSRC) was founded under IPSJ SIG-SLP to further enhance the Japanese Dictation Toolkit that had been developed by the IPA project. This report gives an overview of the software developed in the second year (Oct. 2001-Sep. 2002). The LVCSR (large vocabulary continuous speech recognition) engine Julius has been ported to Windows and made compliant with SAPI (Speech API). A variety of acoustic models have been set up to cover a wider range of user generations and speech-input environments. The software is currently available by contacting the address below.

  252. A Speech Coding Method Using LZ Algorithm

    KOHATA Minoru, MITSUYA Ikuya, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    IEICE technical report. Speech 102 (335) 7-12 2002/09/17

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    Most speech coding parameters have temporal redundancy, which can be removed. This article presents a new speech coding method using the Lempel-Ziv algorithm. The proposed method was first applied to the quantization of LP coefficients, where it performed better than Split-VQ, MSVQ, and MA prediction VQ in terms of the rate-distortion criterion. The proposed method was then also used to quantize F0 and gain, and a coder at 1.9 kbit/s was designed. According to subjective tests, the quality of the coded speech was almost comparable to that of FS-MELP at 2.4 kbit/s.

  253. The utterance direction identification system using multiple microphones

    TAJIMA Makoto, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

    IEICE technical report. Speech 102 (335) 19-24 2002/09/17

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper describes a system that identifies the direction of a user's keyword utterance for an autonomous mobile robot. The robot is activated by the user's keyword utterance and identifies the speaker by face recognition. To capture the speaker's face within the camera's view angle, the robot first has to identify the approximate direction of the utterance using acoustic information. To achieve this, the system identifies the direction of the keyword utterance to within a range of 45 degrees using multiple microphones. Because the system is built into a mobile robot, the hardware requirements are very tight due to battery and space restrictions. We therefore developed a system that does not need expensive computation. The system was evaluated by recall and precision using several thresholds. The experimental results show that the length of the keyword dominates the absolute threshold value. Using a mora-by-mora threshold, more than 80% recall and precision were obtained.

  254. I-41 Extraction of The Motion Vector in The Motion Picture Using The Two-Dimensional Warping

    Saito Atsuko, Suzuki Motoyuki, Ito Akinori, Makino Shozo

    Proceedings of the Forum on Information Technology (general papers) 2002 (3) 81-82 2002/09/13

    Publisher: Forum on Information Technology

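  255. I-43 Detection of corresponding points in stereo images by DP matching using region segmentation (Stereo and optical flow, I. Image recognition and media understanding)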

    倉本 健介, 伊藤 彰則, 鈴木 基之, 牧野 正三

    Proceedings of the Forum on Information Technology (general papers) 2002 (3) 85-86 2002/09/13

    Publisher: FIT (IEICE and IPSJ) Steering Committee

  256. English pronunciation learning system utilizing speaker adaptation by Japanese speech

    ITO Akinori, NAGASAWA Tadao, SUZUKI Motoyuki, MAKINO Shozo

    IEICE technical report. Speech 102 (159) 19-24 2002/06/20

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper describes a computer-aided English learning system for Japanese speakers. The proposed system is composed of two subsystems: a pronunciation tutor that detects phoneme-level mispronunciations, and a prosody tutor that treats the intonation and rhythm of the speech. The pronunciation tutor exploits the VFS speaker adaptation technique to improve the precision of phoneme labeling. For the adaptation, we developed a new scheme that uses Japanese utterances to adapt English acoustic models. This method enables speaker adaptation for speakers who are not good at English pronunciation. The prosody tutor compares the pitch pattern of native speakers' utterances with that of a student, and suggests how to improve intonation. In addition to intonation tutoring, the system compares the duration of phrases between native speakers and a student. Evaluation experiments were carried out to compare native speakers' evaluations and the system's evaluations of Japanese speakers' speech. We obtained a good correlation between the two evaluations, which shows that the proposed system can be as good a teacher as a native English speaker.

  257. Evaluation of the maximum entropy based trigger language model

    KISHIMOTO Yukinobu, KATOH Masaharu, ITO Akinori, KOHDA Masaki

    2002 (1) 157-158 2002/03/18

    ISSN: 1340-3168

  258. Speech recognition based on kana and kanji string

    KINNO H., KATOH M., ITO A., KOHDA M.

    2002 (1) 155-156 2002/03/18

    ISSN: 1340-3168

  259. Evaluation of MLLR Adaptation for Dialog Speech Recognition

    KATO Masaharu, ITO Akinori, KOHDA Masaki

    2002 (1) 135-136 2002/03/18

    ISSN: 1340-3168

  260. Erratum: Language modeling by stochastic dependency grammar for Japanese speech recognition (Systems and Computers in Japan (November 15, 2001) 32:12 (10-15))

    Ito, A., Hori, C., Katoh, M., Kohda, M.

    Systems and Computers in Japan 33 (3) 74-74 2002/03/01

    DOI: 10.1002/scj.1115  

    ISSN: 0882-1666

  261. Continuous speech recognition consortium -An open repository for CSR tools and models

    Akinobu Lee, Tatsuya Kawahara, Kazuya Takeda, Masato Mimura, Atsushi Yamada, Akinori Ito, Katsunobu Itou, Kiyohiro Shikano

    Proceedings of the 3rd International Conference on Language Resources and Evaluation, LREC 2002 1438-1441 2002/01/01

  262. Piecewise linear two-dimensional warping

    Akinori Ito, Chiori Hori, Masaharu Katoh, Masaki Kohda

    Systems and Computers in Japan 32 (12) 1-9 2001/11/15

    DOI: 10.1002/scj.1072  

    ISSN: 0882-1666

  263. Product Software of Continuous Speech Recognition Consortium -2000 version-

    KAWAHARA T, SUMIYOSHI T, LEE A, TAKEDA K, MIMURA M, ITO A, ITOU K, SHIKANO K

    IPSJ SIG Notes 2001 (100) 37-42 2001/10/19

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    The Continuous Speech Recognition Consortium (CSRC) was founded last year under IPSJ SIG-SLP to further enhance the Japanese Dictation Toolkit that had been developed by the IPA project. This report gives an overview of the software developed in the first year (Oct. 2000-Sep. 2001). We have revised the LVCSR (large vocabulary continuous speech recognition) engine Julius, and constructed new acoustic models using very large speech corpora. Moreover, a variety of acoustic and language models as well as toolkits are being set up. The software is currently available by contacting the address below.

  264. Performance improvement of LVCSR using vocal tract length normalization

    FUJITA Daisuke, KATOH Masaharu, ITO Akinori, KOHDA Masaki

    2001 (2) 3-4 2001/10/01

    ISSN: 1340-3168

  265. A Statistical Language Modeling Toolkit for word and class n-gram.

    ITO A., KOHDA M.

    2001 (1) 77-78 2001/03/01

    ISSN: 1340-3168

  266. Japanese Dictation Toolkit -1999 version-

    Tatsuya Kawahara, Akinobu Lee, Tetsunori Kobayashi, Kazuya Takeda, Nobuaki Minematsu, Shigeki Sagayama, Katsunobu Itoh, Akinori Ito, Mikio Yamamoto, Atsushi Yamada, Takehito Utsuro, Kiyohiro Shikano

    J. Acoustical Society of Japan 57 (3) 210-214 2001/03/01

    Publisher: The Acoustical Society of Japan

    DOI: 10.20697/jasj.57.3_210  

    ISSN: 0369-4232

  267. New state clustering of hidden markov network with Korean phonological rules for speech recognition

    SJ Oh, HY Chung, CJ Hwang, BK Kim, A Ito

    2001 IEEE FOURTH WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING 39-44 2001

  268. Optimization of the Parameter Set for Word Graph Generation

    KATOH Masaharu, SAIIN Toshinori, ITO Akinori, KOHDA Masaki

    IPSJ SIG Notes 2000 (119) 107-112 2000/12/21

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    The language model weight and insertion penalty greatly affect the recognition performance of an LVCSR system. In a multi-pass LVCSR system that uses a word graph as an intermediate data structure, these decoder parameters should be optimized in order to generate a good word graph. We proposed a rescoring-based method that uses a bigram LM instead of generating many word graphs for each parameter setting. As rescoring is much faster than re-generating a word graph, the optimization time of the proposed method is much shorter. In this paper, we tested the proposed method on Japanese News Article Sentences (ASJ-JNAS). When enough development data is available, the recognition performance is improved.

  269. Statistical Language Model Toolkit for Word and Class N-gram

    ITO Akinori, KOHDA Masaki

    IEICE technical report. Natural language understanding and models of communication 100 (521) 67-72 2000/12/15

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper describes a statistical language model toolkit for word and class-based n-grams. The toolkit has command-level compatibility with the CMU-Cambridge SLM Toolkit and supports ARPA-style language models. Furthermore, the toolkit supports class n-grams and n-gram count mixtures, as well as combined language models using linear interpolation. As language model combination is supported at the API level, the SLM library in this toolkit enables any tool to exploit LM combination. To demonstrate the potential of the toolkit, several language models were created from six years of the Mainichi Shimbun database. We evaluated various combinations of word n-grams and POS n-grams, and found that the combination of a word trigram and a POS trigram reasonably improves the perplexity.

  270. Optimization of the Parameter Set for Word Graph Generation

    KATOH Masaharu, SAIIN Toshinori, ITO Akinori, KOHDA Masaki

    IEICE technical report. Natural language understanding and models of communication 100 (520) 107-112 2000/12/14

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    The language model weight and insertion penalty greatly affect the recognition performance of an LVCSR system. In a multi-pass LVCSR system that uses a word graph as an intermediate data structure, these decoder parameters should be optimized in order to generate a good word graph. We proposed a rescoring-based method that uses a bigram LM instead of generating many word graphs for each parameter setting. As rescoring is much faster than re-generating a word graph, the optimization time of the proposed method is much shorter. In this paper, we tested the proposed method on Japanese News Article Sentences (ASJ-JNAS). When enough development data is available, the recognition performance is improved.

  271. Changes in fruit quality as influenced by shading of netted melon plants (Cucumis melo L. 'Andesu' and 'Luster')

    Nishizawa, T., Ito, A., Motomura, Y., Ito, M., Togashi, M.

    Journal of the Japanese Society for Horticultural Science 69 (5) 563-569 2000/10/26

    DOI: 10.2503/jjshs.69.563  

    ISSN: 1882-3351

  272. Optimization of the parameter set for word graph generation

    KATOH Masaharu, SAIIN Toshinori, ITO Akinori, KOHDA Masaki

    2000 (2) 33-34 2000/09/01

    ISSN: 1340-3168

  273. w3m: a pager/text-based WWW browser

    Akinori Ito

    bit 32 (9) 28-33 2000/09

    Publisher: Kyoritsu Shuppan Co. Ltd.

    ISSN: 0385-6984

  274. A Study on MLLR-Based Speaker Models Using for Speaker Verification

    KATOH Masaharu, KANOU Junya, ITO Akinori, KOHDA Masaki

    IEICE technical report. Speech 100 (137) 25-32 2000/06/16

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    In this paper, we describe a text-prompted speaker verification system. In this system, speaker-specific models are trained by Maximum Likelihood Linear Regression (MLLR) based adaptation, which is used in speech recognition. Regression classes are organized as a tree structure and selected automatically based on the size of the training data. We compared two criteria for cluster selection: the number of frames and the Minimum Description Length (MDL) principle. We also investigated MAP adaptation combined with them. Experimental results show that applying MAP adaptation after MLLR-MDL adaptation significantly improves the verification performance. We also applied SAT compact models instead of SI models. The SAT compact model is better when the training data and testing data are recorded in different sessions.

  275. Language Modeling by an Ergodic HMM based on an N-gram

    ITO Akinori, SAITO Hideki, KATOH Masaharu, KOHDA Masaki

    IEICE technical report. Speech 100 (137) 67-74 2000/06/16

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    This paper proposes a new language model based on an ergodic HMM. The model is created by extending a deterministic finite state automaton equivalent to an n-gram into a nondeterministic one. We call the proposed model "Hidden State N-gram (HS-ngram)." We carried out experiments to compare the perplexity of the n-gram with that of the HS-ngram. The results showed that the proposed models (HS-bigram and HS-trigram) gave lower perplexity than the original models. In a rescoring experiment on the recognition results of an LVCSR system, the HS-trigram slightly outperformed the trigram model.

  276. Optimization of language model weight and insertion penalty for word graph generation

    SAIIN Toshinori, KATOH Masaharu, ITO Akinori, KOHDA Masaki

    IEICE technical report. Speech 100 (137) 75-82 2000/06/16

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    The language model weight and insertion penalty greatly affect the recognition performance of an LVCSR system. In a multi-pass LVCSR system that uses a word graph as an intermediate data structure, these decoder parameters should be optimized in order to generate a good word graph. In this paper, a new method to optimize these parameters is proposed. The method rescores the word graph using a bigram LM instead of generating many word graphs for each parameter setting. As rescoring is much faster than re-generating a word graph, the optimization time of the proposed method is much shorter than that of the re-generation-based one. However, as the method minimizes the first-pass WER, improvement of the second-pass WER is not guaranteed. The experimental results for the newspaper task show that the proposed method improves not only the first-pass WER but also the second-pass WER in most cases.

  277. Evaluation of Japanese Dictation Toolkit : 1999 version

    Kawahara T, Lee A, Kobayashi T, Takeda K, Minematsu N, Sagayama S, Itou K, Ito A, Yamamoto M, Yamada A, Utsuro T, Shikano K

    IPSJ SIG Notes 2000 (54) 9-16 2000/06/02

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    A sharable software repository for Japanese LVCSR (Large Vocabulary Continuous Speech Recognition) is introduced. It has been developed through collaboration among researchers at different academic institutes in Japan. The platform consists of a standard recognition engine, Japanese phone models and Japanese statistical language models, as well as Japanese morphological analysis tools. As integrated systems of these modules, we have implemented baseline 20,000-word and 60,000-word dictation systems and evaluated various components. The software repository is freely available to the public.

  278. Language modeling using ergodic HMM based on trigram

    SAITOH Hideki, KATOH Masaharu, ITO Akinori, KOHDA Masaki

    2000 (1) 51-52 2000/03/01

    ISSN: 1340-3168

  279. Optimization of language model weight and insertion penalty for word graph generation

    SAIIN Toshinori, OKA Naoki, KATOH Masaharu, ITO Akinori, KOHDA Masaki

    2000 (1) 47-48 2000/03/01

    ISSN: 1340-3168

  280. Task adaptation using part-of-speech tag and high frequency word N-gram

    OGASAWARA Norimitsu, KATOH Masaharu, ITO Akinori, KOHDA Masaki

    2000 (1) 75-76 2000/03/01

    ISSN: 1340-3168

  281. A study on MDL criterion based regression cluster setting for MLLR adaptation

    KANOU Junya, KATOH Masaharu, ITO Akinori, KOHDA Masaki

    2000 (1) 103-104 2000/03/01

    ISSN: 1340-3168

  282. Language modeling by stochastic dependency grammar for Japanese speech recognition

    Akinori Ito, Chiori Hori, Masaharu Kotow, Masaki Kohda

    6th International Conference on Spoken Language Processing, ICSLP 2000 2000/01/01

  283. IPA Japanese dictation free software project

    Katsunobu Itou, Kiyohiro Shikano, Tatsuya Kawahara, Kazuya Takeda, Atsushi Yamada, Akinori Ito, Takehito Utsuro, Tetsunori Kobayashi, Nobuaki Minematsu, Mikio Yamamoto, Shigeki Sagayama, Akinobu Lee

    2nd International Conference on Language Resources and Evaluation, LREC 2000 2000/01/01

  284. Free software toolkit for Japanese large vocabulary continuous speech recognition

    Tatsuya Kawahara, Akinobu Lee, Tetsunori Kobayashi, Kazuya Takeda, Nobuaki Minematsu, Shigeki Sagayama, Katsunobu Itou, Akinori Ito, Mikio Yamamoto, Atsushi Yamada, Takehito Utsuro, Kiyohiro Shikano

    6th International Conference on Spoken Language Processing, ICSLP 2000 2000/01/01

  285. Study on Large Vocabulary Continuous Speech Recognition with a phoneme graph based hypothesis restriction

    OKA Naoki, KATOH Masaharu, ITO Akinori, KOHDA Masaki

    IEICE technical report. Natural language understanding and models of communication 99 (524) 67-72 1999/12/21

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

  286. Study on Large Vocabulary Continuous Speech Recognition with a phoneme graph based hypothesis restriction

    OKA Naoki, KATOH Masaharu, ITO Akinori, KOHDA Masaki

    IPSJ SIG Notes 1999 (108) 199-204 1999/12/20

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    In this paper, we study fast search strategies for large vocabulary continuous speech recognition (LVCSR). Many fast search strategies have been proposed until now. In [2], we proposed a new search strategy with a phoneme graph based hypothesis restriction, which efficiently reduces the search space. For a 5000-word task, experimental results showed that the method can reduce the elapsed time by 70% without any increase in errors. To make the search even faster, we incorporated a 1-phoneme look-ahead technique into phoneme graph generation. We evaluate the proposed method on a 20000-word Japanese newspaper task. Experimental results show that the method can reduce the elapsed time by about 60% without increasing the error rate.

  287. A study on MLLR adapted speaker model for speaker verification

    KANOU Junya, KATOH Masaharu, ITO Akinori, KOHDA Masaki

    IPSJ SIG Notes 1999 (108) 55-60 1999/12/20

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    In this paper, we propose a method that uses the MDL criterion to automatically set the regression clusters according to the amount of adaptation data. Claimant speaker models are built by MLLR adaptation. To increase the number of regression clusters, we use a tree structure built by top-down clustering based on acoustic distance. The MDL criterion is compared with a frame threshold criterion and with fixed regression clusters. In experiments on text-prompted speaker verification, the MDL criterion restrains cluster division, and the most suitable number of clusters for the amount of adaptation data is chosen.

  288. A study on MLLR adapted speaker model for speaker verification

    KANOU Junya, KATOH Masaharu, ITO Akinori, KOHDA Masaki

    IEICE technical report. Natural language understanding and models of communication 99 (523) 55-60 1999/12/20

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

  289. Fast and Robust Optimization of Language Model Weight and Insertion Penalty from N-best Candidates

    ITO Akinori, KOHDA Masaki

    IPSJ SIG Notes 1999 (91) 35-40 1999/10/29

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    An LVCSR system has many parameters to be optimized. In this paper, we investigate several issues concerning language model weight and word insertion penalty. From recognition results obtained by varying these parameters, we made three important observations. First, the optimum point of these parameter values depended on the test set used for the optimization. Second, the parameter space had many local optima, which meant that one had to try all points in the parameter space to find the global optimum. Third, the potential increase in WER in the suboptimum region of the parameter space was about 2%. Based on these observations, we propose three new methods to optimize language model weight and insertion penalty. Firstly, a new method is proposed to preselect n-best candidates for parameter optimization based on n-best rescoring. Secondly, a method to choose a robust parameter setting is proposed. This method splits a development test set into several sets. According to the optimization results for each set, the method chooses the optimum point by considering the average WER as well as its variance. Finally, a method to find a sub-optimum parameter setting is proposed. This optimization is based on neighborhood search, and it finds a parameter setting rapidly.

  290. A Report on Eurospeech99 and IEEE Multimedia Signal Processing Workshop

    NAKAMURA Satoshi, OKAWA Shigeki, ITOH Akinori, TAMOTO Masafumi, MIZUNO Hideyuki, UNOKI Masashi, TOKUDA Keiichi, KABURAGI Tokihiko, HATAOKA Nobuo

    IPSJ SIG Notes 1999 (91) 21-28 1999/10/29

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    This paper summarizes the topics of ESCA Eurospeech99, held in Budapest, Hungary, from Sep. 5 to Sep. 9, 1999, and of the IEEE Multimedia Signal Processing Workshop, held in Helsingør, Denmark, from Sep. 13 to Sep. 15, 1999.

  291. A study on MLLR adapted speaker model for speaker verification

    KANOU Junya, KATOH Masaharu, ITO Akinori, KOHDA Masaki

    1999 (2) 49-50 1999/09/01

    ISSN: 1340-3168

  292. Fast optimization of language model weight and insertion penalty using n-best candidate

    ITO A., KOHDA M.

    1999 (2) 65-66 1999/09/01

    ISSN: 1340-3168

  293. A Study on Increase of Performance Based on Combine Multiple Recognizer Output

    KATOH Masaharu, ITO Akinori, KOHDA Masaki

    1999 (2) 85-86 1999/09/01

    ISSN: 1340-3168

  294. A new metric language model evaluation based on likelihood gain

    ITO A., KOHDA M.

    1999 (2) 73-74 1999/09/01

    ISSN: 1340-3168

  295. Language modeling using ergodic HMM based on bigram

    SAITOH Hideki, ITO Akinori, KATOH Masaharu, KOHDA Masaki

    1999 (2) 101-102 1999/09/01

    ISSN: 1340-3168

  296. A metric based on likelihood difference for n-gram language model evaluation

    ITO Akinori, KOHDA Masaki, OSTENDORF Mari

    IEICE technical report. Speech 99 (121) 95-102 1999/06/18

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    Perplexity has been widely used as an evaluation metric for stochastic language models. Recently, several papers reported that the correlation between perplexity and word error rate was poor when complicated language models, such as mixture models, were used. In this paper, a new metric for n-gram language models is proposed that is intended to replace perplexity. The major difference between the proposed metric and perplexity is that, while perplexity uses the probabilities of word occurrences in the evaluation text, the proposed metric accumulates differences in linguistic score between a word in the evaluation text and the maximum score available in that context. A sigmoid-like nonlinear function is applied to the score difference, and the average of those values is calculated. Applying the nonlinear function suppresses the effect of language score differences that do not contribute to word error rate improvement. The correlation between the proposed metric and word accuracy was investigated for a speech recognition simulator and a real speech recognizer. The results showed that the proposed metric had a higher correlation with word accuracy than perplexity did.

  297. Construction and Evaluation of Language Models Based on Stochastic Context Free Grammar for Speech Recognition

    HORI Chiori, KATOH Masaharu, ITO Akinori, KOHDA Masaki

    IEICE technical report. Speech 99 (121) 79-86 1999/06/18

    Publisher: The Institute of Electronics, Information and Communication Engineers

    ISSN: 0913-5685

    It is well known that a Stochastic Context Free Grammar (SCFG) is a very effective language model since it can express not only local constraints, as an N-gram does, but also global constraints over a whole sentence. However, estimating the parameters of an SCFG requires the Inside-Outside algorithm, which needs a huge amount of computation, proportional to the cube of the number of non-terminal symbols and the length of the input sequences. Therefore, the SCFG has rarely been used for speech recognition. In this paper, we propose a new SCFG to which a phrase-based dependency grammar is applied to reduce this computation. In a test using the EDR corpus, we compared the proposed model with other types of SCFGs in terms of perplexity and amount of computation. We constructed a large-scale SCFG using the Mainichi news corpus and compared it with a trigram for a 5,000-word Japanese newspaper reading task.

  298. Japanese Dictation Toolkit -1997 version-

    Tatsuya Kawahara, Akinobu Lee, Tetsunori Kobayashi, Kazuya Takeda, Nobuaki Minematsu, Katsunobu Itoh, Akinori Ito, Mikio Yamamoto, Atsushi Yamada, Takehito Utsuro, Kiyohiro Shikano

    J. Acoustical Society of Japan 55 (3) 175-180 1999/03/01

    Publisher: The Acoustical Society of Japan

    DOI: 10.20697/jasj.55.3_175  

    ISSN: 0369-4232

  299. Japanese Dictation Toolkit -1997 version

    Tatsuya Kawahara, Akinobu Lee, Tetsunori Kobayashi, Kazuya Takeda, Nobuaki Minematsu, Katsunobu Itou, Akinori Ito, Mikio Yamamoto, Atsushi Yamada, Takehito Utsuro, Kiyohiro Shikano

    Journal of the Acoustical Society of Japan (E) (English translation of Nippon Onkyo Gakkaishi) 20 (3) 233-239 1999

    DOI: 10.1250/ast.20.233  

    ISSN: 0388-2861

  300. A Study on A Phoneme-Graph-based Hypothesis Restriction for Large Vocabulary Continuous Speech Recognition

    HORI Takaaki, OKA Naoki, KATOH Masaharu, ITO Akinori, KOHDA Masaki

    IEICE technical report. Natural language understanding and models of communication 98 (461) 25-32 1998/12/11

    Publisher: The Institute of Electronics, Information and Communication Engineers

    In this paper, we study fast search strategies for Large Vocabulary Continuous Speech Recognition (LVCSR) and propose a new method, phoneme-graph-based hypothesis restriction, which effectively prunes the search space. In the proposed method, a phoneme graph is generated at the pre-processing stage, and the best word sequence is then searched for at the main recognition stage while the expansion of hypotheses is restricted using the information in the phoneme graph. The phoneme-graph-based restriction consists of the limitation of phoneme boundaries and Forward-Backward Pruning, which together reduce the search space dramatically. The proposed method was tested on a 5,000-word Japanese newspaper reading task. The experimental results show that the method can reduce the elapsed time by about 70% without any increase in errors.

  301. A Study on A Phoneme-Graph-based Hypothesis Restriction for Large Vocabulary Continuous Speech Recognition

    HORI Takaaki, OKA Naoki, KATOH Masaharu, ITO Akinori, KOHDA Masaki

    IPSJ SIG Notes 1998 (114) 113-120 1998/12/10

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    In this paper, we study fast search strategies for Large Vocabulary Continuous Speech Recognition (LVCSR) and propose a new method, phoneme-graph-based hypothesis restriction, which effectively prunes the search space. In the proposed method, a phoneme graph is generated at the pre-processing stage, and the best word sequence is then searched for at the main recognition stage while the expansion of hypotheses is restricted using the information in the phoneme graph. The phoneme-graph-based restriction consists of the limitation of phoneme boundaries and Forward-Backward Pruning, which together reduce the search space dramatically. The proposed method was tested on a 5,000-word Japanese newspaper reading task. The experimental results show that the method can reduce the elapsed time by about 70% without any increase in errors.

  302. A study on a large vocabulary continuous speech recognition system with a state clustering-based HM-Net

    HORI Takaaki, OKA Naoki, KATOH Masaharu, ITO Akinori, KOHDA Masaki

    1998 (2) 95-96 1998/09/01

    ISSN: 1340-3168

  303. Evaluation of N-gram language models trained on newspaper corpus by speech recognition experiments

    KAMEYAMA Yoshihiro, KATOH Masaharu, ITO Akinori, KOHDA Masaki

    1998 (2) 73-74 1998/09/01

    ISSN: 1340-3168

  304. Look How Far Speech/Language Processing Technology Has Come: Speech Edition

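    NITTA Tsuneo, KOBAYASHI Tetsunori, SHIKANO Kiyohiro, TAKEDA Kazuya, KAWAHARA Tatsuya, ITOU Katunobu, MINEMATSU Nobuaki, ITO Akinori, UTSURO Takehito, YAMAMOTO Mikio, YAMADA Atsushi, NISHIMURA Masafumi, KAI Mitsuhiko, NAKAGAWA Seiichi, HATTORI Hiroaki, ABE Masanobu, MATSU'URA Hiroshi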

    IPSJ SIG Technical Reports. SLP, Spoken Language Processing 98 (49) 9-16 1998/05/28

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

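    The multimedia era has arrived, and a variety of services are beginning to be offered. This report focuses on speech interface technology, which will become increasingly important, and introduces the latest technologies centered on speech recognition and speech synthesis. The speech recognition technologies covered are Japanese dictation software, Web search software, and a large-vocabulary speech recognition chip; the speech synthesis technologies covered are a speech content authoring support tool and text-to-speech conversion software.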

  305. SIG-SLP/SIG-NL Joint Session "Recent Advances in Speech and Language Processing Technologies" -Speech Processing Technologies-

    NITTA Tsuneo, KOBAYASHI Tetsunori, SHIKANO Kiyohiro, TAKEDA Kazuya, KAWAHARA Tatsuya, ITOU Katunobu, MINEMATSU Nobuaki, ITO Akinori, UTSURO Takehito, YAMAMOTO Mikio, YAMADA Atsushi, NISHIMURA Masafumi, KAI Mitsuhiko, NAKAGAWA Seiichi, HATTORI Hiroaki, ABE Masanobu, MATSU'URA Hiroshi

    IPSJ SIG Notes 1998 (48) 9-16 1998/05/28

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    Computer-human interaction by voice is one of the most important technologies in the coming multimedia era. In this report, we introduce recent advances in speech processing technologies, focusing on both speech recognition and speech synthesis. The contents are: Japanese dictation software, Web-page retrieval software, large-vocabulary speech recognition chips, a speech editing tool for designing multimedia applications, and TTS (Text-To-Speech) software for PCs.

  306. Evaluation of Japanese Dictation ToolKit -1997 version-

    KAWAHARA Tatsuya, LEE Akinobu, KOBAYASHI Tetsunori, TAKEDA Kazuya, MINEMATSU Nobuaki, ITOU Katsunobu, ITO Akinori, YAMAMOTO Mikio, YAMADA Atsushi, UTSURO Takehito, SHIKANO Kiyohiro

    IPSJ SIG Notes 1998 (48) 109-114 1998/05/28

    Publisher: Information Processing Society of Japan (IPSJ)

    A project to develop an LVCSR (Large Vocabulary Continuous Speech Recognition) platform is introduced. It is a collaboration among researchers at different academic institutes and is intended to develop a sharable software repository of not only databases but also models and programs. The platform consists of a standard recognition engine, Japanese phone models and Japanese statistical language models. As an integrated system of these modules, we have implemented a baseline 500-word dictation system and evaluated various components. The software repository is available to the public.

  307. Evaluation of N-gram task adaptation by speech recognition simulation

    ITO Akinori, KOHDA Masaki

    1998 (1) 43-44 1998/03/01

    ISSN: 1340-3168

  308. Effect of Cut-off and Learning Text on The Language Model from The Newspaper Corpus

    KAMEYAMA Yoshihiro, KATOH Masaharu, ITO Akinori, KOHDA Masaki

    1998 (1) 49-50 1998/03/01

    ISSN: 1340-3168

  309. A Study on Word Spotting based on Likelihood Normalization Using Phoneme HMMs

    KATOH Masaharu, HORI Takaaki, ITO Akinori, KOHDA Masaki

    IEICE technical report. Natural language understanding and models of communication 97 (440) 9-14 1997/12/12

    Publisher: The Institute of Electronics, Information and Communication Engineers

    In recent speech recognition, the hidden Markov model (HMM) has proved useful. We consider the likelihood scores of HMMs from the standpoint of probability theory. In continuous speech recognition, each hypothesis covers a speech segment of different length and position, so comparing HMM scores directly adversely affects system performance. In this paper, we describe likelihood normalization based on Bayes' theorem. To normalize the likelihood, we use connected phoneme HMMs that follow Japanese syllable rules. This method requires no additional computation to obtain the scores and no models other than the phoneme HMMs. We apply it to word spotting and obtain a significant improvement in system performance.

  310. A Study on A State Clustering-Based Topology Design Method for HM-Nets

    HORI Takaaki, KATOH Masaharu, ITO Akinori, KOHDA Masaki

    IPSJ SIG Notes 1997 (120) 47-52 1997/12/11

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    A Hidden Markov Network (HM-Net) is a highly accurate and robust acoustic model which represents a tied-state structure of context-dependent Hidden Markov Models as a network. The Successive State Splitting (SSS) method and its improved variants have already been proposed to generate HM-Nets. However, these algorithms share a common problem: a large amount of computation is required when a large amount of training data is used, because state splitting and parameter estimation are repeated using the training data. Although the topologies of HM-Nets are usually designed with a part of the training data, with only their output density distributions then estimated from all of the data, HM-Nets with large-scale topologies for large vocabulary continuous speech recognition (LVCSR) cannot be derived this way. In this paper, we propose a rapid topology design method based on state clustering to generate highly accurate HM-Nets for LVCSR. Continuous phoneme recognition experiments show that the proposed method is fast and can generate HM-Nets equivalent to those designed by conventional methods when the same training data is used.

  311. Common Platform of Japanese Large Vocabulary Continuous Speech Recognition Research -Speech Recognizer Design-

    KAWAHARA Tatsuya, LEE Akinobu, ITOU Katsunobu, KOBAYASHI Tetsunori, ITO Akinori, UTSURO Takehito, SHIMIZU Toru, TAMOTO Masafumi, ARAI Kazuhiro, MINEMATSU Nobuaki, YAMAMOTO Mikio, TAKEZAWA Toshiyuki, TAKEDA Kazuya, MATSUOKA Tatsuo, SHIKANO Kiyohiro

    IPSJ SIG Notes 1997 (101) 1-6 1997/10/24

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    For Japanese large vocabulary continuous speech recognition (LVCSR) research, we are developing a standard baseline software repository that includes language models, acoustic models and recognition engines. In this report, the specifications and algorithms of the speech recognizer currently being designed are described.

  312. Common Platform of Japanese Large Vocabulary Continuous Speech Recognition Research -Development of text corpus-

    ITOU Katunobu, ITO Akinori, UTSURO Takehito, KAWAHARA Tatsuya, KOBAYASHI Tetsunori, SHIMIZU Toru, TAMOTO Masafumi, ARAI Kazuhiro, MINEMATSU Nobuaki, YAMAMOTO Mikio, TAKEZAWA Toshiyuki, TAKEDA Kazuya, MATSUOKA Tatsuo, SHIKANO Kiyohiro

    IPSJ SIG Notes 1997 (101) 7-12 1997/10/24

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    For Japanese large vocabulary continuous speech recognition (LVCSR) research, we are developing a standard baseline software repository that includes language models, acoustic models and recognition engines. In this report, the design and specification of the text corpus are described.

  313. Common Platform of Japanese Large Vocabulary Continuous Speech Recognition Research -Construction of Acoustic Model-

    TAKEDA Kazuya, MINEMATSU Nobuaki, ITO Akinori, ITOU Katsunobu, UTSURO Takehito, KAWAHARA Tatsuya, KOBAYASHI Tetsunori, SHIMIZU Toru, TAMOTO Masafumi, ARAI Kazuhiro, YAMAMOTO Mikio, TAKEZAWA Toshiyuki, MATSUOKA Tatsuo, SHIKANO Kiyohiro

    IPSJ SIG Notes 1997 (101) 13-18 1997/10/24

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    For Japanese large vocabulary continuous speech recognition (LVCSR) research, we are developing a standard baseline software repository that includes language models, acoustic models and recognition engines. In this report, the construction of acoustic models is discussed.

  314. A Study of Language Modeling using Stochastic Context Free Grammar with "Dependency Grammar"

    YAGINUMA Masanobu, KATOH Masaharu, ITO Akinori, KOHDA Masaki

    IEICE technical report. Natural language understanding and models of communication 97 (330) 33-40 1997/10/17

    Publisher: The Institute of Electronics, Information and Communication Engineers

    In this paper, we propose a language model using a stochastic context free grammar (SCFG) for speech recognition. To train an SCFG, the inside-outside (I/O) algorithm is used. We modified the I/O algorithm to handle dependency grammar. To express dependency grammar, two categories of words are introduced: functional words (particle, auxiliary, suffix, etc.) and content words (noun, verb, adjective, etc.). Using dependency grammar, the training time is reduced from the cube of the number of nonterminal symbols to its square. We carried out an experiment to compare the proposed method with two conventional methods: the trigram model and the original SCFG model. We obtained a significant reduction in training time compared with the original SCFG. The perplexity of the proposed model was smaller than that of the other two models. Furthermore, we investigated initial values that reduce training time and improve performance.

  315. On the effect of vocabulary size on N-gram task adaptation

    ITO Akinori, KOHDA Masaki

    1997 (2) 61-62 1997/09/01

    ISSN: 1340-3168

  316. Study of initial values for language modeling using stochastic context free grammar

    YAGINUMA Masanobu, KATOH Masaharu, ITO Akinori, KOHDA Masaki

    1997 (2) 51-52 1997/09/01

    ISSN: 1340-3168

  317. A Study on Word Spotting using Phoneme HMMs based Likelihood Normalization

    KATOH Masaharu, HORI Takaaki, ITO Akinori, KOHDA Masaki

    1997 (2) 79-80 1997/09/01

    ISSN: 1340-3168

  318. On the vocabulary size for N-gram task adaptation

    ITO Akinori, KOHDA Masaki

    IEICE technical report. Speech 97 (115) 51-58 1997/06/20

    Publisher: The Institute of Electronics, Information and Communication Engineers

    While an N-gram language model requires a large corpus for good probability estimation, it is often difficult to gather a large number of samples for a specific task domain. This paper describes a task adaptation technique that builds an N-gram model for a specific domain from a large task-independent corpus (TI text) and a small task-specific corpus (AD text). A simple weighted mixture is employed to mix the two corpora. The paper first points out the relationship between the weighted mixture method and MAP/Bayes estimation. Next, the effect of vocabulary restriction is investigated. As the TI text contains many words that do not appear in the target task, the perplexity of the model decreases when these words are replaced with an "unknown" symbol. It is shown that the perplexity of the model can be reduced by vocabulary restriction and that the vocabulary sizes of the TI and AD texts must be determined individually.

  319. Reading Japanese Corpus using N-gram.

    ITO A, MANZAKI H, KATOH M, KOHDA M

    1997 (1) 9-10 1997/03/01

    ISSN: 1340-3168

  320. A Study on Improvement of HM-Nets using Decision Tree-based Successive State Splitting

    HORI Takaaki, KATOH Masaharu, ITO Akinori, KOHDA Masaki

    IEICE technical report. Natural language understanding and models of communication 96 (420) 17-24 1996/12/13

    Publisher: The Institute of Electronics, Information and Communication Engineers

    The important aspects of context-dependent acoustic modeling using a limited training data set for robust speech recognition are how to tie the model parameters and how to handle unknown contexts. From this point of view, we proposed the Decision Tree-based Successive State Splitting algorithm (DT-SSS) and showed that HM-Nets generated with this algorithm had high accuracy and could represent any context. However, this algorithm did not take temporal splits into consideration and therefore did not make the best use of the strengths of SSS. In this paper, we incorporate temporal splits into DT-SSS and generate HM-Nets from various initial models. In continuous phoneme recognition experiments, we show the effects of these improvements.

  321. A study on word preselection using HMM state sequence

    KATOH Masaharu, HORI Takaaki, ITO Akinori, KOHDA Masaki

    1996 (2) 87-88 1996/09/01

    ISSN: 1340-3168

  322. Study on the adaptation of a stochastic language model using small corpus.

    ITO Akinori, KOHDA Masaki

    1996 (2) 37-38 1996/09/01

    ISSN: 1340-3168

  323. A Study on HM-Net using Successive State Splitting based on Phonetic Decision Tree

    HORI Takaaki, KATOH Masaharu, ITO Akinori, KOHDA Masaki

    IEICE technical report. Speech 96 (93) 15-22 1996/06/14

    Publisher: The Institute of Electronics, Information and Communication Engineers

    The important aspects of context-dependent acoustic modeling using a limited training data set for robust speech recognition are how to tie the model parameters and how to handle unknown contexts. The Successive State Splitting algorithm (SSS) is a good method that automatically designs the topology of tied-state HMMs, but it does not cover unknown contexts adequately and also has some problems with contextual splits. In this paper, we propose a new SSS algorithm that includes contextual splits based on a phonetic decision tree. This method can generate highly accurate HM-Nets that can represent any context. Continuous phoneme recognition experiments show that the proposed method is effective.

  324. A Study on Utilizing to Word Preselection Using Optimal Phonemes Sequence

    KATOH Masaharu, ITO Akinori, KOHDA Masaki

    IEICE technical report. Speech 96 (92) 9-14 1996/06/13

    Publisher: The Institute of Electronics, Information and Communication Engineers

    In this paper, a fast word preselection method is proposed for HMM-based word recognition. In this method, candidate words are selected using phoneme-based matching. First, phoneme recognition is carried out on the input speech and an optimal phoneme sequence is recognized. Then phoneme DP matching is executed to choose word candidates. Finally, the word candidates are verified frame-by-frame using subword HMMs. We evaluated the proposed method in a 15,000-word recognition experiment. When 150 word candidates (1% of the total vocabulary) were selected, the omission rate was less than 1%. Compared with the full search algorithm, the proposed method took only 4.6% of the CPU time and 8.6% of the comparison operations. We also carried out an experiment that investigated the performance of the method with simplified HMMs.

  325. N-gram estimation from Japanese large corpus and task adaptation of N-gram

    ITO Akinori, DAISHIMA Naoto, MARUYAMA Atsushi, KATOH Masaharu, KOHDA Masaki

    IPSJ SIG Notes 1996 (55) 25-30 1996/05/27

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    N-gram language models were constructed from the EDR corpus, a 5-million-word Japanese corpus. The models were investigated under various conditions of training text size, vocabulary and cut-off. The results of the experiments clarified the optimum condition for a given training text size. We carried out further experiments on task adaptation. An N-gram model built from a dialog was mixed with the N-gram from the EDR corpus, which gave about a 60% reduction in perplexity.

  326. N-gram Language Model by String Pattern and Pattern Class

    Ito Akinori, Kohda Masaki

    Proceedings of the IEICE General Conference 1996 (1) 345-346 1996/03/11

    Publisher: The Institute of Electronics, Information and Communication Engineers

  327. A study on utilizing to preprocess using optimal phonemes sequence.

    KATOH Masaharu, ITO Akinori, KOHDA Masaki

    1996 (1) 79-80 1996/03/01

    ISSN: 1340-3168

  328. Language Modelling by String Pattern and Pattern Class N-gram.

    ITO Akinori, KOHDA Masaki

    1996 (1) 193-194 1996/03/01

    ISSN: 1340-3168

  329. A study of prior task adaptation for dialogue speech recognition

    Akinori Ito

    IEICE Technical Report, SP96-81 1996

  330. The performance prediction on sentence recognition using a finite state word automaton

    T Otsuki, A Ito, S Makino, T Ohtomo

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E79D (1) 47-53 1996/01

    ISSN: 0916-8532

  331. Language modeling by string pattern N-gram for Japanese speech recognition

    A Ito, M Kohda

    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4 1 490-493 1996

  332. Language Modelling by String Pattern N-gram

    ITO Akinori, KOHDA Masaki

    IEICE technical report. Natural language understanding and models of communication 95 (429) 19-24 1995/12/15

    Publisher: The Institute of Electronics, Information and Communication Engineers

    Markov model based language models (N-grams) are popular in sentence/dialog speech recognition. When applying these models to Japanese speech recognition, one has to decide what the unit of the N-gram should be. As a Japanese sentence is not divided into words, morphemic analysis is required before word-by-word processing. However, it is difficult to obtain a precise analysis automatically for transcriptions of spontaneous speech. In this paper, we propose several language models that can be constructed fully automatically. We examined three types of models: an N-gram by string pattern, an N-gram by automatic morphemic analysis and a string pattern class N-gram. These models were compared by perplexity. In the experimental results, the string pattern class N-gram performed better than the morpheme N-gram.

  333. Language Modelling by String Pattern N-gram

    ITO Akinori, KOHDA Masaki

    IPSJ SIG Notes 1995 (120) 105-112 1995/12/14

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    Markov-model-based language models (N-grams) are widely used in sentence and dialog speech recognition. To apply these models to Japanese speech recognition, one has to decide what the unit of the N-gram should be. Because Japanese sentences are not divided into words, morphemic analysis is required before word-by-word processing, but it is difficult to obtain a precise analysis automatically for transcriptions of spontaneous speech. In this paper, we propose several language models that can be constructed fully automatically. We examined three types of models: N-gram by string pattern, N-gram by automatic morphemic analysis, and string pattern class N-gram. The models were compared by perplexity. In the experiments, the string pattern class N-gram performed better than the morpheme N-gram.

  334. Automatic generation of Japanese Bunsetsu structure represented

    ITO Akinori, KOHDA Masaki

    1995 (2) 19-20 1995/09/01

    ISSN: 1340-3168

  335. SuperTAINS: Tohoku University Network realizes multimedia applications through sub-giga network

    Yukiyoshi Kameyama, Akinori Ito, Hiroaki Kobayashi

    Computer and Network LAN 13 (6) 114-120 1995/06

    Publisher: Ohmsha

  336. A NEW HMNET CONSTRUCTION ALGORITHM REQUIRING NO CONTEXTUAL FACTORS

    M SUZUKI, S MAKINO, A ITO, H ASO, H SHIMODAIRA

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E78D (6) 662-668 1995/06

    ISSN: 0916-8532

  337. On a Bunsetsu structure model with several constraints for speech recognition

    ITO Akinori, MAKINO Shozo

    IPSJ SIG Notes 1995 (51) 43-50 1995/05/25

    Publisher: Information Processing Society of Japan (IPSJ)

    ISSN: 0919-6072

    This paper describes a bunsetsu (phrase) model for Japanese spontaneous speech recognition. The model is represented as a finite automaton that covers almost all expressions in the dialog transcriptions of the ASJ continuous speech corpus, and it contains 3386 conceptual words and 615 functional words. Next, stochastic language models are combined with the bunsetsu model. Two types of stochastic models are investigated: a stochastic regular grammar and an N-gram model. When combined with the bunsetsu model, a bigram model yields lower perplexity. Finally, several attributes are introduced into the bunsetsu model to express constraints between distant words in a phrase. The finite automaton model with attributes is automatically converted to a finite automaton without attributes, which can easily be used in conventional speech recognition schemes.

  338. On introducing several grammatical constraints into a bunsetsu structure model for spoken dialog recognition

    ITO Akinori, MAKINO Shozo

    1995 (1) 183-184 1995/03/01

    ISSN: 1340-3168

  339. A study of prior task adaptation for dialogue speech recognition

    伊藤彰則

    IEICE Technical Report NLC96-50, SP96-81 1995

  340. Performance prediction of word recognition using the probability of word occurrence

    Takashi Otsuki, Teruhiko Otomo, Akinori Ito, Shozo Makino

    Electronics and Communications in Japan (Part III: Fundamental Electronic Science) 78 (3) 10-19 1995

    DOI: 10.1002/ecjc.4430780302  

    ISSN: 1520-6440 1042-0967

  341. Performance prediction of word recognition using the transition information between phonemes or between characters

    Takashi Otsuki, Shozo Makino, Akinori Ito, Toshio Sone

    Systems and Computers in Japan 25 (7) 72-81 1994

    DOI: 10.1002/scj.4690250707  

    ISSN: 1520-684X 0882-1666

  342. The performance evaluation on Sentence recognition system using a finite state automaton-the relationship between word recognition score and sentence recognition score-

    Otsuki Takashi, Ito Akinori, Makino Shozo, Otomo Teruhiko

    IEICE technical report. Speech 93 (183) 41-48 1993/08/19

    Publisher: The Institute of Electronics, Information and Communication Engineers

    This report presents a performance evaluation method for a sentence recognition system that uses a finite state automaton. The relationship between the word recognition score and the sentence recognition score can be predicted from the number of sentences at a short distance. However, it is not clear how to obtain this number when a finite state automaton is used as the linguistic information. We therefore propose an algorithm to calculate this number and use it to predict the relationship between the word recognition score and the sentence recognition score. We carry out the prediction with the proposed method and run a simulation to evaluate its accuracy.

  343. The performance evaluation method on sentence recognition system which uses the transition information between word categories.

    46 197-198 1993/03/01

  344. Detection of Unknown Words in the Morphemic Analysis for Construction of a Word Dictionary

    46 55-56 1993/03/01

  345. A NEW WORD PRESELECTION METHOD BASED ON AN EXTENDED REDUNDANT HASH ADDRESSING FOR CONTINUOUS SPEECH RECOGNITION

    A ITO, S MAKINO

    ICASSP-93 : 1993 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5 2 B299-B302 1993

    ISSN: 0736-7791

  346. Detection of Unknown Words using a Bunsetsu Automaton

    45 167-168 1992/09/28

  347. Detection of Unknown Words in the Morphemic Analysis for Corpus

    44 177-178 1992/02/24

  348. Syntactic processing for continuous speech recognition combining the Redundant Hash Addressing method and the function-word-predicting CYK method

    伊藤 彰則, 牧野 正三

    Proceedings of the National Convention 44 165-166 1992/02/24

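    Syntactic processing methods proposed for continuous speech recognition include those based on the extended LR method, chart parsers, and Earley's method. However, because these algorithms center on top-down processing, namely predicting words from the grammar, every grammatically predicted word must be matched against the input sequence. This is effective for raising recognition accuracy, but the computational cost becomes a problem when building a large-vocabulary continuous speech recognition system. In this paper, we extend Kohonen's Redundant Hash Addressing method to continuous speech recognition and show how to use it as a preselection step for the function-word-predicting CYK method, a continuous speech recognition algorithm proposed by the authors. With this method, the content words contained in the input phoneme sequence can be preselected quickly, which reduces the computation required for word matching.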

  349. Relationship between character recognition rate and word recognition rate in character recognition using linguistic information

    大槻 恭士, 伊藤 彰則, 牧野 正三, 曽根 敏夫

    Proceedings of the National Convention 44 141-142 1992/02/24

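    Linguistic information such as word dictionaries and character transition information is used in the post-processing of character recognition. In particular, character transition information has been reported to achieve an effect equivalent to that of a word dictionary with simple, fast processing. This paper presents a method for obtaining the relationship between the character recognition rate and the word recognition rate in character recognition using such linguistic information, without actually performing recognition.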

  350. A JAPANESE TEXT DICTATION SYSTEM BASED ON PHONEME RECOGNITION AND A DEPENDENCY GRAMMAR

    S MAKINO, A ITO, M ENDO, K KIDO

    ICASSP 91, VOLS 1-5 1 273-276 1991

    ISSN: 0736-7791

  351. Language processing for continuous speech recognition of read-text speech

    伊藤彰則

    Proceedings of the Symposium of the Research Center for Applied Information Sciences, Tohoku University 143-150 1990

Books and Other Publications 7

  1. Issues in Japanese Psycholinguistics from Comparative Perspectives

    Masatoshi Koizumi

    De Gruyter Mouton 2023/07

    ISBN: 9783110778946

  2. Tracing the Origins of Sociolinguistic Sciences

    横山, 詔一, 杉戸, 清樹, 佐藤, 和之, 米田, 正人, 前田, 忠彦, 阿部, 貴人

    ひつじ書房 2018/09

    ISBN: 9784894769311

  3. Acoustic Information Hiding Technology

    鵜木, 祐史, 西村, 竜一, 伊藤, 彰則, 西村, 明, 近藤, 和弘, 薗田, 光太郎

    コロナ社 2018/03

    ISBN: 9784339011357

  4. Introduction to Acoustics

    鈴木陽一, 赤木正人, 伊藤彰則, 佐藤洋, 苣木禎史, 中村健太郎

    2010/02

  5. Spoken Language Systems

    Seiichi Nakagawa, Michio Okada, Tatsuya Kawahara

    Ohmsha/IOS Press 2005/09/15

  6. IT Text Speech Recognition System

    Kiyohiro Shikano, Katsunobu Itoh, Tatsuya Kawahara, Kazuya Takeda, Mikio Yamamoto

    Ohmsha 2001/05/15

  7. Recent Research towards Advanced Man-Machine Interface through Spoken Language

    Shozo Makino, Akinori Ito, Mitsuru Endo, Ken'iti Kido

    Elsevier 1996/01

Presentations 9

  1. DNN-based talking movie generation with face direction consideration

    Toru Ishikawa, Takashi Nose, Akinori Ito

    Smart Innovation, Systems and Technologies 2019/01/01

    © Springer Nature Switzerland AG 2019. In this paper, we propose a method to generate a talking head animation that takes the direction of the face into account. The proposed method parametrizes a facial image using the active appearance model (AAM) and models the parameters of the AAM using a feedforward deep neural network. Since the AAM is a two-dimensional face model, conventional methods that use the AAM assume only the frontal face. Thus, when the generated face was combined with other parts such as a head and a body, the directions of the face and the head were often inconsistent. The proposed method models the shape parameters of the AAM using principal component analysis (PCA) so that the direction and the movement of individual facial parts are modeled separately; we then substitute the face direction of the generated animation with that of the head part so that the directions of the face and the head coincide. We conducted an experiment to demonstrate that the proposed method can generate face animation with proper face direction.

  2. Two-stage sequence-to-sequence neural voice conversion with low-to-high definition spectrogram mapping

    Sou Miyamoto, Takashi Nose, Kazuyuki Hiroshiba, Yuri Odagiri, Akinori Ito

    Smart Innovation, Systems and Technologies 2019/01/01

    © Springer Nature Switzerland AG 2019. In this study, we propose a voice conversion technique with two-stage conversion, which is realized by using two models consisting of U-Net and pix2pix. Using U-Net, we tried to reproduce intonation of a target speaker by performing low-dimensional feature conversion considering the time direction. We introduced pix2pix for the task of spectrogram enhancement. The pix2pix is trained to map from low definition spectrogram to high definition spectrogram (low-to-high spectrogram mapping). Low definition spectrogram is reconstructed from low dimensional mel-cepstrum converted by U-Net and high definition spectrogram is extracted from natural speech. In objective evaluations, we showed that the proposed method was effective in improvement of mel-cepstral distance (MCD) and Log F0 RMSE. Subjective evaluations revealed that the use of the proposed method had a certain effect in improving speech individuality while maintaining the same level of naturalness as the conventional method.

  3. Evaluation of english speech recognition for Japanese learners using DNN-based acoustic models

    Jiang Fu, Yuya Chiba, Takashi Nose, Akinori Ito

    Smart Innovation, Systems and Technologies 2019/01/01

    © Springer Nature Switzerland AG 2019. For computer-assisted language learning (CALL) systems to make foreign language learning easier, they must recognize the learner's utterances with high accuracy. The quality of a CALL system depends mainly on the accuracy of automatic speech recognition (ASR). However, since the pronunciation of non-native speakers differs greatly from that of native speakers, existing ASR systems cannot recognize their speech accurately. To solve this problem, this research proposes an acoustic model based on deep neural networks (DNN), trained on the ERJ (English Read by Japanese) database collected from 202 Japanese learners. Compared with traditional ASR systems, the new system significantly improves speech recognition accuracy.

  4. Comparison of speech recognition performance between kaldi and google cloud speech API

    Takashi Kimura, Takashi Nose, Shinji Hirooka, Yuya Chiba, Akinori Ito

    Smart Innovation, Systems and Technologies 2019/01/01

    © Springer Nature Switzerland AG 2019. In recent years, the number of systems with a speech interface has grown. Such interfaces include a spoken dialogue function, and high performance is required of the spoken dialogue system. A spoken dialogue system includes a speech recognition module. In this study, we focus on that module and aim to improve the spoken dialogue system by enhancing the performance of speech recognition. Among speech recognition systems, Kaldi is widely used in many kinds of research. In addition, several speech recognition services are provided as Web APIs, such as IBM Watson Speech to Text, Microsoft Bing Speech API, and Google Cloud Speech API, the last of which is known for its high performance. This paper compares the speech recognition performance of Kaldi and Google Cloud Speech API in terms of WER and RTF and confirms the recognition performance of each system.

  5. Segmental pitch control using speech input based on differential contexts and features for customizable neural speech synthesis

    Shinya Hanabusa, Takashi Nose, Akinori Ito

    Smart Innovation, Systems and Technologies 2019/01/01

    © Springer Nature Switzerland AG 2019. This paper proposes a technique for controlling the pitch of synthetic speech at a segmental level using user input speech within a framework of speech synthesis based on deep neural networks (DNNs). In a previous study, we proposed tailor-made speech synthesis, the speech synthesis technique which enables users to control the synthetic speech naturally and intuitively. We introduced differential fundamental frequency (F0) contexts into speaker model training of speech synthesis based on DNNs. The differential F0 context represents relative log F0 at the segmental level of training data. In this study, we use the user speech to determine the F0 contexts for synthetic speech. This approach allows users to modify and control the segmental pitch more flexibly, which will enhance the performance of the tailor-made speech synthesis.

  6. A study on a spoken dialogue system with cooperative emotional speech synthesis using acoustic and linguistic information

    Mai Yamanaka, Yuya Chiba, Takashi Nose, Akinori Ito

    Smart Innovation, Systems and Technologies 2019/01/01

    © Springer Nature Switzerland AG 2019. This study examines an emotion labeling method for a system utterance of a non-task-oriented spoken dialogue system. The conventional study proposed the cooperative emotion labeling, which generates an emotional speech with an emotion label estimated from user and system utterances. However, this method had a problem that the system cannot decide the emotion label when the emotion is not estimated from the linguistic information. Therefore, we propose a method that uses both the acoustic and the linguistic information for the emotion recognition. In this paper, we show the performance of the emotion recognition when using the acoustic features first. Then, a dialogue experiment based on scenarios is conducted to verify the effectiveness of the proposed emotion labeling method.

  7. Muting machine speech using audio watermarking

    Akinori Ito

    Smart Innovation, Systems and Technologies 2019/01/01

    © Springer Nature Switzerland AG 2019. Spoken dialog systems have become popular and are used in home environments, for example as smart speakers. A problem occurs when two or more smart speakers are in the same environment: a dialog system may misdetect another dialog system's voice as a user's voice. In this paper, a method to mute synthesized speech is proposed to prevent a speech recognizer from recognizing speech uttered by a machine. An audio watermark is used to indicate that the speech is uttered by a machine, and the speech recognizer attenuates the observed speech if it contains the watermark. The watermark is embedded in a high-frequency band so that humans cannot perceive it and so that it can be extracted robustly. From the experimental results, we found that the proposed method robustly determines the existence of the watermark when the SNR is no less than 0 dB.

  8. Melody completion based on convolutional neural networks and generative adversarial learning

    Kosuke Nakamura, Takashi Nose, Yuya Chiba, Akinori Ito

    Smart Innovation, Systems and Technologies 2019/01/01

    © Springer Nature Switzerland AG 2019. In this paper, we deal with melody completion, a technique which smoothly completes melodies that are partially masked. Melody completion can be used to help people compose or arrange pieces of music in several ways, such as editing existing melodies or connecting two other melodies. In recent years, various methods have been proposed for realizing high-quality completion via neural networks. Therefore, in this research, we examine a method of melody completion based on an image completion network. We represent melodies of a certain length as images and train a completion network to complete those images. The completion network consists of convolution layers and is trained in the framework of generative adversarial networks. We also consider chord progression from musical pieces as conditions.

  9. Leveraging a small corpus by different frame shifts for training of a speech recognizer

    Akinori Ito

    Smart Innovation, Systems and Technologies 2019/01/01

    © Springer Nature Switzerland AG 2019. During the feature extraction process for speech recognition, a window function is first applied to the input waveform to extract temporally-limited spectrum. By shifting the window function with a short time period, we can analyze the temporal change of speech spectrum. This time period is called “the frame shift,” which is usually 5 to 10 ms. In this paper, frame shift is re-considered from two aspects. The first one is the appropriateness of 10 ms as the frame shift. The frame-based process is based on the assumption that temporal change of speech spectrum is slow enough compared with the frame shift, which does not hold for kinds of consonants such as plosives. Thus, this paper experimentally shows that feature value fluctuates much according to the first position of the frame. Then a training method is proposed that uses temporally shifted samples as independent samples to compensate for the fluctuation of feature caused by the difference of the beginning position of a frame. The second aspect is that the frame shift could be longer if the fluctuation can be compensated. To prove this, an experiment was conducted to change frame shift from 10 to 60 ms, and it was found that the result of 40 ms frame shift outperformed the result of 10 ms frame shift, and comparable recognition performance with 10 ms frame shift result was obtained with 50 ms frame shift.

Industrial Property Rights 5

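  1. Scoring model generation device, training data generation device, search system, scoring model generation method, training data generation method, search method, and program therefor

    Patent No. 5700566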

    Property Type: Patent

  2. Speech evaluation device, speech evaluation method, and program

    Patent No. 5805474

    Property Type: Patent

  3. Model parameter arrangement device, method, and program

    大庭 隆伸, 堀 貴明, 中村 篤, 伊藤 彰則

    Patent No. 5610304

    Property Type: Patent

  4. Model reduction device, method, and program

    大庭 隆伸, 堀 貴明, 中村 篤, 伊藤 彰則

    Patent No. 5780516

    Property Type: Patent

  5. Data communication method, data communication system, and data communication program

    鈴木 陽一, 伊藤 彰則, 阿部 俊一郎, 須藤 裕史, 吉木 伸二, 染谷 大

    Patent No. 4911385

    Property Type: Patent

Research Projects 23

  1. Music Information Processing Competitive

    2004/04 - Present

  2. Development of a CALL system using speech recognition technology Competitive

    System: Grant-in-Aid for Scientific Research

    2004/04 - Present

  3. Development of Speech Recognition System Competitive

    System: Ordinary Research

    2002/04 - Present

  4. Development of spoken dialog systems Competitive

    2002/04 - Present

  5. Pseudo-Dynamic Preservation and Elucidation of Neural Processing of Endangered Languages Based on Natural Discourse Corpora with Physiological Indices

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (A)

    Institution: Tohoku University

    2024/04/01 - 2028/03/31

  6. Development of a virtual classmate for assistance of online course

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Tohoku University

    2021/04/01 - 2026/03/31

  7. Support for real-environment listening learning using speaker, region, and style morphing speech synthesis

    能勢 隆, 伊藤 彰則

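    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Tohoku University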

    2022/04/01 - 2025/03/31

  8. Field-based Cognitive Neuroscientific Study of Word Order in Language and Order of Thinking from the OS Language Perspective

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (S)

    Institution: Tohoku University

    2019/06/26 - 2024/03/31

  9. Measurement of entrepreneurship using natural language processing and application to the improvement of education program

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Challenging Research (Exploratory)

    Institution: Tohoku University

    2020/07/30 - 2023/03/31

  10. Research and development of multi-modal interactive English learning system based on deep learning

    ITO Akinori

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (A)

    Institution: Tohoku University

    2017/04/01 - 2021/03/31

    We developed technologies for an English conversation learning system based on deep learning and created a CALL system for practicing English conversation: (1) We established technology for recognizing English speech spoken by Japanese with high accuracy to improve the accuracy of interfaces for speech, facial expressions, and gestures based on deep learning. (2) To establish English pronunciation evaluation and English conversation simulation technology based on deep learning, we investigated the effects of facial expressions and gestures on English proficiency evaluation. In addition, we established a method to evaluate pronunciation with high accuracy for interactive speech. (3) We integrated the technologies to create a spoken dialogue English conversation learning system.

  11. Field-based psycholinguistic study of word order in language and order of thinking from the OS language perspective

    小泉 政利, 安永 大地, 木山 幸子, 大塚 祐子, 遊佐 典昭, 酒井 弘, 大滝 宏一, 杉崎 鉱司, Jeong Hyeonjeong, 新国 佳祐, 玉岡 賀津雄, 伊藤 彰則, 金 情浩, 那須川 訓也, 里 麻奈美, 矢野 雅貴, 小野 創

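    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (A)

    Institution: Tohoku University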

    2019/04/01 - 2020/03/31

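    We prepared to carry out the following surveys and experiments in the Kingdom of Tonga in August: (1) a series of experiments and questionnaire surveys covering issues such as lexical processing, sentence processing, judgment of canonical word order, and case-particle drop; (2) a comparative experiment on the comprehension processes of subject relative clauses and object relative clauses; (3) a behavioral experiment on the acquisition of syntactic ergativity. In addition, to gather information on related research trends, we attended the 158th Meeting of the Linguistic Society of Japan (Hitotsubashi University).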

  12. Basic research for YASASHII NIHONGO database construction

    MAEDA Rikako, SATOH Kazuyuki, ITO Akinori, SUGITO Seiju, SUN Weiting, BABA Yasumasa, MIZUNO Yoshimichi, MISONOU Yasuko, YONEDA Masato

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (C)

    Institution: Daito Bunka University

    2015/04/01 - 2018/03/31

    To support GENSAI practice for beginner learners of Japanese, who are at a disadvantage in obtaining emergency information, I collected and analyzed YASASHII NIHONGO resources for GENSAI. I also built learning resources for people who want to become users of "YASASHII NIHONGO for GENSAI". In building these resources, I focused on emergency information prepared for transmission during the first 72 hours after an earthquake occurs.

  13. Research of Human-Kind Dialogue System with Recognition and Synthesis of Various Speech Based on State Estimation

    Nose Takashi, MORI Hiroki

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Tohoku University

    2015/04/01 - 2018/03/31

    In this research project, we improved and advanced techniques for recognizing and synthesizing various kinds of speech, and studied a technique for estimating the state of system users and its applications, with the aim of realizing a dialogue system that is kind to users. Specifically: (1) We studied the validity of using emotions and a technique for emotion estimation. (2) We proposed and evaluated a sentence selection technique based on extended entropy that takes phonetic and prosodic contexts into account. (3) We recorded and analyzed dialogue data for willingness estimation. (4) We constructed a large-scale emotional speech corpus that can be used for emotional speech synthesis, emotional speech recognition, and emotion estimation. (5) We proposed and evaluated variance compensation and tailor-made speech synthesis as techniques for synthesizing diverse, high-quality speech.

  14. Development of Easy Japanese composition support system using sentence difficulty estimation and speech synthesis

    Ito Akinori, CHIBA Yuya, NAGANO Takeshi

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Tohoku University

    2014/04/01 - 2017/03/31

    We developed YANSIS, a composition support system for Easy Japanese, and carried out related investigations. We developed a method for automatically estimating the difficulty of a sentence, and investigated how the intelligibility of Japanese speech heard by non-native speakers of Japanese relates to speech rate, pauses, and speech degradation caused by reverberation. This investigation revealed the most appropriate speech rate for Easy Japanese speech. In addition, we implemented automatic sentence difficulty estimation and a speech synthesizer in YANSIS.

  15. Development of an English conversation learning system based on spoken dialog with an agent

    ITO Akinori, HIROI Yutaka

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Challenging Exploratory Research

    Institution: Tohoku University

    2012/04/01 - 2015/03/31

    In this project, we developed a system for training English communication skills for Japanese learners, in which the learner practices conversation with a robot or a virtual character. First, we developed a robot that could move around a room by following a person, identify a position by recognizing a pointing gesture, and converse with the learner in English. Then we developed a speech recognition method that can correctly recognize the learner's speech even when it contains grammatical mistakes. Finally, we developed a conversation exercise method with a virtual character through which the learner can acquire proper timing for answering the interlocutor's utterance.

  16. Automatic prosody evaluation and grammatical mistake detection for English learning by Japanese native speakers

    ITO Akinori, SUZUKI Motoyuki, MAKINO Shozo, OHKAWA Yuichi

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Tohoku University

    2008 - 2010

    I have developed two methods that enable speaking exercises in a computer-assisted language learning (CALL) system: a method for evaluating the prosody of an English utterance made by a learner, and a method for detecting grammatical mistakes in the learner's utterance. For prosody evaluation, I developed a method for estimating word importance factors using a decision tree and obtained a high correlation with human assessment scores, comparable to the correlation between scores given by human evaluators. For grammatical mistake detection, I proposed a method for training an n-gram language model from artificially generated sentences containing mistakes and obtained 89.2% word accuracy.

  17. Development of a computer-assisted language learning system utilizing a speaker adaptation and a grammatical error modeling

    ITO Akinori, SUZUKI Motoyuki, MAKINO Shozo

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Tohoku University

    2004 - 2007

    1. Improvement of Pronunciation Evaluation. We developed a method to detect pronunciation mistakes made by a foreign language learner by applying speech recognition technology. We focused on two targets: learning of English by Japanese native speakers and learning of Japanese by Korean native speakers. To improve the accuracy of mispronunciation detection, we developed a bilingual speaker adaptation method that adapts both English and Japanese HMMs to the learner. To solve the problem that the strictness of mispronunciation detection depends on the linguistic context of the pronunciation, we developed a method of detecting mispronunciations using a decision tree, which gave a detection accuracy of more than 90% on English utterances made by Japanese native speakers.

    2. Evaluation of Intonation and Rhythm. In addition to the detection of mispronunciations, we developed methods to evaluate the intonation and rhythm of English utterances. We found that the log F0, the log power, and their derivatives were good features for the evaluation of intonation. To adjust the strictness of intonation evaluation from word to word, we introduced a method to estimate word importance factors using a decision tree. We also found that the word duration ratio was a good feature for rhythm evaluation.

    3. Development of an interactive CALL system. We developed a method of detecting grammatical errors in a learner's utterances for application to an interactive CALL system that enables a learner to learn a foreign language through dialogues with a computer. For the learning of Japanese, we developed a method to recognize the learner's speech using a finite state automaton to which grammatical error rules were applied. For the learning of English, we developed a method that uses an n-gram language model trained on a corpus automatically generated using the grammatical error rules.

  18. Large Vocabulary Continuous Speech Recognition System on Japanese Newspaper Reading Task

    KOHDA Masaki, KATOH Masaharu, ITO Akinori

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (C)

    Institution: Yamagata University

    1998 - 2000

    We investigated a large vocabulary continuous speech recognition (LVCSR) system on a Japanese newspaper reading task and obtained the following results. (1) Acoustic models: A Hidden Markov Network (HM-Net) is a highly accurate and robust acoustic model that represents the tied-state structure of context-dependent Hidden Markov Models as a network. We propose a rapid topology design method based on state clustering to generate high-accuracy HM-Nets for LVCSR. Furthermore, we investigate MLLR (Maximum Likelihood Linear Regression)-based speaker adaptation of acoustic models and propose a regression class selection algorithm based on the BIC principle. (2) Language models: We investigate an N-gram task adaptation method that uses a large corpus of the general task (TI text) and a small corpus of the specific task (AD text), and employs a simple weighting to mix the TI and AD texts. Furthermore, we propose a new SCFG (Stochastic Context Free Grammar) model that uses a phrase-based dependency grammar instead of a general CFG. The word error rate with a mixture model based on the proposed SCFG model and a trigram is lower than with the trigram alone. (3) Decoder: We investigate fast search strategies for LVCSR and propose a new method, phoneme-graph-based hypothesis restriction, which effectively prunes the search space. In the proposed method, a phoneme graph is generated at the pre-processing stage, and the best word sequence is then searched at the main recognition stage while the expansion of hypotheses is restricted using information from the phoneme graph. In a multiple-pass LVCSR system that uses a word graph as an intermediate data structure, decoder parameters should be optimized in order to generate a good word graph. We propose a new method to optimize these parameters, which rescores the word graph using a bigram LM instead of generating many word graphs for each parameter setting. (4) Software tool: We describe a statistical language model toolkit for word and class-based n-grams. This toolkit has command-level compatibility with the CMU-Cambridge SLM Toolkit, and supports class n-grams and n-gram count mixture as well as combined language models using linear interpolation.
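The weighted mixing of general-task (TI) and task-specific (AD) text mentioned in item (2) can be illustrated with a small sketch. It assumes that the weighting is applied to bigram counts and that probabilities are plain maximum-likelihood estimates without smoothing; the function names, the toy sentences, and the weight value are hypothetical and are not taken from the project or from any toolkit.

```python
from collections import Counter

def bigram_counts(sentences):
    """Count bigrams over tokenized sentences, adding sentence boundary markers."""
    counts = Counter()
    for words in sentences:
        padded = ["<s>"] + list(words) + ["</s>"]
        counts.update(zip(padded, padded[1:]))
    return counts

def mix_counts(ti_counts, ad_counts, ad_weight):
    """Add the AD counts, scaled by ad_weight, to the TI counts (assumed scheme)."""
    mixed = Counter(ti_counts)
    for bigram, c in ad_counts.items():
        mixed[bigram] += ad_weight * c
    return mixed

def bigram_prob(counts, prev, word):
    """Maximum-likelihood P(word | prev); no smoothing in this sketch."""
    total = sum(c for (p, _), c in counts.items() if p == prev)
    return counts[(prev, word)] / total if total else 0.0

# Toy data: a larger general corpus (TI) and a tiny task corpus (AD).
ti_text = [["the", "market", "rose"], ["the", "market", "fell"], ["the", "team", "won"]]
ad_text = [["the", "patient", "recovered"]]

ti = bigram_counts(ti_text)
ad = bigram_counts(ad_text)
mixed = mix_counts(ti, ad, ad_weight=3.0)

p_general = bigram_prob(ti, "the", "patient")     # unseen in the TI text
p_adapted = bigram_prob(mixed, "the", "patient")  # boosted by the weighted AD counts
```

Raising `ad_weight` lets a small amount of task text outweigh the much larger general corpus for the bigrams it contains; in practice the weight would be tuned on held-out task data.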

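  19. Research on statistical language models for Japanese speech recognition and their task adaptation

    ITO Akinori

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Encouragement of Young Scientists (A)

    Institution: Yamagata University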

    1997 - 1998


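    This year's research addressed a statistical language model for Japanese continuous speech recognition that does not rely on morphological analysis. The work consists of two subthemes: building a language model whose units are statistically selected character strings, and assigning readings to mixed kanji-kana text by statistical methods. For the character-string language model, we evaluated the proposed method by comparing various ways of segmenting text into character strings and by varying the task and size of the training and evaluation texts. The combination of frequency-based string extraction and leftmost longest-match analysis gave the largest perplexity reduction (up to 9.3%). To examine how performance differs across corpora, we also ran comparison experiments with three dialogue corpora and the EDR corpus, which is written language. The perplexity reduction was largest for the ATR conversation corpus, which covers a single task. This is because both the statistics and the segmentation units are determined from the training text alone, and it indicates the limit of applicability of the method. For statistical reading assignment, we used the EDR corpus to build and evaluate a reading-assignment system that applies an N-gram model. The model built from one character before and one character after the target character performed best. The best reading-assignment accuracy obtained by the system was 96.27%.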
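A minimal sketch of frequency-based string extraction followed by leftmost longest-match segmentation is given here for illustration. The extraction criterion (a simple count threshold over substrings up to a fixed length), the function names, and the toy ASCII corpus are assumptions; the criteria actually used in the study are not stated in this summary.

```python
from collections import Counter

def extract_frequent_strings(corpus, max_len, min_count):
    """Collect substrings of length 2..max_len that occur at least min_count times."""
    counts = Counter()
    for line in corpus:
        for n in range(2, max_len + 1):
            for i in range(len(line) - n + 1):
                counts[line[i:i + n]] += 1
    return {s for s, c in counts.items() if c >= min_count}

def longest_match_segment(text, lexicon, max_len):
    """Scan left to right, taking the longest lexicon entry at each position.

    A single character is always accepted as a fallback unit.
    """
    units = []
    i = 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in lexicon:
                units.append(piece)
                i += n
                break
    return units

# Toy corpus; real input would be unsegmented Japanese text.
corpus = ["abcab", "abcd", "cab"]
lexicon = extract_frequent_strings(corpus, max_len=3, min_count=2)
units = longest_match_segment("abcabd", lexicon, max_len=3)
```

The resulting units would then serve as the vocabulary of an n-gram model in place of morphemes.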

  20. Continuous speech recognition with adaptability to the speaking rate of an input speech

    MAKINO Shozo, SUZUKI Motoyuki, SONE Hideaki

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Scientific Research (B)

    Institution: Tohoku University

    1995 - 1997


    This research developed a spoken word recognition system that uses phoneme duration information estimated from the speaking rate of the input speech. The speaking rate is assumed to be reflected in the average vowel length. The acoustic processor transforms the input speech into a similarity matrix using the modified LVQ2. The average vowel length is computed from the preliminary recognition result. The duration of each phoneme in each word template is estimated from the average length of vowels in the input speech. Spoken word recognition experiments were carried out using DTW, taking the estimated phoneme durations into account. The word recognition score was 97.3% for the 212-word vocabulary uttered by 5 male speakers (test set). The phoneme duration information was collected from the 212-word vocabulary uttered by another 5 male and 10 female speakers (training set). The hybrid combination of the preceding-phoneme-dependent estimation and the following-phoneme-dependent estimation gave the best performance. The method was then extended to phoneme recognition. The phoneme accuracy increased from 71.8% to 86.3% for phonemes in the 212-word vocabulary uttered by 5 male speakers (test set).
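The two ingredients named above, duration estimation from the average vowel length and DTW matching, can be sketched as follows. This is a simplified illustration and not the system described: the single linear scaling ratio stands in for the context-dependent estimates of the study, the sequences are one-dimensional toy values in place of similarity-matrix features, and all names and numbers are hypothetical.

```python
def scale_durations(reference_durations, reference_vowel_mean, observed_vowel_mean):
    """Scale reference phoneme durations by the ratio of average vowel lengths.

    Assumption for illustration: one linear ratio for all phonemes.
    """
    ratio = observed_vowel_mean / reference_vowel_mean
    return {ph: d * ratio for ph, d in reference_durations.items()}

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences.

    Local cost is the absolute difference; allowed steps are insertion,
    deletion, and match.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

# Faster speech (shorter vowels) shrinks the expected phoneme durations (ms).
scaled = scale_durations({"a": 80.0, "k": 50.0},
                         reference_vowel_mean=80.0, observed_vowel_mean=60.0)

# A slowly spoken input still aligns with its template at zero cost.
template = [1, 2, 3]
slow_input = [1, 1, 2, 2, 3, 3]
same_word = dtw_distance(template, slow_input)
other_word = dtw_distance([3, 2, 1], slow_input)
```

In a recognizer of this kind, the estimated durations would constrain or penalize warping paths that stretch a phoneme far beyond its expected length.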

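  21. Recognition of spontaneously spoken sentences using an associative method

    ITO Akinori

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Encouragement of Young Scientists (A)

    Institution: Tohoku University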

    1994 - 1994


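    This research aimed to develop a new framework for recognizing spontaneously spoken sentences, with "associative relations" at its core. It consists of three stages: 1. Analyze a large-scale language database and survey the associative relations and other linguistic information that appear in it. 2. Develop an algorithm that uses associative information to perform recognition efficiently. 3. Use these results to build a working experimental system. This year's results cover two of these stages: (1) analysis of the language database and a survey of various linguistic phenomena, and (2) development of the "extended RHA method", an algorithm that performs recognition using associative relations. An outline follows. 1. Analysis of the language database: The database analyzed consists of the transcribed texts in the simulated-dialogue text database of the Acoustical Society of Japan continuous speech database for research. Of these, 44 dialogues (3633 utterances, 19019 bunsetsu phrases) were analyzed. We first ran morphological analysis on the text and extracted 3386 content words and 615 function words. From the analysis results we then built a bunsetsu model for dialogue speech. This model extends the bunsetsu model we had previously used for recognizing read-text speech. Using this bunsetsu model, we computed the word transition probabilities, perplexity, and other statistics of the database. 2. Development of the "extended RHA method" recognition algorithm: We developed the "extended RHA method", an algorithm that recognizes words in continuous speech using associative relations. This method associates and recognizes words using various kinds of information, and differs from conventional pattern-matching approaches. In this work we used only the recognized phonemes as the source of associative information, applied the method as a word preselection step for conventional continuous speech recognition, and verified its effectiveness. Within exactly the same framework, speech recognition that makes effective use of other information, such as word sequence relations, is also possible.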
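For reference, perplexity can be computed from the probabilities a model assigns to each word of a test text. The sketch below uses the standard definition (the exponential of the average negative log probability); the function name and the probability values are made up for illustration and are not results from the study.

```python
import math

def perplexity(word_probs):
    """Perplexity: exp of the average negative log probability per word."""
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))

# A model that always spreads probability over 4 equally likely words.
uniform = perplexity([0.25, 0.25, 0.25, 0.25])

# A model that is more confident on half of the words scores lower.
skewed = perplexity([0.5, 0.5, 0.25, 0.25])
```

A lower value means the model leaves fewer effective word choices at each position, which generally makes recognition easier.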

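  22. Research on continuous speech recognition based on statistics and association

    ITO Akinori

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Encouragement of Young Scientists (A)

    Institution: Tohoku University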

    1993 - 1993


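    This year's research comprised three items: (1) construction of grammatical information, (2) construction of an association-based word detection method, and (3) use of statistical information in word association. The "construction of associative information from words to words, or from words to scenes" in the original research plan was not carried out. Grammatical information is an important information source underlying this research. Aiming at the recognition of natural utterances, we built a finite automaton that represents the bunsetsu structure in conversational speech. As conversational material we used the transcribed texts of conversational speech in the Acoustical Society of Japan continuous speech database. We removed so-called unnecessary words, such as interjections, from this text and expressed an intra-bunsetsu grammar that accepts the remaining expressions as a finite automaton. This grammar was built by modifying the intra-bunsetsu grammar that the author had previously built for read-sentence speech. As the association-based word detection method, we proposed the "extended RHA method". It extends the RHA (Redundant Hash Addressing) method, which is used for fast word recognition, to continuous speech recognition. Two points were important in applying RHA to continuous speech recognition: (1) adapting a method designed for isolated words to continuous speech, and (2) improving the accuracy of the original RHA method. For (1), we introduced the concept of an "activation point" into RHA and applied RHA to word detection. For (2), we introduced "extended fragments", which anticipate phoneme recognition errors in advance, to improve detection accuracy. Word detection experiments confirmed that, compared with the "continuous DP method" conventionally used for this purpose, detection performance is comparable and detection is several times faster. As one way of introducing a statistical element into word detection with the extended RHA method, we proposed a word detection method based on extended fragments. In the extended RHA method, the unit used to associate words was a phoneme tuple of fixed, predetermined length, whereas in the method using extended fragments the unit is determined statistically. Given the set of words to be detected, the method statistically determines the association units so that the number of words associated from any single unit does not exceed a fixed number. Concretely, words are associated using variable-length phoneme tuples. This suppresses wasteful associations and keeps false word detections low.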
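The basic idea of associating words from phoneme fragments can be sketched with a simple inverted index and vote counting. This is only a rough illustration under stated assumptions: it uses fixed-length fragments, ignores fragment positions, and does not implement the activation points or extended fragments described above; the lexicon, the phoneme spellings, and all function names are hypothetical.

```python
from collections import Counter, defaultdict

def fragments(phonemes, n):
    """All contiguous phoneme tuples of length n."""
    return [tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]

def build_index(lexicon, n):
    """Map each fragment to the set of words whose pronunciation contains it."""
    index = defaultdict(set)
    for word, phonemes in lexicon.items():
        for frag in fragments(phonemes, n):
            index[frag].add(word)
    return index

def associate_words(recognized, index, n, min_votes):
    """Let every fragment of the recognized phoneme string vote for its words."""
    votes = Counter()
    for frag in fragments(recognized, n):
        for word in index.get(frag, ()):
            votes[word] += 1
    return [(w, v) for w, v in votes.most_common() if v >= min_votes]

# Toy lexicon with one symbol per phoneme (an assumption for readability).
lexicon = {
    "sakura": list("sakura"),
    "kuruma": list("kuruma"),
    "sakana": list("sakana"),
}
index = build_index(lexicon, n=2)

# Phoneme recognizer output for a continuous utterance containing "sakura".
candidates = associate_words(list("asakuranoki"), index, n=2, min_votes=3)
```

Because each word is reachable through several fragments, a few phoneme recognition errors still leave enough votes for the correct word; a statistically chosen, variable-length unit would reduce the number of spurious candidates such as `sakana` here.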

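  23. Research on a continuous speech recognition system using the function-word-predictive CYK method

    ITO Akinori

    Offer Organization: Japan Society for the Promotion of Science

    System: Grants-in-Aid for Scientific Research

    Category: Grant-in-Aid for Encouragement of Young Scientists (A)

    Institution: Tohoku University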

    1992 - 1992


Works 2

  1. palmkit: a toolkit for statistical language modeling

    http://palmkit.sourceforge.net/ 2001/11/05 -

    Type: Software

  2. w3m: a web browser

    http://w3m.sourceforge.net/ 1999/01/10 -

    Type: Software

Social Activities 4

  1. Science Cafe

    2013/06/28 -


    Gave a public lecture titled "How can we talk with smartphones and robots?" on speech recognition, speech synthesis, and dialogue technology.

  2. Visiting lecture

    2008/12/04 -


    Gave a visiting lecture for high school students titled "Dialogue with Robots" at Miyagi Prefectural Sendai Daini High School.

  3. Visiting lecture

    2008/10/18 -


    Gave a visiting lecture for high school students titled "Dialogue with Robots" at Gunma Prefectural Ota High School.

  4. Smooth transmission during network failures

    2007/03/23 -

Other 1

  1. Development of basic software for Japanese dictation


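    To promote research, development, and practical use of large vocabulary continuous speech recognition for Japanese, this project develops a high-accuracy speech recognition system that anyone can use. To this end, it develops high-accuracy acoustic models that work for unspecified speakers, language models trained on large amounts of language data, and a fast, high-accuracy speech recognition engine.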