TOHOKU UNIVERSITY Researchers

Details of the Researcher

Akinori Ito

Section

Graduate School of Engineering

Job title

Professor

Degree

Dr. Eng. (Tohoku University)

researchmap

https://researchmap.jp/read0093842

J-GLOBAL ID

200901062784876580

e-Rad No.

70232428

ORCID

https://orcid.org/0000-0002-8835-7877

Research History 7

2010/04 - Present

Tohoku University　Graduate School of Engineering　Professor
2002/04 - 2010/03

Graduate School of Engineering, Tohoku University　Associate Professor
1999/10 - 2002/03

Faculty of Engineering, Yamagata University　Associate Professor
1995/04 - 1999/09

Faculty of Engineering, Yamagata University　Lecturer
1998/05 - 1999/04

College of Engineering, Boston University　Visiting Scholar
1992/04 - 1995/03

Education Center for Information Processing, Tohoku University　Assistant Professor
1991/04 - 1992/03

Research Center for Applied Information Sciences, Tohoku University　Assistant Professor

Education 2

Tohoku University　Graduate School, Division of Engineering　Department of Information Engineering

- 1991/03
Tohoku University　Faculty of Engineering　Department of Communication Engineering

- 1986/03

Committee Memberships 42

Journal of Information Hiding and Multimedia Signal Processing　Associate Editor

2009/04 - Present
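Acoustical Society of Japan　Councilor

2007/05 - Present
Acoustical Society of Japan　Delegate

2005/05 - Present
Acoustical Society of Japan　President

2019/05 - 2021/05
Acoustical Society of Japan　Director

2009/06 - 2021/05
Acoustical Society of Japan, Editorial Board　Chair

2015/06 - 2017/06
Institute of Electronics, Information and Communication Engineers, Technical Committee on Multimedia Information Hiding and Enrichment　Chair

2015/05 - 2017/04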
Acoustical Society of Japan　Vice President

2013/06 - 2015/06
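Information Processing Society of Japan, Special Interest Group on Spoken Language Processing　Steering Committee Member

2004/05 - 2015/04
Acoustical Society of Japan, Editorial Board　Deputy Chair

2007/05 - 2009/04
Information Processing Society of Japan, Special Interest Group on Music Information Science　Steering Committee Member

2007/05 - 2009/04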
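Institute of Electronics, Information and Communication Engineers, Technical Committee on Speech　Steering Committee Member

2005/05 - 2008/05
Acoustical Society of Japan, Technical Committee on Speech　Steering Committee Member

2005/05 - 2008/05
Acoustical Society of Japan, Academic Committee　Secretary

2005/09 - 2007/06
Acoustical Society of Japan, Digitization Promotion Committee　Member

2005/09 - 2007/05
Institute of Electronics, Information and Communication Engineers, Transactions D (Japanese Edition) Editorial Committee　Editorial Secretary

2005/05 - 2007/04
Acoustical Society of Japan, Editorial Board　Editorial Secretary

2005/05 - 2007/04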
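Acoustical Society of Japan, Editorial Board　Editorial Board Member

2003/05 - 2005/04
Acoustical Society of Japan, Tohoku Chapter　Secretary

2002/05 - 2005/04
Institute of Electronics, Information and Communication Engineers, Transactions D (Japanese Edition) Editorial Committee　Editorial Committee Member

2002/05 - 2005/04
Institute of Electronics, Information and Communication Engineers, Technical Committee on Speech　Secretary

2002/05 - 2004/04
Acoustical Society of Japan, Technical Committee on Speech　Secretary

2002/05 - 2004/04
Information Processing Society of Japan, Special Interest Group on Spoken Language Processing, Continuous Speech Recognition Consortium　Executive Committee Member

2001/01 - 2003/09
Information Processing Society of Japan, Special Interest Group on Spoken Language　Liaison Member

1997/05 - 2001/04
National Center for University Entrance Examinations, Subject Specialist Committee, Question Preparation Subcommittee　Member

1996/04 - 1997/03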

Professional Memberships 6

Human Interface Society
International Speech Communication Association
The Institute of Electrical and Electronics Engineers
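Information Processing Society of Japan
The Institute of Electronics, Information and Communication Engineers
Acoustical Society of Japan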

Research Interests 5

Computer Assisted Language Learning System
music information processing
natural language processing
speech processing
speech recognition

Research Areas 2

Humanities & social sciences / Foreign language education
Informatics / Intelligent informatics

Awards 5

Best Paper Award of International Conference on Natural Language Processing and Knowledge Engineering

2008/10　Organizing Committee of International Conference on Natural Language Processing and Knowledge Engineering
Best Paper Award of International Conference on Intelligent Information Hiding and Multimedia Signal Processing

2007/11　Organizing Committee of International Conference on Intelligent Information Hiding and Multimedia Signal Processing
Best Paper Award of The 5th International Conference on Education and Information Systems, Technologies and Applications

2007/07　Organizing Committee of The 5th International Conference on Education and Information Systems, Technologies and Applications
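Ishida (Minoru) Memorial Foundation Research Encouragement Award

2003/11/28　Ishida (Minoru) Memorial Foundation　Research on spoken language processing
Open Software Prize

2000/06/07　Electronic Network Consortium　Development of the software “w3m”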

Papers 352

Automatic assessment of English proficiency for Japanese learners without reference sentences based on deep neural network acoustic models Peer-reviewed

Jiang Fu, Yuya Chiba, Takashi Nose, Akinori Ito

Speech Communication　116　86-97　2020/01

DOI： 10.1016/j.specom.2019.12.002 　

ISSN： 0167-6393
Calculation of approximate heart rate variability indicators based on low-resolution heart rate data provided by widely used commercially available wearable devices Peer-reviewed

Xue Li, Goh Onoguchi, Hiroshi Komatsu, Chiaki Ono, Noriko Warita, Zhiqian Yu, Atsuko Nagaoka, Sho Horikoshi, Kenji Iwabuchi, Kohei Fuji, Mizuki Hino, Yuta Takahashi, Hisashi Ohseto, Natsuko Kobayashi, Saya Kikuchi, Yasuto Kunii, Taku Obara, Shinichi Kuriyama, Noriyasu Homma, Parashkev Nachev, Akinori Ito, Hiroaki Tomita

Biomedical Signal Processing and Control　112　108579-108579　2026/02
Publisher: Elsevier BV
DOI： 10.1016/j.bspc.2025.108579 　

ISSN： 1746-8094
Language Independent Speech-to-Singing-Voice Conversion

Akinori Ito

2025/11/10

DOI： 10.51094/jxiv.1902 　

This research addresses the challenge of converting spoken voice into singing voice in a language-independent manner. Traditional speech-to-singing systems often rely on language-specific phoneme alignment or require parallel singing datasets, which limits their applicability across languages and speakers. To overcome these constraints, the authors propose a novel framework that utilizes voiced/unvoiced (V/UV) classification and music state modeling to align speech with musical scores without relying on linguistic content. The approach begins by extracting a V/UV state sequence from input speech using a convolutional layer built on top of a pretrained HuBERT model. Simultaneously, a music state sequence is generated from a monophonic musical score using a decay function that models note intensity over time. These two sequences are then aligned using Dynamic Time Warping (DTW), allowing the system to synchronize speech features with musical timing and pitch. After alignment, the World vocoder is employed to analyze and synthesize the singing voice. The spectral and aperiodic components of speech are aligned to the music sequence, while pitch is replaced with musical pitch to produce the final singing output. Experimental results demonstrate the effectiveness of the proposed V/UV classification using the ATR speech database. The system could generate singing voices from spoken input without requiring phoneme-level annotations or parallel singing data, but still, there is room for quality improvement.
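The alignment step described in this abstract can be illustrated with a minimal Dynamic Time Warping sketch. It is not the author's implementation: the function name `dtw_align`, the absolute-difference local cost, and the toy sequences are all assumptions made for illustration. In the paper the music state sequence comes from a decay function over note intensity, which is not reproduced here.

```python
from typing import List, Tuple


def dtw_align(speech: List[float], music: List[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Align two 1-D state sequences with classic DTW.

    The local cost is the absolute difference between states (an assumption;
    the abstract does not specify the cost function).
    Returns the accumulated cost and the warping path as (speech_idx, music_idx).
    """
    n, m = len(speech), len(music)
    inf = float("inf")
    # cost[i][j]: minimal accumulated cost of aligning speech[:i] with music[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(speech[i - 1] - music[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end, always stepping to the cheapest predecessor.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


# Toy example: 1 = voiced / note sounding, 0 = unvoiced / rest (made-up values).
vuv_states = [0.0, 0.0, 1.0, 1.0, 0.0]
music_states = [0.0, 1.0, 0.0]
total_cost, warping_path = dtw_align(vuv_states, music_states)
```

The warping path maps each speech frame to a position in the score, which is the information needed to stretch or compress speech features to the musical timing.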
Adaptive Fine-Grained Pruning via Binary Search for Efficient Environmental Sound Classification Peer-reviewed

Changlong Wang, Akinori Ito, Takashi Nose

IEEE Access　13　173201-173208　2025/10
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
DOI： 10.1109/access.2025.3617879 　

eISSN： 2169-3536
Ensemble Learning with Parallel-Trained Pretrained Models for Enhanced Environmental Sound Classification Peer-reviewed

Changlong Wang, Akinori Ito, Takashi Nose, Chia-Ping Chen

Lecture Notes in Networks and Systems　376-385　2025/08/28
Publisher: Springer Nature Switzerland
DOI： 10.1007/978-3-031-94898-5_28 　

ISSN： 2367-3370

eISSN： 2367-3389
Evaluation of Different Training Strategies and Recognizers in Low Resource Speech Recognition Using Wav2vec2.0 Peer-reviewed

Takaki Koshikawa, Akinori Ito, Takashi Nose

Lecture Notes in Networks and Systems　508-518　2025/08/28
Publisher: Springer Nature Switzerland
DOI： 10.1007/978-3-031-94898-5_38 　

ISSN： 2367-3370

eISSN： 2367-3389
Japanese Shadowing Training Using Synchronized Partial Captions

Syuyu Fang, Akinori Ito, Takashi Nose

2025 13th International Conference on Information and Education Technology (ICIET)　177-181　2025/04/18
Publisher: IEEE
DOI： 10.1109/iciet66371.2025.11046256 　
Adaptive Depth-Wise Pruning for Efficient Environmental Sound Classification Peer-reviewed

Changlong Wang, Akinori Ito, Takashi Nose

IEEE Access　13　69751-69759　2025/04/16
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
DOI： 10.1109/access.2025.3561590 　

eISSN： 2169-3536
The Development of an Emotional Embodied Conversational Agent and the Evaluation of the Effect of Response Delay on User Impression Peer-reviewed

Simon Christophe Jolibois, Akinori Ito, Takashi Nose

Applied Sciences　15　(8)　4256　2025/04/11

DOI： 10.3390/app15084256 　
Robust Human Tracking Using a 3D LiDAR and Point Cloud Projection for Human-Following Robots Peer-reviewed

Sora Kitamoto, Yutaka Hiroi, Kenzaburo Miyawaki, Akinori Ito

Sensors　25　(6)　2025/03/12

DOI： 10.3390/s25061754 　
Reversible Spectral Speech Watermarking with Variable Embedding Locations Against Spectrum-Based Attacks Peer-reviewed

Xuping Huang, Akinori Ito

Applied Sciences　15　(1)　381　2025/01/03

DOI： 10.3390/app15010381 　
Generation of Listening Motion of Embodied Conversational Agents Using Speech and Text Information

Haruki Ito, Akinori Ito, Takashi Nose

2025

DOI： 10.1007/978-3-032-05994-9_10 　
Unified model for voice conversion of speech and singing voice using adaptive pitch constraints Peer-reviewed

Shogo Fukawa, Takashi Nose, Shuhei Imai, Akinori Ito

Acoustical Science and Technology　46　(1)　120-123　2025/01/01
Publisher: Acoustical Society of Japan
DOI： 10.1250/ast.e24.47 　

ISSN： 1346-3969

eISSN： 1347-5177
We open our mouths when we are silent Peer-reviewed

Shoki Kawanishi, Yuya Chiba, Akinori Ito, Takashi Nose

Acoustical Science and Technology　46　(1)　96-99　2025/01/01
Publisher: Acoustical Society of Japan
DOI： 10.1250/ast.e24.21 　

ISSN： 1346-3969

eISSN： 1347-5177
Fast end-to-end non-parallel voice conversion based on speaker-adaptive neural vocoder with cycle-consistent learning Peer-reviewed

Shuhei Imai, Aoi Kanagaki, Takashi Nose, Shogo Fukawa, Akinori Ito

Acoustical Science and Technology　46　(1)　116-119　2025/01/01
Publisher: Acoustical Society of Japan
DOI： 10.1250/ast.e24.46 　

ISSN： 1346-3969

eISSN： 1347-5177
LLM as decoder: Investigating Lattice-based Speech Recognition Hypotheses Rescoring Using LLM Peer-reviewed

Sheng Li, Yuka Ko, Akinori Ito

2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)　1-5　2024/12/03
Publisher: IEEE
DOI： 10.1109/apsipaasc63619.2025.10848752 　
A Study on Variable Embedding Locations of Reversible Spectral Speech Watermarking

Xuping Huang, Akinori Ito

2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)　1-6　2024/12/03
Publisher: IEEE
DOI： 10.1109/apsipaasc63619.2025.10848605 　
Suboptimal Allocation of Defense Schedule Using Simulated Annealing Peer-reviewed

Akinori Ito

Journal for Academic Computing and Networking　28　106-113　2024/11

DOI： 10.24669/jacn.28.1_106 　
Selection of key sentences from lecture video transcription and its application to feedback to the learner Peer-reviewed

Miki Takeuchi, Akinori Ito, Takashi Nose

Proceedings of the 2024 8th International Conference on Education and Multimedia Technology　218-223　2024/06/22
Publisher: ACM
DOI： 10.1145/3678726.3678733 　
Development of a Personal Guide Robot That Leads a Guest Hand-in-Hand While Keeping a Distance Peer-reviewed

Hironobu Wakabayashi, Yutaka Hiroi, Kenzaburo Miyawaki, Akinori Ito

Sensors　24　(7)　2345-2345　2024/04/07
Publisher: MDPI AG
DOI： 10.3390/s24072345 　

eISSN： 1424-8220

This paper proposes a novel tour guide robot, “ASAHI ReBorn”, which can lead a guest by hand one-on-one while maintaining a proper distance from the guest. The robot uses a stretchable arm interface to hold the guest’s hand and adjusts its speed according to the guest’s pace. The robot also follows a given guide path accurately using the Robot Side method, a robot navigation method that follows a pre-defined path quickly and accurately. In addition, a control method is introduced that limits the angular velocity of the robot to avoid the robot’s quick turn while guiding the guest. We evaluated the performance and usability of the proposed robot through experiments and user studies. The tour-guiding experiment revealed that the proposed method that keeps distance between the robot and the guest using the stretchable arm enables the guests to look around the exhibits compared with the condition where the robot moved at a constant velocity.
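As a rough illustration of the angular-velocity limit mentioned in this abstract, the sketch below clamps a commanded turn rate to a symmetric bound. The function name `limit_angular_velocity` and the numeric limit are hypothetical; the paper's actual control law may differ.

```python
def limit_angular_velocity(omega: float, omega_max: float) -> float:
    """Clamp a commanded angular velocity (rad/s) to the range [-omega_max, omega_max]."""
    if omega_max < 0:
        raise ValueError("omega_max must be non-negative")
    return max(-omega_max, min(omega_max, omega))


# Hypothetical limit of 0.5 rad/s: a sharp turn command is reduced, a gentle one passes through.
sharp_turn = limit_angular_velocity(2.0, 0.5)
gentle_turn = limit_angular_velocity(0.2, 0.5)
```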
Imperceptible and Reversible Acoustic Watermarking Based on Modified Integer Discrete Cosine Transform Coefficient Expansion Peer-reviewed

Xuping Huang, Akinori Ito

Applied Sciences　14　(7)　2757-2757　2024/03/25
Publisher: MDPI AG
DOI： 10.3390/app14072757 　

eISSN： 2076-3417

This paper aims to explore an alternative reversible digital watermarking solution to guarantee the integrity of and detect tampering with data of probative importance. Since the payload for verification is embedded in the contents, algorithms for reversible embedding and extraction, imperceptibility, payload capacity, and computational time are issues to evaluate. Thus, we propose a reversible and imperceptible audio information-hiding algorithm based on modified integer discrete cosine transform (intDCT) coefficient expansion. In this work, the original signal is segmented into fixed-length frames, and then intDCT is applied to each frame to transform signals from the time domain into integer DCT coefficients. Expansion is applied to DCT coefficients at a higher frequency to reserve hiding capacity. Objective evaluation of speech quality is conducted using listening quality objective mean opinion (MOS-LQO) and the segmental signal-to-noise ratio (segSNR). The audio quality of different frame lengths and capacities is evaluated. Averages of 4.41 for MOS-LQO and 23.314 [dB] for segSNR for 112 ITU-T test signals were obtained with a capacity of 8000 bps, which assured imperceptibility with the sufficient capacity of the proposed method. This shows comparable audio quality to conventional work based on Linear Predictive Coding (LPC) regarding MOS-LQO. However, all segSNR scores of the proposed method have comparable or better performance in the time domain. Additionally, comparing histograms of the normalized maximum absolute value of stego data shows a lower possibility of overflow than the LPC method. A computational cost, including hiding and transforming, is an average of 4.884 s to process a 10 s audio clip. Blind tampering detection without the original data is achieved by the proposed embedding and extraction method.
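The segmental signal-to-noise ratio (segSNR) used in this abstract is commonly computed as the mean of per-frame SNR values in dB. The sketch below follows that common definition under stated assumptions: the frame length, the skipping of silent frames, and the function name `segmental_snr` are illustrative choices, not the paper's settings.

```python
import math
from typing import Sequence


def segmental_snr(original: Sequence[float], modified: Sequence[float],
                  frame_len: int = 256, eps: float = 1e-10) -> float:
    """Mean per-frame SNR in dB between an original and a modified (stego) signal.

    Frames with near-zero signal energy are skipped (an assumption; some
    definitions clamp per-frame values to a fixed dB range instead).
    """
    if len(original) != len(modified):
        raise ValueError("signals must have the same length")
    frame_snrs = []
    for start in range(0, len(original) - frame_len + 1, frame_len):
        x = original[start:start + frame_len]
        y = modified[start:start + frame_len]
        signal_energy = sum(a * a for a in x)
        noise_energy = sum((a - b) ** 2 for a, b in zip(x, y))
        if signal_energy <= eps:
            continue  # silent frame, no meaningful SNR
        frame_snrs.append(10.0 * math.log10(signal_energy / max(noise_energy, eps)))
    if not frame_snrs:
        raise ValueError("no usable frames")
    return sum(frame_snrs) / len(frame_snrs)


# Toy signals: a constant offset of 0.1 on a unit-amplitude signal gives about 20 dB.
snr_db = segmental_snr([1.0] * 8, [1.1] * 8, frame_len=4)
```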
Character Expressions in Meta-Learning for Extremely Low Resource Language Speech Recognition Peer-reviewed

Rui Zhou, Akinori Ito, Takashi Nose

Proceedings of the 2024 16th International Conference on Machine Learning and Computing　2024/02/02
Publisher: ACM
DOI： 10.1145/3651671.3651730 　
Evaluation of Environmental Sound Classification using Vision Transformer Peer-reviewed

Changlong Wang, Akinori Ito, Takashi Nose, Chia-Ping Chen

Proceedings of the 2024 16th International Conference on Machine Learning and Computing　665-669　2024/02/02
Publisher: ACM
DOI： 10.1145/3651671.3651733 　
Toward Photo-Realistic Facial Animation Generation Based on Keypoint Features Peer-reviewed

Zikai Shu, Takashi Nose, Akinori Ito

Proceedings of the 2024 16th International Conference on Machine Learning and Computing　39　334-339　2024/02/02
Publisher: ACM
DOI： 10.1145/3651671.3651731 　
Speaker Intimacy Estimation in Chat-Talks Based on Verbal and Non-Verbal Information Peer-reviewed

Yuya Chiba, Akinori Ito

IEEE Access　12　184592-184606　2024

DOI： 10.1109/ACCESS.2024.3507945 　
A Replaceable Curiosity-Driven Candidate Agent Exploration Approach for Task-Oriented Dialog Policy Learning Peer-reviewed

Xuecheng Niu, Akinori Ito, Takashi Nose

IEEE Access　2024

DOI： 10.1109/ACCESS.2024.3462719 　
Multilingual Meta-Transfer Learning for Low-Resource Speech Recognition Peer-reviewed

Rui Zhou, Takaki Koshikawa, Akinori Ito, Takashi Nose, Chia-Ping Chen

IEEE Access　2024

DOI： 10.1109/ACCESS.2024.3486711 　
Scheduled Curiosity-Deep Dyna-Q: Efficient Exploration for Dialog Policy Learning Peer-reviewed

Xuecheng Niu, Akinori Ito, Takashi Nose

IEEE Access　12　46940-46952　2024

DOI： 10.1109/ACCESS.2024.3376418 　

eISSN： 2169-3536
Development of a Play-Tag Robot with Human–Robot Contact Peer-reviewed

Yutaka Hiroi, Kenzaburo Miyawaki, Akinori Ito

Applied Sciences　13　(23)　12909-12909　2023/12/01
Publisher: MDPI AG
DOI： 10.3390/app132312909 　

eISSN： 2076-3417

Many robots that play with humans have been developed so far, but developing a robot that physically contacts humans while playing is challenging. We have developed robots that play tag with humans, which find players, approach them, and move away from them. However, the developed algorithm for approaching a player was insufficient because it did not consider how the arms are attached to the robot. Therefore, in this paper, we assume that the arms are fixed on both sides of the robot and develop a new algorithm to approach the player and touch them with an arm. Since the algorithm aims to move along a circular orbit around a player, we call this algorithm “the go-round mode”. To investigate the effectiveness of the proposed method, we conducted two experiments. The first is a simulation experiment, which showed that the proposed method outperformed the previous one. In the second experiment, we implemented the proposed method in a real robot and conducted an experiment to chase and touch the player. As a result, the robot could touch the player in all the trials without collision.
Multimodal Expressive Embodied Conversational Agent Design Peer-reviewed

Simon Jolibois, Akinori Ito, Takashi Nose

Communications in Computer and Information Science　244-249　2023/07/09
Publisher: Springer Nature Switzerland
DOI： 10.1007/978-3-031-35989-7_31 　

ISSN： 1865-0929

eISSN： 1865-0937
Spoken term detection from utterances of minority languages Invited Peer-reviewed

Akinori Ito, Satoru Mizuochi, Takashi Nose

Issues in Japanese Psycholinguistics from Comparative Perspectives　1　2023/07
Effect of Data Size and Machine Translation on the Accuracy of Automatic Personality Classification Peer-reviewed

Yuki Fukazawa, Akinori Ito, Takashi Nose

Advances in Intelligent Information Hiding and Multimedia Signal Processing　405-413　2023/05/24
Publisher: Springer Nature Singapore
DOI： 10.1007/978-981-99-0105-0_36 　

ISSN： 2190-3018

eISSN： 2190-3026
Spoken Dialogue System Development Without Speech Recognition Towards Language Revitalization Peer-reviewed

Akinori Ito

Advances in Intelligent Information Hiding and Multimedia Signal Processing　393-404　2023/05/24
Publisher: Springer Nature Singapore
DOI： 10.1007/978-981-99-0105-0_35 　

ISSN： 2190-3018

eISSN： 2190-3026
A Robotic System for Remote Teaching of Technical Drawing Peer-reviewed

Yutaka Hiroi, Akinori Ito

Education Sciences　13　(4)　2023/03/28

DOI： 10.3390/educsci13040347 　
Personality Analysis of Entrepreneurial Text for Entrepreneurship Education Peer-reviewed

Akinori Ito, Kotaro Takeda, Shuichi Ishida

2023 5th International Conference on Natural Language Processing (ICNLP)　2023/03
Publisher: IEEE
DOI： 10.1109/icnlp58431.2023.00047 　
Path Following Algorithm with Small Error for Guide Robot Peer-reviewed

Hironobu Wakabayashi, Yutaka Hiroi, Kenzaburo Miyawaki, Akinori Ito

Robot Intelligence Technology and Applications 7　56-67　2023/03/01
Publisher: Springer International Publishing
DOI： 10.1007/978-3-031-26889-2_6 　

ISSN： 2367-3370

eISSN： 2367-3389
Confidence-based Utterance Selection for a Recognizer-free Spoken Dialogue System Peer-reviewed

Akinori Ito

Proceedings of the 2023 15th International Conference on Machine Learning and Computing　481-484　2023/02/17
Publisher: ACM
DOI： 10.1145/3587716.3587796 　
Response Sentence Modification Using a Sentence Vector for a Flexible Response Generation of Retrieval-based Dialogue Systems Peer-reviewed

Ryota Yahagi, Akinori Ito, Takashi Nose, Yuya Chiba

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)　2022/11/07
Publisher: IEEE
DOI： 10.23919/apsipaasc55919.2022.9979841 　
Design and Construction of Japanese Multimodal Utterance Corpus with Improved Emotion Balance and Naturalness Peer-reviewed

Daisuke Horii, Akinori Ito, Takashi Nose

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)　2022/11/07
Publisher: IEEE
DOI： 10.23919/apsipaasc55919.2022.9980272 　
Multimodal Dialogue Response Timing Estimation Using Dialogue Context Encoder Peer-reviewed

Ryota Yahagi, Yuya Chiba, Takashi Nose, Akinori Ito

Lecture Notes in Electrical Engineering　133-141　2022/11/01
Publisher: Springer Nature Singapore
DOI： 10.1007/978-981-19-5538-9_9 　

ISSN： 1876-1100

eISSN： 1876-1119
Combination of deep-learning-based audio separation and speech enhancement for noise reduction of extracted signal from polyphonic music Peer-reviewed

Soichiro Kobayashi, Takashi Nose, Akinori Ito

Proceedings of the 24th International Congress of Acoustics　2022/10
Successive Binary Partition K-means Method for Clustering with Less Cluster Size Bias Peer-reviewed

Akinori Ito

2022 7th International Conference on Signal and Image Processing (ICSIP)　2022/07/20
Publisher: IEEE
DOI： 10.1109/icsip55141.2022.9886452 　
Development of a Teleoperated Play Tag Robot with Semi-Automatic Play Peer-reviewed

Yoshitaka Kasai, Yutaka Hiroi, Kenzaburo Miyawaki, Akinori Ito

2022 IEEE/SICE International Symposium on System Integration (SII)　2022/01/09
Publisher: IEEE
DOI： 10.1109/sii52469.2022.9708883 　
Spoken Term Detection of Zero-Resource Language Using Posteriorgram of Multiple Languages

Satoru MIZUOCHI, Takashi NOSE, Akinori ITO

Interdisciplinary Information Sciences　28　(1)　1-13　2022
Publisher: Graduate School of Information Sciences, Tohoku University
DOI： 10.4036/iis.2022.a.04 　

ISSN： 1340-9050

eISSN： 1347-6157
Study on the Background Music Cancellation System for Speech Privacy Peer-reviewed

Jianning Huang, Akinori Ito

2021 IEEE 6th International Conference on Signal and Image Processing (ICSIP)　2021/10/22
Publisher: IEEE
DOI： 10.1109/icsip52628.2021.9688835 　
Analysis of Feature Extraction by Convolutional Neural Network for Speech Emotion Recognition Peer-reviewed

Daisuke Horii, Akinori Ito, Takashi Nose

2021 IEEE 10th Global Conference on Consumer Electronics (GCCE)　2021/10/12
Publisher: IEEE
DOI： 10.1109/gcce53005.2021.9621964 　
Speaker Intimacy in Chat-Talks: Analysis and Recognition based on Verbal and Non-Verbal Information Peer-reviewed

Yuya Chiba, Yoshihiro Yamazaki, Akinori Ito

Proceedings of the 25th Workshop on the Semantics and Pragmatics of Dialogue　2021/09
Effect of Training Data Selection for Speech Recognition of Emotional Speech Peer-reviewed

Yusuke Yamada, Yuya Chiba, Takashi Nose, Akinori Ito

International Journal of Machine Learning and Computing　11　(5)　362-366　2021/09
Improvement of Automatic English Pronunciation Assessment with Small Number of Utterances Using Sentence Speakability Peer-reviewed

Satsuki Naijo, Akinori Ito, Takashi Nose

Interspeech 2021　2021/08/30
Publisher: ISCA
DOI： 10.21437/interspeech.2021-1132 　
Neural Spoken-Response Generation Using Prosodic and Linguistic Context for Conversational Systems Peer-reviewed

Yoshihiro Yamazaki, Yuya Chiba, Takashi Nose, Akinori Ito

Interspeech 2021　2021/08/30
Publisher: ISCA
DOI： 10.21437/interspeech.2021-381 　
Development of a Mobile Robot That Plays Tag with Touch-and-Away Behavior Using a Laser Range Finder Peer-reviewed

Yoshitaka Kasai, Yutaka Hiroi, Kenzaburo Miyawaki, Akinori Ito

Applied Sciences　11　(16)　7522-7522　2021/08/17
Publisher: MDPI AG
DOI： 10.3390/app11167522 　

eISSN： 2076-3417

The development of robots that play with humans is a challenging topic for robotics. We are developing a robot that plays tag with human players. To realize such a robot, it needs to observe the players and obstacles around it, chase a target player, and touch the player without collision. To achieve this task, we propose two methods. The first one is the player tracking method, by which the robot moves towards a virtual circle surrounding the target player. We used a laser range finder (LRF) as a sensor for player tracking. The second one is a motion control method after approaching the player. Here, the robot moves away from the player by moving towards the opposite side to the player. We conducted a simulation experiment and an experiment using a real robot. Both experiments proved that with the proposed tracking method, the robot properly chased the player and moved away from the player without collision. The contribution of this paper is the development of a robot control method to approach a human and then move away safely.
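One simple way to picture the "virtual circle" idea in this abstract is to pick a goal point on a circle of fixed radius around the player. The sketch below chooses the point on that circle closest to the robot. This choice, the function name `circle_target`, and the coordinates are assumptions for illustration; the paper's tracking method may select the goal differently.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def circle_target(robot: Point, player: Point, radius: float) -> Point:
    """Return the point on the circle around `player` that is nearest to `robot`."""
    dx = robot[0] - player[0]
    dy = robot[1] - player[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        raise ValueError("robot and player coincide; direction is undefined")
    scale = radius / dist
    # Move from the player toward the robot by exactly `radius`.
    return (player[0] + dx * scale, player[1] + dy * scale)


# Robot 3 m away on the x-axis, hypothetical 1 m virtual circle around the player.
goal = circle_target((3.0, 0.0), (0.0, 0.0), 1.0)
```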
SMOC corpus: A large-scale Japanese spontaneous multimodal one-on-one chat-talk corpus for dialog systems Peer-reviewed

Yoshihiro Yamazaki, Yuya Chiba, Takashi Nose, Akinori Ito

Acoustical Science and Technology　42　(4)　210-213　2021/07/01
Publisher: Acoustical Society of Japan
DOI： 10.1250/ast.42.210 　

ISSN： 1346-3969

eISSN： 1347-5177
A Light-weight Hand-waving Gesture Recognition Method Using Kinect V2 and Frequency Analysis Peer-reviewed

Yuki Misaki, Yutaka Hiroi, Akinori Ito

2021 IEEE/SICE International Symposium on System Integration, SII 2021　750-755　2021/01/11

DOI： 10.1109/IEEECONF49454.2021.9382709 　
CycleGAN-Based High-Quality Non-Parallel Voice Conversion with Spectrogram and WaveRNN Peer-reviewed

Aoi Kanagaki, Masaya Tanaka, Takashi Nose, Ryohei Shimizu, Akira Ito, Akinori Ito

2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020　356-357　2020/10/13

DOI： 10.1109/GCCE50665.2020.9291952 　
Incremental response generation using prefix-to-prefix model for dialogue system Peer-reviewed

Ryota Yahagi, Yuya Chiba, Takashi Nose, Akinori Ito

2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020　349-350　2020/10/13

DOI： 10.1109/GCCE50665.2020.9291883 　
A study on minimum spectral error analysis of speech Peer-reviewed

Takuma Hayasaka, Takashi Nose, Akinori Ito

2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020　362-363　2020/10/13

DOI： 10.1109/GCCE50665.2020.9291840 　
Filler prediction based on bidirectional LSTM for generation of natural response of spoken dialog Peer-reviewed

Yoshihiro Yamazaki, Yuya Chiba, Takashi Nose, Akinori Ito

2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020　360-361　2020/10/13

DOI： 10.1109/GCCE50665.2020.9291867 　
Successive Japanese lyrics generation based on encoder-decoder model Peer-reviewed

Rikiya Takahashi, Takashi Nose, Yuya Chiba, Akinori Ito

2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020　126-127　2020/10/13

DOI： 10.1109/GCCE50665.2020.9291718 　
Analysis and Estimation of Sentence Speakability for English Pronunciation Evaluation Peer-reviewed

Satsuki Naijo, Yuya Chiba, Takashi Nose, Akinori Ito

2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020　353-355　2020/10/13

DOI： 10.1109/GCCE50665.2020.9292072 　
LJSing: large-scale singing voice corpus of single Japanese singer Peer-reviewed

Takuto Fujimura, Takashi Nose, Akinori Ito

2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020　364-365　2020/10/13

DOI： 10.1109/GCCE50665.2020.9291704 　
Improving Pronunciation Clarity of Dysarthric Speech Using CycleGAN with Multiple Speakers Peer-reviewed

Shuhei Imai, Takashi Nose, Aoi Kanagaki, Satoshi Watanabe, Akinori Ito

2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020　366-367　2020/10/13

DOI： 10.1109/GCCE50665.2020.9292041 　
Spoken term detection based on acoustic models trained in multiple languages for zero-resource language Peer-reviewed

Satoru Mizuochi, Yuya Chiba, Takashi Nose, Akinori Ito

2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020　351-352　2020/10/13

DOI： 10.1109/GCCE50665.2020.9291761 　
Integration of accent sandhi and prosodic features estimation for Japanese text-to-speech synthesis Peer-reviewed

Daisuke Fujimaki, Takashi Nose, Akinori Ito

2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020　358-359　2020/10/13

DOI： 10.1109/GCCE50665.2020.9291906 　
Language modeling in speech recognition for grammatical error detection based on neural machine translation Peer-reviewed

Jiang Fu, Yuya Chiba, Takashi Nose, Akinori Ito

Acoustical Science and Technology　41　(5)　788-791　2020/09/01
Publisher: Acoustical Society of Japan
DOI： 10.1250/ast.41.788 　

ISSN： 1346-3969

eISSN： 1347-5177
Construction and analysis of a multimodal chat-talk corpus for dialog systems considering interpersonal closeness Peer-reviewed

Yoshihiro Yamazaki, Yuya Chiba, Takashi Nose, Akinori Ito

LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings　443-448　2020
Multi-stream attention-based BLSTM with feature segmentation for speech emotion recognition Peer-reviewed

Yuya Chiba, Takashi Nose, Akinori Ito

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH　2020-October　3301-3305　2020

DOI： 10.21437/Interspeech.2020-1199 　

ISSN： 2308-457X

eISSN： 1990-9772
A symbol-level melody completion based on a convolutional neural network with generative adversarial learning Peer-reviewed

Kosuke Nakamura, Takashi Nose, Yuya Chiba, Akinori Ito

Journal of Information Processing　28　248-257　2020

DOI： 10.2197/ipsjjip.28.248 　

ISSN： 0387-5806

eISSN： 1882-6652
Human-machine metacommunication towards development of a human-like agent: A short review Peer-reviewed

Akinori Ito

Acoustical Science and Technology　41　(1)　166-169　2020

DOI： 10.1250/ast.41.166 　

ISSN： 1346-3969

eISSN： 1347-5177
Evaluation of Person Tracking Methods for Human-Robot Physical Play Peer-reviewed

Koyuki Ikemoto, Yutaka Hiroi, Akinori Ito

Proceedings of the 2020 IEEE/SICE International Symposium on System Integration, SII 2020　416-421　2020/01

DOI： 10.1109/SII46433.2020.9026275 　
A pedestrian avoidance method considering personal space for a guide robot Peer-reviewed

Yutaka Hiroi, Akinori Ito

Robotics　8　(4)　2019/12/01

DOI： 10.3390/ROBOTICS8040097 　

eISSN： 2218-6581
Realization of a robot system that plays "darumasan-ga-koronda" game with humans Peer-reviewed

Yutaka Hiroi, Akinori Ito

Robotics　8　(3)　2019/09/01

DOI： 10.3390/robotics8030055 　

eISSN： 2218-6581
Improving human scoring of prosody using parametric speech synthesis Peer-reviewed

Hafiyan Prafianto, Takashi Nose, Yuya Chiba, Akinori Ito

Speech Communication　111　14-21　2019/08
Publisher: Elsevier BV
DOI： 10.1016/j.specom.2019.06.001 　

ISSN： 0167-6393
Effect of Mutual Self-Disclosure in Spoken Dialog System on User Impression Peer-reviewed

Shunsuke Tada, Yuya Chiba, Takashi Nose, Akinori Ito

2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings　806-810　2019/03/04

DOI： 10.23919/APSIPA.2018.8659630 　
Latent words recurrent neural network language models for automatic speech recognition Peer-reviewed

Ryo Masumura, Taichi Asami, Takanobu Oba, Sumitaka Sakauchi, Akinori Ito

IEICE Transactions on Information and Systems　E102D　(12)　2557-2567　2019

DOI： 10.1587/transinf.2018EDP7242 　

ISSN： 0916-8532

eISSN： 1745-1361
Preface

Jeng Shyang Pan, Akinori Ito, Pei Wei Tsai, Lakhmi C. Jain

Smart Innovation, Systems and Technologies　110　v-vi　2019

DOI： 10.1109/ICB.2012.6199777 　

ISSN： 2190-3018

eISSN： 2190-3026
Multi-condition training for noise-robust speech emotion recognition Peer-reviewed

Yuya Chiba, Takashi Nose, Akinori Ito

Acoustical Science and Technology　40　(6)　406-409　2019

DOI： 10.1250/ast.40.406 　

ISSN： 1346-3969

eISSN： 1347-5177
Evaluation of English speech recognition for Japanese learners using DNN-based acoustic models Peer-reviewed

Jiang Fu, Yuya Chiba, Takashi Nose, Akinori Ito

Smart Innovation, Systems and Technologies　110　93-100　2019

DOI： 10.1007/978-3-030-03748-2_11 　

ISSN： 2190-3018

eISSN： 2190-3026
Comparison of speech recognition performance between Kaldi and Google Cloud Speech API Peer-reviewed

Takashi Kimura, Takashi Nose, Shinji Hirooka, Yuya Chiba, Akinori Ito

Smart Innovation, Systems and Technologies　110　109-115　2019

DOI： 10.1007/978-3-030-03748-2_13 　

ISSN： 2190-3018

eISSN： 2190-3026
Segmental pitch control using speech input based on differential contexts and features for customizable neural speech synthesis Peer-reviewed

Shinya Hanabusa, Takashi Nose, Akinori Ito

Smart Innovation, Systems and Technologies　110　124-131　2019

DOI： 10.1007/978-3-030-03748-2_15 　

ISSN： 2190-3018

eISSN： 2190-3026
Melody completion based on convolutional neural networks and generative adversarial learning Peer-reviewed

Kosuke Nakamura, Takashi Nose, Yuya Chiba, Akinori Ito

Smart Innovation, Systems and Technologies　110　116-123　2019

DOI： 10.1007/978-3-030-03748-2_14 　

ISSN： 2190-3018

eISSN： 2190-3026
Two-stage sequence-to-sequence neural voice conversion with low-to-high definition spectrogram mapping Peer-reviewed

Sou Miyamoto, Takashi Nose, Kazuyuki Hiroshiba, Yuri Odagiri, Akinori Ito

Smart Innovation, Systems and Technologies　110　132-139　2019

DOI： 10.1007/978-3-030-03748-2_16 　

ISSN： 2190-3018

eISSN： 2190-3026
DNN-based talking movie generation with face direction consideration Peer-reviewed

Toru Ishikawa, Takashi Nose, Akinori Ito

Smart Innovation, Systems and Technologies　110　157-164　2019

DOI： 10.1007/978-3-030-03748-2_19 　

ISSN： 2190-3018

eISSN： 2190-3026
A study on a spoken dialogue system with cooperative emotional speech synthesis using acoustic and linguistic information Peer-reviewed

Mai Yamanaka, Yuya Chiba, Takashi Nose, Akinori Ito

Smart Innovation, Systems and Technologies　110　101-108　2019

DOI： 10.1007/978-3-030-03748-2_12 　

ISSN： 2190-3018

eISSN： 2190-3026
Leveraging a small corpus by different frame shifts for training of a speech recognizer Peer-reviewed

Akinori Ito

Smart Innovation, Systems and Technologies　110　82-89　2019

DOI： 10.1007/978-3-030-03748-2_10 　

ISSN： 2190-3018

eISSN： 2190-3026
Muting machine speech using audio watermarking Peer-reviewed

Akinori Ito

Smart Innovation, Systems and Technologies　110　74-81　2019

DOI： 10.1007/978-3-030-03748-2_9 　

ISSN： 2190-3018

eISSN： 2190-3026
Improvement of accent sandhi rules based on Japanese accent dictionaries Peer-reviewed

Hiroto Aoyama, Takashi Nose, Yuya Chiba, Akinori Ito

Smart Innovation, Systems and Technologies　110　140-148　2019

DOI： 10.1007/978-3-030-03748-2_17 　

ISSN： 2190-3018

eISSN： 2190-3026
Multiple player detection and tracking method using a laser range finder for a robot that plays with human Peer-reviewed

Yuko Nakamori, Yutaka Hiroi, Akinori Ito

ROBOMECH Journal　5　(1)　25　2018/12/01

DOI： 10.1186/s40648-018-0122-x 　

eISSN： 2197-4225
An Analysis of the Effect of Emotional Speech Synthesis on Non-Task-Oriented Dialogue System Peer-reviewed

Yuya Chiba, Takashi Nose, Taketo Kase, Mai Yamanaka, Akinori Ito

Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, Melbourne, Australia, July 12-14, 2018　371-375　2018/07
Publisher: Association for Computational Linguistics
Improving User Impression in Spoken Dialog System with Gradual Speech Form Control Peer-reviewed

Yukiko Kageyama, Yuya Chiba, Takashi Nose, Akinori Ito

Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, Melbourne, Australia, July 12-14, 2018　235-240　2018/07
Publisher: Association for Computational Linguistics
Domain adaptation based on mixture of latent words language models for automatic speech recognition Peer-reviewed

Ryo Masumura, Taichi Asami, Takanobu Oba, Hirokazu Masataki, Sumitaka Sakauchi, Akinori Ito

IEICE Transactions on Information and Systems　E101D　(6)　1581-1590　2018/06
Publisher: Institute of Electronics, Information and Communication Engineers (IEICE)
DOI： 10.1587/transinf.2017EDP7210 　

ISSN： 0916-8532

eISSN： 1745-1361
Analyses of example sentences collected by conversation for example-based non-task-oriented dialog system Peer-reviewed

Yukiko Kageyama, Yuya Chiba, Takashi Nose, Akinori Ito

IAENG International Journal of Computer Science　45　(2)　285-293　2018/05/28

ISSN： 1819-656X

eISSN： 1819-9224
Spoken term detection of zero-resource language using machine learning Peer-reviewed

Akinori Ito, Masatoshi Koizumi

ACM International Conference Proceeding Series　45-49　2018/02/26

DOI： 10.1145/3193063.3193068 　
Analysis of efficient multimodal features for estimating user's willingness to talk: Comparison of human-machine and human-human dialog Peer-reviewed

Yuya Chiba, Takashi Nose, Akinori Ito

Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017　2018-February　428-431　2018/02/05
Publisher: IEEE
DOI： 10.1109/APSIPA.2017.8282069 　
Enhancement of person detection and tracking for a robot that plays with human Peer-reviewed

Yuko Nakamori, Yutaka Hiroi, Akinori Ito

SII 2017 - 2017 IEEE/SICE International Symposium on System Integration　2018-January　494-499　2018/02/01
Publisher: IEEE
DOI： 10.1109/SII.2017.8279261 　
Special section on enriched multimedia — Potential and possibility of multimedia contents for the future

Akinori Ito

IEICE Transactions on Information and Systems　E101D　(1)　1　2018

DOI： 10.1587/transinf.2017MUF0001 　

ISSN： 0916-8532

eISSN： 1745-1361
Dialog-based interactive movie recommendation: Comparison of dialog strategies Peer-reviewed

Hayato Mori, Yuya Chiba, Takashi Nose, Akinori Ito

Smart Innovation, Systems and Technologies　82　77-83　2018
Publisher: Springer Science and Business Media Deutschland GmbH
DOI： 10.1007/978-3-319-63859-1_10 　

ISSN： 2190-3018

eISSN： 2190-3026
Response selection of interview-based dialog system using user focus and semantic orientation Peer-reviewed

Shunsuke Tada, Yuya Chiba, Takashi Nose, Akinori Ito

Smart Innovation, Systems and Technologies　82　84-90　2018
Publisher: Springer Science and Business Media Deutschland GmbH
DOI： 10.1007/978-3-319-63859-1_11 　

ISSN： 2190-3018

eISSN： 2190-3026
Detection of singing mistakes from singing voice Peer-reviewed

Isao Miyagawa, Yuya Chiba, Takashi Nose, Akinori Ito

Smart Innovation, Systems and Technologies　82　130-136　2018
Publisher: Springer Science and Business Media Deutschland GmbH
DOI： 10.1007/978-3-319-63859-1_17 　

ISSN： 2190-3018

eISSN： 2190-3026
Evaluation of nonlinear tempo modification methods based on sinusoidal modeling Peer-reviewed

Kosuke Nakamura, Yuya Chiba, Takashi Nose, Akinori Ito

Smart Innovation, Systems and Technologies　82　104-111　2018
Publisher: Springer Science and Business Media Deutschland GmbH
DOI： 10.1007/978-3-319-63859-1_14 　

ISSN： 2190-3018

eISSN： 2190-3026
Development and evaluation of Julius-compatible interface for Kaldi ASR Peer-reviewed

Yusuke Yamada, Takashi Nose, Yuya Chiba, Akinori Ito, Takahiro Shinozaki

Smart Innovation, Systems and Technologies　82　91-96　2018
Publisher: Springer Science and Business Media Deutschland GmbH
DOI： 10.1007/978-3-319-63859-1_12 　

ISSN： 2190-3018

eISSN： 2190-3026
Voice conversion from arbitrary speakers based on deep neural networks with adversarial learning Peer-reviewed

Sou Miyamoto, Takashi Nose, Suzunosuke Ito, Harunori Koike, Yuya Chiba, Akinori Ito, Takahiro Shinozaki

Smart Innovation, Systems and Technologies　82　97-103　2018
Publisher: Springer Science and Business Media Deutschland GmbH
DOI： 10.1007/978-3-319-63859-1_13 　

ISSN： 2190-3018

eISSN： 2190-3026
A study on 2D photo-realistic facial animation generation using 3D facial feature points and deep neural networks Peer-reviewed

Kazuki Sato, Takashi Nose, Akira Ito, Yuya Chiba, Akinori Ito, Takahiro Shinozaki

Smart Innovation, Systems and Technologies　82　113-118　2018
Publisher: Springer Science and Business Media Deutschland GmbH
DOI： 10.1007/978-3-319-63859-1_15 　

ISSN： 2190-3018

eISSN： 2190-3026
Analyzing effect of physical expression on English proficiency for multimodal computer-assisted language learning Peer-reviewed

Haoran Wu, Yuya Chiba, Takashi Nose, Akinori Ito

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH　2018-September　1746-1750　2018/01/01
Publisher: ISCA
DOI： 10.21437/Interspeech.2018-1425 　

ISSN： 2308-457X

eISSN： 1990-9772
Analysis of preferred speaking rate and pause in spoken Easy Japanese for non-native listeners Peer-reviewed

Hafiyan Prafianto, Takashi Nose, Yuya Chiba, Akinori Ito

Acoustical Science and Technology　39　(2)　92-100　2018
Publisher: Acoustical Society of Japan
DOI： 10.1250/ast.39.92 　

ISSN： 1346-3969

eISSN： 1347-5177
Guest editorial: Introduction to the special issue on the enrichment of sound, speech and music media

Yôiti Suzuki, Akinori Ito, Kazuhiro Kondo

Journal of Information Hiding and Multimedia Signal Processing　8　(6)　1323-1324　2017/11
Publisher: Ubiquitous International
ISSN： 2073-4212

eISSN： 2073-4239
Enrichment of audio signal using side information Peer-reviewed

Akinori Ito

Journal of Information Hiding and Multimedia Signal Processing　8　(6)　1325-1334　2017/11

ISSN： 2073-4212

eISSN： 2073-4239
Manipulating vocal signal in mixed music sounds using side information based on the fundamental frequency Peer-reviewed

Akinori Ito, Yuto Sasaki

Journal of Information Hiding and Multimedia Signal Processing　8　(6)　1372-1381　2017/11

ISSN： 2073-4212

eISSN： 2073-4239
HMM-Based Photo-Realistic Talking Face Synthesis Using Facial Expression Parameter Mapping with Deep Neural Networks Peer-reviewed

Kazuki Sato, Takashi Nose, Akinori Ito

Journal of Computer and Communications　5　(10)　55-65　2017/08

DOI： 10.4236/jcc.2017.510006 　
Cluster-based approach to discriminate the user’s state whether a user is embarrassed or thinking to an answer to a prompt Peer-reviewed

Yuya Chiba, Takashi Nose, Akinori Ito

Journal on Multimodal User Interfaces　11　(2)　185-196　2017/06/01

DOI： 10.1007/s12193-017-0238-y 　

ISSN： 1783-7677

eISSN： 1783-8738
Construction and analysis of phonetically and prosodically balanced emotional speech database Peer-reviewed

Emika Takeishi, Takashi Nose, Yuya Chiba, Akinori Ito

2016 Conference of the Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques, O-COCOSDA 2016　16-21　2017/05/03
Publisher: Institute of Electrical and Electronics Engineers Inc.
DOI： 10.1109/ICSDA.2016.7918977 　
Recognition of sounds using square Cauchy mixture distribution Peer-reviewed

Akinori Ito

2016 IEEE International Conference on Signal and Image Processing, ICSIP 2016　726-730　2017/03/27

DOI： 10.1109/SIPROCESS.2016.7888359 　
A precise evaluation method of prosodic quality of non-native speakers using average voice and prosody substitution Peer-reviewed

Hafiyan Prafianto, Takashi Nose, Akinori Ito

ICALIP 2016 - 2016 International Conference on Audio, Language and Image Processing - Proceedings　208-212　2017/02/07

DOI： 10.1109/ICALIP.2016.7846620 　
A Compression Method for Spherical Microphone Array Recordings using Principal Component Analysis Peer-reviewed

Hironori Sato, Arif Wicaksono, Shuichi Sakamoto, Cesar Salvador, Jorge Trevino, Yôiti Suzuki, Akinori Ito

Proc. 2017 RISP International Workshop on Nonlinear Circuits, Communications and Signal Processing (NCSP'17)　2PM1-3-4　433-436　2017/02
Special section on enriched multimedia - New technology trends in creation, utilization and protection of multimedia information

Akinori Ito

IEICE Transactions on Information and Systems　E100D　(1)　1　2017/01

ISSN： 0916-8532

eISSN： 1745-1361
Demonstration experiment of data hiding into OOXML document for suppression of plagiarism Peer-reviewed

Akinori Ito

Smart Innovation, Systems and Technologies　63　3-10　2017

DOI： 10.1007/978-3-319-50209-0_1 　

ISSN： 2190-3018

eISSN： 2190-3026
Estimation of user’s willingness to talk about the topic: Analysis of interviews between humans Peer-reviewed

Yuya Chiba, Akinori Ito

Lecture Notes in Electrical Engineering　999 LNEE　411-419　2017
Publisher: Springer Verlag
DOI： 10.1007/978-981-10-2585-3_34 　

ISSN： 1876-1100

eISSN： 1876-1119
Collection of example sentences for non-task-oriented dialog using a spoken dialog system and comparison with hand-crafted DB Peer-reviewed

Yukiko Kageyama, Yuya Chiba, Takashi Nose, Akinori Ito

Communications in Computer and Information Science　713　458-464　2017
Publisher: Springer Verlag
DOI： 10.1007/978-3-319-58750-9_63 　

ISSN： 1865-0929
Synthesis of photo-realistic facial animation from text based on HMM and DNN with animation unit Peer-reviewed

Kazuki Sato, Takashi Nose, Akinori Ito

Smart Innovation, Systems and Technologies　64　29-36　2017

DOI： 10.1007/978-3-319-50212-0_4 　

ISSN： 2190-3018

eISSN： 2190-3026
Development of an easy Japanese writing support system with text-to-speech function Peer-reviewed

Takeshi Nagano, Hafiyan Prafianto, Takashi Nose, Akinori Ito

Smart Innovation, Systems and Technologies　64　221-228　2017

DOI： 10.1007/978-3-319-50212-0_27 　

ISSN： 2190-3018

eISSN： 2190-3026
A study on tailor-made speech synthesis based on deep neural networks Peer-reviewed

Shuhei Yamada, Takashi Nose, Akinori Ito

Smart Innovation, Systems and Technologies　63　159-166　2017

DOI： 10.1007/978-3-319-50209-0_20 　

ISSN： 2190-3018

eISSN： 2190-3026
A Crowd Avoidance Method Using Circular Avoidance Path for Robust Person Following Peer-reviewed

Kohei Morishita, Yutaka Hiroi, Akinori Ito

Journal of Robotics　2017　1　2017
Publisher: Hindawi Limited
DOI： 10.1155/2017/3148202 　

ISSN： 1687-9600

eISSN： 1687-9619
Multiple description vector quantizer design based on redundant representation of central code Peer-reviewed

Akinori Ito

European Signal Processing Conference　2016-November　106-109　2016/11/28

DOI： 10.1109/EUSIPCO.2016.7760219 　

ISSN： 2219-5491
Investigation of combining various major language model technologies including data expansion and adaptation Peer-reviewed

Ryo Masumura, Taichi Asami, Takanobu Oba, Hirokazu Masataki, Sumitaka Sakauchi, Akinori Ito

IEICE Transactions on Information and Systems　E99D　(10)　2452-2461　2016/10

DOI： 10.1587/transinf.2016SLP0013 　

ISSN： 0916-8532

eISSN： 1745-1361
Tempo Modification of Mixed Music Signal by Nonlinear Time Scaling and Sinusoidal Modeling Peer-reviewed

Tsukasa Nishino, Takashi Nose, Akinori Ito

Proceedings - 2015 International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2015　146-149　2016/02/19

DOI： 10.1109/IIH-MSP.2015.86 　
Conversion of Speaker's Face Image Using PCA and Animation Unit for Video Chatting Peer-reviewed

Yuki Saito, Takashi Nose, Takahiro Shinozaki, Akinori Ito

Proceedings - 2015 International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2015　433-436　2016/02/19

DOI： 10.1109/IIH-MSP.2015.85 　
Playing with a Robot: Realization of «Red Light, Green Light» Using a Laser Range Finder Peer-reviewed

Keisuke Sakai, Yutaka Hiroi, Akinori Ito

Proceedings - 2015 3rd International Conference on Robot, Vision and Signal Processing, RVSP 2015　1-4　2016/02/03

DOI： 10.1109/RVSP.2015.9 　
Estimating the user's state before exchanging utterances using intermediate acoustic features for spoken dialog systems Peer-reviewed

Yuya Chiba, Takashi Nose, Masashi Ito, Akinori Ito

IAENG International Journal of Computer Science　43　(1)　1-9　2016/02/01

ISSN： 1819-656X

eISSN： 1819-9224
A study on face image conversion based on conversion of Animation Units using DNN Peer-reviewed

Yuki Saito, Takashi Nose, Akinori Ito

電子情報通信学会論文誌　J199-D　(11)　1112-1115　2016
Multiple Description Vector Quantizer Design Based on Redundant Representation of Central Code Peer-reviewed

Akinori Ito

2016 24TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO)　106-109　2016

DOI： 10.1109/EUSIPCO.2016.7760219 　

ISSN： 2076-1465
Influence of the height of a robot on comfortableness of verbal interaction Peer-reviewed

Yutaka Hiroi, Akinori Ito

IAENG International Journal of Computer Science　43　(4)　447-455　2016

ISSN： 1819-656X

eISSN： 1819-9224
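Evaluation of a spoken dialogue system using cooperative emotional speech synthesis based on utterance state estimation Peer-reviewed

Taketo Kase, Takashi Nose, Yuya Chiba, Akinori Ito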

電子情報通信学会誌A　J199-A　(1)　25-35　2016/01/01
Estimation of User's Willingness to Talk About the Topic: Analysis of Interviews Between Humans Peer-reviewed

Yuya Chiba, Akinori Ito

Dialogues with Social Robots - Enablements, Analyses, and Evaluation, Seventh International Workshop on Spoken Dialogue Systems, IWSDS 2016, Saariselkä, Finland, January 13-16, 2016　411-419　2016
Publisher: Springer
DOI： 10.1007/978-981-10-2585-3_34 　
Investigation of Pause Insertion Effect in Spoken Easy Japanese for Non-Native Listeners Peer-reviewed

Hafiyan Prafianto, Takeshi Nagano, Takashi Nose, Akinori Ito

Proceedings of 12th Western Pacific Acoustics Conference　507-511　2015/12/08
Automatic Generation of Proper Noun Entries in a Speech Recognizer for Local Information Recognition Peer-reviewed

Kenta Shiga, Takashi Nose, Akinori Ito, Ryo Masumura, Hirokazu Masataki

Proceedings of 12th Western Pacific Acoustics Conference　2015/12/08
Development of a mobile robot moving on a handrail - Control for preceding a person keeping a distance Peer-reviewed

Yuma Fujiwara, Yutaka Hiroi, Yuki Tanaka, Akinori Ito

Proceedings - IEEE International Workshop on Robot and Human Interactive Communication　2015-November　413-418　2015/11/20

DOI： 10.1109/ROMAN.2015.7333579 　
YANSIS: An “Easy Japanese” writing support system Peer-reviewed

Takeshi Nagano, Akinori Ito

Proceedings of 8th International Conference ICT for Language Learning　2015/11/12
A Computer-Assisted English Conversation Training System for Response-Timing-Aware Oral Conversation Exercise Peer-reviewed

Naoto Suzuki, Yutaka Hiroi, Yuya Chiba, Takashi Nose, Akinori Ito

情報処理学会論文誌　56　(11)　2177-2189　2015/11/01

ISSN： 1882-7764
Investigation of Precision of Human Perception of Pointing Gesture and a Method for Precision Improvement Peer-reviewed

Yutaka Hiroi, Akinori Ito

情報処理学会論文誌　56　(8)　1634-1645　2015/08/15

ISSN： 1882-7764
Robot: Have I done something wrong? - Analysis of prosodic features of speech commands under the robot's unintended behavior Peer-reviewed

Noriko Totsuka, Yuya Chiba, Takashi Nose, Akinori Ito

ICALIP 2014 - 2014 International Conference on Audio, Language and Image Processing, Proceedings　887-890　2015/01/13

DOI： 10.1109/ICALIP.2014.7009922 　
Subjective evaluation of packet loss recovery techniques for voice over IP Peer-reviewed

Masahito Okamoto, Takashi Nose, Akinori Ito, Takeshi Nagano

ICALIP 2014 - 2014 International Conference on Audio, Language and Image Processing, Proceedings　711-714　2015/01/13

DOI： 10.1109/ICALIP.2014.7009887 　
A study on the effect of speech rate on perception of spoken easy Japanese using speech synthesis Peer-reviewed

Hafiyan Prafianto, Takashi Nose, Yuya Chiba, Akinori Ito, Kazuyuki Sato

ICALIP 2014 - 2014 International Conference on Audio, Language and Image Processing, Proceedings　476-479　2015/01/13

DOI： 10.1109/ICALIP.2014.7009839 　
Hierarchical Latent Words Language Models for Robust Modeling to Out-Of Domain Tasks Peer-reviewed

Ryo Masumura, Taichi Asami, Takanobu Oba, Hirokazu Masataki, Sumitaka Sakauchi, Akinori Ito

Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal, September 17-21, 2015　1896-1901　2015
On appropriateness and estimation of the emotion of synthesized response speech in a spoken dialogue system Peer-reviewed

Taketo Kase, Takashi Nose, Akinori Ito

Communications in Computer and Information Science　528　747-752　2015
Publisher: Springer Verlag
DOI： 10.1007/978-3-319-21380-4_126 　

ISSN： 1865-0929
Entropy-Based Sentence Selection for Speech Synthesis Using Phonetic and Prosodic Contexts Peer-reviewed

Takashi Nose, Yusuke Arao, Takao Kobayashi, Komei Sugiura, Yoshinori Shiga, Akinori Ito

16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5　3491-3495　2015
Tempo Modification of Mixed Music Signal by Nonlinear Time Scaling and Sinusoidal Modeling Peer-reviewed

Tsukasa Nishino, Takashi Nose, Akinori Ito

2015 INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING (IIH-MSP)　146-149　2015

DOI： 10.1109/IIH-MSP.2015.86 　
Conversion of Speaker's Face Image Using PCA and Animation Unit for Video Chatting Peer-reviewed

Yuki Saito, Takashi Nose, Takahiro Shinozaki, Akinori Ito

2015 INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING (IIH-MSP)　433-436　2015

DOI： 10.1109/IIH-MSP.2015.85 　
On Appropriateness and Estimation of the Emotion of Synthesized Response Speech in a Spoken Dialogue System Peer-reviewed

Taketo Kase, Takashi Nose, Akinori Ito

HCI INTERNATIONAL 2015 - POSTERS' EXTENDED ABSTRACTS, PT I　528　747-752　2015

DOI： 10.1007/978-3-319-21380-4_126 　

ISSN： 1865-0929
Latent words recurrent neural network language models Peer-reviewed

Ryo Masumura, Taichi Asami, Takanobu Oba, Hirokazu Masataki, Sumitaka Sakauchi, Akinori Ito

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH　2015-January　2380-2384　2015

ISSN： 2308-457X

eISSN： 1990-9772
Combinations of various language model technologies including data expansion and adaptation in spontaneous speech recognition Peer-reviewed

Ryo Masumura, Taichi Asami, Takanobu Oba, Hirokazu Masataki, Sumitaka Sakauchi, Akinori Ito

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH　2015-January　463-467　2015

ISSN： 2308-457X

eISSN： 1990-9772
Hierarchical latent words language models for robust modeling to out-of domain tasks Peer-reviewed

Ryo Masumura, Taichi Asami, Takanobu Oba, Hirokazu Masataki, Sumitaka Sakauchi, Akinori Ito

Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing　1896-1901　2015
Publisher: The Association for Computational Linguistics
DOI： 10.18653/v1/d15-1217 　
Entropy-based sentence selection for speech synthesis using phonetic and prosodic contexts Peer-reviewed

Takashi Nose, Yusuke Arao, Takao Kobayashi, Komei Sugiura, Yoshinori Shiga, Akinori Ito

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH　2015-January　3491-3495　2015

ISSN： 2308-457X

eISSN： 1990-9772
Preface Peer-reviewed

Junzo Watada, Akinori Ito, Jeng Shyang Pan, Han Chieh Chao, Chien Ming Chen

Proceedings - 2014 10th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2014　XXV　2014/12/24

DOI： 10.1109/IIH-MSP.2014.5 　
Analysis of English pronunciation of singing voices sung by Japanese speakers Peer-reviewed

Kazumichi Yoshida, Takashi Nose, Akinori Ito

Proceedings - 2014 10th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2014　554-557　2014/12/24

DOI： 10.1109/IIH-MSP.2014.143 　
Assessing the intended enthusiasm of singing voice using energy variance Peer-reviewed

Akinori Ito

Proceedings - 2014 10th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2014　558-561　2014/12/24

DOI： 10.1109/IIH-MSP.2014.144 　
Teaching a robot where objects are: Specification of object location using human following and human orientation estimation Peer-reviewed

Keisuke Sakai, Yutaka Hiroi, Akinori Ito

World Automation Congress Proceedings　490-495　2014/10/24

DOI： 10.1109/WAC.2014.6936012 　

ISSN： 2154-4824

eISSN： 2154-4832
Analysis of spectral enhancement using global variance in HMM-based speech synthesis Peer-reviewed

Takashi Nose, Akinori Ito

Proceedings of Interspeech　2014/09/18
Accent type and phrase boundary estimation using acoustic and language models for automatic prosodic labeling Peer-reviewed

Tomoki Koriyama, Hiroshi Suzuki, Takashi Nose, Takahiro Shinozaki, Akinori Ito

Proceedings of Interspeech　2014/09/17
User modeling by using bag-of-behaviors for building a dialog system sensitive to the interlocutor's internal state Peer-reviewed

Yuya Chiba, Masashi Ito, Takashi Nose, Akinori Ito

Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue　2014/07/18
TEMPO MODIFICATION OF MUSIC SIGNAL USING SINUSOIDAL MODEL AND LPC-BASED RESIDUE MODEL Peer-reviewed

Akinori Ito, Yuki Igarashi, Masashi Ito, Takashi Nose

Proceedings of International Congress on Sound and Vibration　2014/07/13
User Modeling by Using Bag-of-Behaviors for Building a Dialog System Sensitive to the Interlocutor’s Internal State Peer-reviewed

Yuya Chiba, Takashi Nose, Akinori Ito, Masashi Ito

Proceedings of 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue　74　2014/06/18
Packet loss concealment of voice-over IP packet using redundant parameter transmission under severe loss conditions Peer-reviewed

Takeshi Nagano, Akinori Ito

Journal of Information Hiding and Multimedia Signal Processing　5　(2)　285-294　2014/04

ISSN： 2073-4212

eISSN： 2073-4239
Modeling User's State During Dialog Turn Using HMM For Multi-modal Spoken Dialog System Peer-reviewed

Yuya Chiba, Masashi Ito, Akinori Ito

Proceedings of The Seventh International Conference on Advances in Computer-Human Interactions　343-346　2014/03/02
Automatic evaluation of singing enthusiasm for karaoke Peer-reviewed

Ryunosuke Daido, Masashi Ito, Shozo Makino, Akinori Ito

Computer Speech and Language　28　(2)　501-517　2014/03

DOI： 10.1016/j.csl.2012.07.007 　

ISSN： 0885-2308

eISSN： 1095-8363
Speech recognition in a home environment using parallel decoding with GMM-based noise modeling Peer-reviewed

Kohei Machida, Takashi Nose, Akinori Ito

2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014　2014/02/12

DOI： 10.1109/APSIPA.2014.7041622 　
Controlling Switching Pause Using an AR Agent for Interactive CALL System Peer-reviewed

Naoto Suzuki, Takashi Nose, Akinori Ito, Yutaka Hiroi

Communications in Computer and Information Science　435 PART II　588-593　2014
Publisher: Springer Verlag
DOI： 10.1007/978-3-319-07854-0_102 　

ISSN： 1865-0929
Manipulation of vocal signal in mixed music signal using side information of F0 and backing spectrum Peer-reviewed

Akinori Ito, Yuto Sasaki

International Conference on Signal Processing Proceedings, ICSP　2015-January　(October)　605-609　2014

DOI： 10.1109/ICOSP.2014.7015075 　

ISSN： 2164-5221
Analysis of spectral enhancement using global variance in HMM-based speech synthesis Peer-reviewed

Takashi Nose, Akinori Ito

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH　2917-2921　2014

ISSN： 2308-457X

eISSN： 1990-9772
User modeling by using bag-of-behaviors for building a dialog system sensitive to the interlocutor's internal state Peer-reviewed

Yuya Chiba, Takashi Nose, Akinori Ito, Masashi Ito

SIGDIAL 2014 - 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Proceedings of the Conference　74-78　2014

DOI： 10.3115/v1/w14-4310 　
Tempo modification of music signal using sinusoidal model and LPC-based residue model Peer-reviewed

Akinori Ito, Yuki Igarashi, Masashi Ito, Takashi Nose

21st International Congress on Sound and Vibration 2014, ICSV 2014　1　928-935　2014
Modeling user's state during dialog turn using HMM for multi-modal spoken dialog system Peer-reviewed

Yuya Chiba, Akinori Ito, Masashi Ito

ACHI 2014 - 7th International Conference on Advances in Computer-Human Interactions　343-346　2014
Foreword to the special issue on the speech communication and its related technologies Peer-reviewed

Akinori Ito

Acoustical Science and Technology　34　(2)　63　2013

DOI： 10.1250/ast.34.63 　

ISSN： 1346-3969

eISSN： 1347-5177
ASAHI: OK for failure a robot for supporting daily life, equipped with a robot avatar Peer-reviewed

Yutaka Hiroi, Akinori Ito

ACM/IEEE International Conference on Human-Robot Interaction　141-142　2013

DOI： 10.1109/HRI.2013.6483541 　

ISSN： 2167-2148

eISSN： 2167-2148
Evaluation of robot design using virtual reality Peer-reviewed

Yutaka Hiroi, Akinori Ito

Transactions of the Virtual Reality Society of Japan　18　(2)　161-170　2013
Publisher: THE VIRTUAL REALITY SOCIETY OF JAPAN
DOI： 10.18974/tvrsj.18.2_161 　

ISSN： 1344-011X

A robot can be tailored to users' preferences by designing its appearance and interaction through subjective evaluation. However, evaluating users' impressions with real robots requires building many robots with different specifications, such as height, which is time-consuming and costly. In this paper, we propose a robot design methodology based on augmented reality (AR). We first evaluated a robot's head size using both AR and real robots in an environment with a simple background, and the two evaluations gave similar results. We then repeated the evaluation in a real environment, and again the two evaluations gave similar results. From these experiments, we conclude that CG-based robot evaluation is as effective as evaluation using real robots. In addition, AR technology lets us evaluate a robot in a real environment, which enables a more realistic evaluation of robot design without building real robots.
Estimation of User's State during a Dialog Turn with Sequential Multi-modal Features Peer-reviewed

Yuya Chiba, Masashi Ito, Akinori Ito

Communications in Computer and Information Science　374　(PART II)　572-576　2013
Publisher: Springer Verlag
DOI： 10.1007/978-3-642-39476-8_115 　

ISSN： 1865-0929
Multi-modal voice activity detection by embedding image features into speech signal Peer-reviewed

Yohei Abe, Akinori Ito

Proceedings - 2013 9th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2013　271-274　2013
Publisher: IEEE Computer Society
DOI： 10.1109/IIH-MSP.2013.76 　
Acoustic features and auditory impressions of death growl and screaming voice Peer-reviewed

Keizo Kato, Akinori Ito

Proceedings - 2013 9th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2013　460-463　2013

DOI： 10.1109/IIH-MSP.2013.120 　
Speech recognition under noisy environments using multiple microphones based on asynchronous and intermittent measurements Peer-reviewed

Kohei Machida, Akinori Ito

2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013　1-4　2013

DOI： 10.1109/APSIPA.2013.6694362 　
ASAHI: OK for Failure A Robot for Supporting Daily Life, Equipped with a Robot Avatar Peer-reviewed

Yutaka Hiroi, Akinori Ito

PROCEEDINGS OF THE 8TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION (HRI 2013)　141-142　2013

DOI： 10.1109/HRI.2013.6483541 　

ISSN： 2167-2121
A packet loss recovery of G.729 speech using discriminative model and N-gram Peer-reviewed

Takeshi Nagano, Akinori Ito

Proceedings - 2013 9th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2013　267-270　2013

DOI： 10.1109/IIH-MSP.2013.75 　
Evaluation of sinusoidal modeling for polyphonic music signal Peer-reviewed

Yuki Igarashi, Masashi Ito, Akinori Ito

Proceedings - 2013 9th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2013　464-467　2013

DOI： 10.1109/IIH-MSP.2013.121 　
A Mobile Robot System With Semi-Autonomous Navigation Using Simple And Robust Person Following Behavior Peer-reviewed

Yutaka Hiroi, Shohei Matsunaka, Akinori Ito

Journal of Man, Machine and Technology　1　(1)　44-62　2012/12

DOI： 10.4156/jmmt.vol1.issue1.4 　
Packet Loss Concealment of VoIP Under Severe Loss Conditions Peer-reviewed

Akinori Ito, Takeshi Nagano

International Symposium on Wireless Personal Multimedia Communication　2012/09/24
Advanced Information Hiding for G.711 Telephone Speech Peer-reviewed

Akinori Ito, Yoiti Suzuki

Multimedia Information Hiding Technologies and Methodologies for Controlling Data　2012/09/23
Model shrinkage for discriminative language models Peer-reviewed

Takanobu Oba, Takaaki Hori, Atsushi Nakamura, Akinori Ito

IEICE Transactions on Information and Systems　E95-D　(5)　1465-1474　2012/05

DOI： 10.1587/transinf.E95.D.1465 　

ISSN： 0916-8532

eISSN： 1745-1361
Effect of Linguistic Contents on Human Estimation of Internal State of Dialog System Users Peer-reviewed

Yuya Chiba, Masashi Ito, Akinori Ito

Proceedings of The Interdisciplinary Workshop on Feedback Behavior in Dialog　11-14　2012
Round-robin duel discriminative language models Peer-reviewed

Takanobu Oba, Takaaki Hori, Atsushi Nakamura, Akinori Ito

IEEE Transactions on Audio, Speech and Language Processing　20　(4)　1244-1255　2012

DOI： 10.1109/TASL.2011.2174225 　

ISSN： 1558-7916

eISSN： 1558-7924
Robust Transmission of Audio Signals over the Internet: An Advanced Packet Loss Concealment for MP3-Based Audio Signals Peer-reviewed

Akinori Ito, Kiyoshi Konno, Masashi Ito, Shozo Makino

Interdisciplinary Information Sciences　18　(2)　99-105　2012
Publisher: The Editorial Committee of the Interdisciplinary Information Sciences
DOI： 10.4036/iis.2012.99 　

ISSN： 1340-9050

This paper describes packet loss concealment methods for MP3 audio. The proposed methods are based on estimation of modified discrete cosine transform (MDCT) coefficients of the lost packets. The estimation of MDCT coefficients of lower dimensions is performed by switching two concealment methods: the sign correction method and the correlation-based method. The concealment methods are switched based on redundant side information calculated subband-by-subband for reducing MDCT prediction errors. Next, a method for improving estimation of MDCT coefficients of higher dimensions was proposed. The method estimates the absolute value and sign of an MDCT coefficient independently. The subjective evaluation experiment proved that both of the improvement methods for lower and higher dimensions effectively improved the subjective audio quality.
Mobile Robot System With Semi-Autonomous Navigation Using Simple And Robust Person Following Behavior Peer-reviewed

Yutaka Hiroi, Shohei Matsunaka, Akinori Ito

Journal of Man, Machine and Technology　1　(1)　44-62　2012
Spoken document retrieval by discriminative modeling in a high dimensional feature space Peer-reviewed

Takanobu Oba, Takaaki Hori, Atsushi Nakamura, Akinori Ito

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings　5153-5156　2012

DOI： 10.1109/ICASSP.2012.6289080 　

ISSN： 1520-6149
Estimating a user's internal state before the first input utterance Peer-reviewed

Yuya Chiba, Akinori Ito

Advances in Human-Computer Interaction　2012　2012

DOI： 10.1155/2012/865362 　

ISSN： 1687-5893

eISSN： 1687-5907
Effect of robot height on comfortableness of spoken dialog Peer-reviewed

Yutaka Hiroi, Takayuki Nakayama, Hisanori Kuroda, Shinji Miyake, Akinori Ito

International Conference on Human System Interaction, HSI　29-34　2012

DOI： 10.1109/HSI.2012.14 　

ISSN： 2158-2246

eISSN： 2158-2254
Estimation of user's internal state before the user's first utterance using acoustic features and face orientation Peer-reviewed

Yuya Chiba, Masashi Ito, Akinori Ito

International Conference on Human System Interaction, HSI　23-28　2012

DOI： 10.1109/HSI.2012.13 　

ISSN： 2158-2246

eISSN： 2158-2254
Recognition of utterances with grammatical mistakes based on optimization of language model towards interactive CALL systems Peer-reviewed

Takuya Anzai, Akinori Ito

2012 Conference Handbook - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2012　2012
A Japanese lyrics writing support system for amateur songwriters Peer-reviewed

Chihiro Abe, Akinori Ito

2012 Conference Handbook - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2012　2012
A spoken dialogue system using virtual conversational agent with augmented reality Peer-reviewed

Shinji Miyake, Akinori Ito

2012 Conference Handbook - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2012　2012
A packet loss recovery of G.729 speech under severe packet loss condition Peer-reviewed

Takeshi Nagano, Akinori Ito

2012 Conference Handbook - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2012　2012
Automatic assessment of easiness of Japanese for writing aid of "Easy Japanese" Peer-reviewed

Meng Zhang, Akinori Ito, Kazuyuki Sato

ICALIP 2012 - 2012 International Conference on Audio, Language and Image Processing, Proceedings　303-307　2012

DOI： 10.1109/ICALIP.2012.6376630 　
Packet loss concealment of VoIP under severe loss conditions Peer-reviewed

Akinori Ito, Takeshi Nagano

International Symposium on Wireless Personal Multimedia Communications, WPMC　489-490　2012

ISSN： 1347-6890
Influence of the size factor of a mobile robot moving toward a human on subjective acceptable distance Peer-reviewed

Hiroi, Yutaka, Ito, Akinori

Mobile Robots-Current Trends　177-190　2011/10/26
Publisher: IntechOpen
A System for Evaluating Singing Enthusiasm for Karaoke Peer-reviewed

Ryunosuke Daido, Seong-Jun Hahm, Masashi Ito, Shozo Makino, Akinori Ito

Proceedings of International Society of Music Information Retrieval Conference　31-36　2011/10/24
Find out what a user is doing before the first utterance: discrimination of user's internal state using non-verbal information Peer-reviewed

Yuya Chiba, Akinori Ito

Proceedings of Asian-Pacific Signal and Information Processing Association Annual Summit and Conference　2011/10/19
Utterance classification for combination of multiple simple dialog systems Peer-reviewed

Seong Jun Hahm, Akinori Ito, Kentaro Awano, Masashi Ito, Shozo Makino

Proceedings - 9th IEEE International Symposium on Parallel and Distributed Processing with Applications Workshops, ISPAW 2011 - ICASE 2011, SGH 2011, GSDP 2011　171-176　2011

DOI： 10.1109/ISPAW.2011.74 　
Bit rate reduction of the MELP coder using Lempel-Ziv segment quantization Peer-reviewed

Minoru Kohata, Motoyuki Suzuki, Akinori Ito, Shozo Makino

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings　5240-5243　2011

DOI： 10.1109/ICASSP.2011.5947539 　

ISSN： 1520-6149
Round-robin duel discriminative language models in one-pass decoding with on-the-fly error correction Peer-reviewed

Takanobu Oba, Takaaki Hori, Akinori Ito, Atsushi Nakamura

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings　5588-5591　2011

DOI： 10.1109/ICASSP.2011.5947626 　

ISSN： 1520-6149
Evaluation of Abnormal Sound Detection using Multi-stage GMM in Various Environments Peer-reviewed

Akinori Ito, Akihito Aiba, Masashi Ito, Shozo Makino

12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5　308-+　2011
Training a language model using webdata for large vocabulary Japanese spontaneous speech recognition Peer-reviewed

Ryo Masumura, Seongjun Hahm, Akinori Ito

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH　1465-1468　2011

eISSN： 1990-9772
Language model expansion using webdata for spoken document retrieval Peer-reviewed

Ryo Masumura, Seongjun Hahm, Akinori Ito

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH　2133-2136　2011

eISSN： 1990-9772
Manipulating vocal signal in mixed music sounds using small amount of side information Peer-reviewed

Yuto Sasaki, Seong Jun Hahm, Akinori Ito

Proceedings - 7th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIHMSP 2011　298-301　2011

DOI： 10.1109/IIHMSP.2011.21 　
Evaluation of abnormal sound detection using multi-stage GMM in various environments Peer-reviewed

Akinori Ito, Akihito Aiba, Masashi Ito, Shozo Makino

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH　301-304　2011

eISSN： 1990-9772
Toward human-robot interaction design through human-human interaction experiment Peer-reviewed

Yutaka Hiroi, Akinori Ito

Lecture Notes in Electrical Engineering　133 LNEE　(VOL. 2)　127-130　2011

DOI： 10.1007/978-3-642-25992-0_18 　

ISSN： 1876-1100

eISSN： 1876-1119
Training a language model using webdata for large vocabulary Japanese spontaneous speech recognition Peer-reviewed

Ryo Masumura, Seongjun Hahm, Akinori Ito

12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5　1476-1479　2011
A system for evaluating singing enthusiasm for karaoke Peer-reviewed

Ryunosuke Daido, Seong Jun Hahm, Masashi Ito, Shozo Makino, Akinori Ito

Proceedings of the 12th International Society for Music Information Retrieval Conference, ISMIR 2011　31-36　2011
Language model expansion using webdata for spoken document retrieval Peer-reviewed

Ryo Masumura, Seongjun Hahm, Akinori Ito

12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5　2144-2147　2011
Find out what a user is doing before the first utterance: Discrimination of user's internal state using non-verbal information Peer-reviewed

Yuya Chiba, Seongjun Hahm, Akinori Ito

APSIPA ASC 2011 - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2011　906-909　2011
Multiple description coding using time domain division for MP3 coded sound signal Peer-reviewed

Ho seok Wey, Akinori Ito, Takuma Okamoto, Yoiti Suzuki

Journal of Information Hiding and Multimedia Signal Processing　1　(4)　269-285　2010/10

ISSN： 2073-4212

eISSN： 2073-4239
Speech recognition under multiple noise environment based on multi-mixture HMM and weight optimization by the aspect model Peer-reviewed

Seong Jun Hahm, Yuichi Ohkawa, Masashi Ito, Motoyuki Suzuki, Akinori Ito, Shozo Makino

IEICE Transactions on Information and Systems　E93-D　(9)　2407-2416　2010/09

DOI： 10.1587/transinf.E93.D.2407 　

ISSN： 0916-8532

eISSN： 1745-1361
Evaluation of head size of an interactive robot using augmented reality Peer-reviewed

Yutaka Hiroi, Shuhei Hisano, Akinori Ito

Proceedings of International Symposium on Robotics and Automation　2010/09
An HMM‐based segment quantizer and its application to low bit rate speech coding Peer-reviewed

Motoyuki Suzuki, Masashi Adachi, Minoru Kohata, Akinori Ito, Shozo Makino, Fuji Ren

Proceedings of International Congress on Acoustics　2010/08
Multiple description coding for MP3 coded sound signal Peer-reviewed

Ho-seok Wey, Akinori Ito, Takuma Okamoto, Yoiti Suzuki

Proceedings of International Congress on Acoustics　2010/08
Improved reference speaker weighting using aspect model Peer-reviewed

Seong Jun Hahm, Yuichi Ohkawa, Masashi Ito, Motoyuki Suzuki, Akinori Ito, Shozo Makino

IEICE Transactions on Information and Systems　E93-D　(7)　1927-1935　2010/07

DOI： 10.1587/transinf.E93.D.1927 　

ISSN： 0916-8532

eISSN： 1745-1361
Information hiding for G.711 speech based on substitution of least significant bits and estimation of tolerable distortion Peer-reviewed

Akinori Ito, Shun'Ichiro Abe, Yôiti Suzuki

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences　E93-A　(7)　1279-1286　2010/07

DOI： 10.1587/transfun.E93.A.1279 　

ISSN： 0916-8508

eISSN： 1745-1337
Bit Rate Reduction of Vocoder-Type Speech Coder by Reducing Temporal Redundancy Peer-reviewed

KOHATA Minoru, SUZUKI Motoyuki, ITO Akinori, MAKINO Syouzou

The IEICE transactions on information and systems　93　(5)　588-597　2010/05
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 1880-4535

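The authors previously proposed Lempel-Ziv Segment Quantization (LZSQ), a new segment quantization method that compresses the temporal redundancy contained in a continuous source. It modifies LZ coding, a compression method for discrete sources, so that it can be applied to continuous sources. In this paper, we apply LZSQ to a vocoder-type low-bit-rate speech coder and attempt to reduce the bit rate further by compressing temporal redundancy. In vocoder-type coding, the lower limit of the bit rate needed to maintain speech quality is said to be about 2.4 kbit/s, but applying LZSQ makes it possible to lower the rate further while maintaining quality. We applied LZSQ to six coding parameters of the 2.4 kbit/s MELP coder, one of the standardized vocoder-type speech coding schemes, and attempted to reduce the bit rate as far as possible while maintaining speech quality equivalent to that of MELP coding. As a result, the total bit rate was reduced to about 1.57 kbit/s.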
Packet loss concealment for MDCT-based audio codec using correlation-based side information Peer-reviewed

Akinori Ito, Toshiyuki Sakai, Kiyoshi Konno, Shozo Makino, Motoyuki Suzuki

International Journal of Innovative Computing, Information and Control　6　(3)　1347-1361　2010/03

ISSN： 1349-4198
Intonation evaluation of English utterances using synthesized speech for computer-assisted language learning Peer-reviewed

Akinori Ito, Tomoaki Konno, Masashi Ito, Shozo Makino, Motoyuki Suzuki

International Journal of Innovative Computing, Information and Control　6　(3)　1501-1514　2010/03

ISSN： 1349-4198
A Constant-bitrate Information Hiding into G.711 Speech Using ADPCM Output and Sample Magnitude Peer-reviewed

Akinori Ito, Hironori Handa, Yoiti Suzuki

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences　J93-A　(2)　82-90　2010/02
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5707
Source-filter separation for nonstationary voiced speech based on sinusoidal representation Peer-reviewed

Masashi Ito, Keiji Ohara, Akinori Ito, Masafumi Yano

Acoustical Science and Technology　31　(2)　181-184　2010

DOI： 10.1250/ast.31.181 　

ISSN： 1346-3969

eISSN： 1347-5177
Designing side information of multiple description coding Peer-reviewed

Akinori Ito, Shozo Makino

Journal of Information Hiding and Multimedia Signal Processing　1　(1)　10-19　2010/01

ISSN： 2073-4212

eISSN： 2073-4239
Aspect-model-based reference speaker weighting Peer-reviewed

Seongjun Hahm, Yuichi Ohkawa, Masashi Ito, Motoyuki Suzuki, Akinori Ito, Shozo Makino

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings　4302-4305　2010

DOI： 10.1109/ICASSP.2010.5495672 　

ISSN： 1520-6149
Document expansion using relevant web documents for spoken document retrieval Peer-reviewed

Ryo Masumura, Akinori Ito, Yu Uno, Masashi Ito, Shozo Makino

Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering, NLP-KE 2010　612-619　2010

DOI： 10.1109/NLPKE.2010.5587854 　
An Effect of Formant Amplitude in Vowel Perception Peer-reviewed

Masashi Ito, Keiji Ohara, Akinori Ito, Masafumi Yano

11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4　2494-+　2010
Improvement of packet loss concealment for MP3 audio based on switching of concealment method and estimation of MDCT signs Peer-reviewed

Akinori Ito, Kiyoshi Konno, Masashi Ito, Shozo Makino

Proceedings - 2010 6th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIHMSP 2010　518-521　2010

DOI： 10.1109/IIHMSP.2010.132 　
A query-by-humming music information retrieval from audio signals based on multiple F0 candidates Peer-reviewed

Akinori Ito, Yu Kosugi, Shozo Makino, Masashi Ito

ICALIP 2010 - 2010 International Conference on Audio, Language and Image Processing, Proceedings　1-5　2010

DOI： 10.1109/ICALIP.2010.5685029 　
A spoken dialog system based on automatically-generated example database Peer-reviewed

Akinori Ito, Takahiro Morimoto, Shozo Makino, Masashi Ito

ICALIP 2010 - 2010 International Conference on Audio, Language and Image Processing, Proceedings　732-736　2010

DOI： 10.1109/ICALIP.2010.5685069 　
Grammatical error detection from English utterances spoken by Japanese Peer-reviewed

Takuya Anzai, Seongjun Hahm, Akinori Ito, Masashi Ito, Shozo Makino

APSIPA ASC 2010 - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference　482-485　2010
Speech recognition based on tree-structured clustering and aspect model in multiple noise environments Peer-reviewed

Seong Jun Hahm, Yuichi Ohkawa, Motoyuki Suzuki, Masashi Ito, Shozo Makino, Akinori Ito

APSIPA ASC 2010 - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference　454-457　2010
Evaluation of head size of an interactive robot using an augmented reality Peer-reviewed

Yutaka Hiroi, Shuhei Hisano, Akinori Ito

2010 World Automation Congress, WAC 2010　2010
An effect of formant amplitude in vowel perception Peer-reviewed

Masashi Ito, Keiji Ohara, Akinori Ito, Masafumi Yano

Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010　2490-2493　2010
Multiple description coding for an MP3 coded sound signal Peer-reviewed

Ho Seok Wey, Akinori Ito, Takuma Okamoto, Yôiti Suzuki

20th International Congress on Acoustics 2010, ICA 2010 - Incorporating Proceedings of the 2010 Annual Conference of the Australian Acoustical Society　4　3081-3088　2010
An HMM-based segment quantizer and its application to low bit rate speech coding Peer-reviewed

Motoyuki Suzuki, Masashi Adachi, Minoru Kohata, Akinori Ito, Shozo Makino, Fuji Ren

20th International Congress on Acoustics 2010, ICA 2010 - Incorporating Proceedings of the 2010 Annual Conference of the Australian Acoustical Society　5　3877-3880　2010
A speaker adaptation method for non-native speech using learners' native utterances for computer-assisted language learning systems Peer-reviewed

Yuichi Ohkawa, Motoyuki Suzuki, Hirokazu Ogasawara, Akinori Ito, Shozo Makino

SPEECH COMMUNICATION　51　(10)　875-882　2009/10

DOI： 10.1016/j.specom.2009.05.005 　

ISSN： 0167-6393

eISSN： 1872-7182
Multiple Description Coding of Flash Video based on Adaptive Allocation of DCT Coefficients Peer-reviewed

Akinori Ito, Takuya Kuraishi, Masashi Ito, Shozo Makino

Proc. 1st Asian-Pacific Signal&Info. Proc. Assoc. Annual Summit & Conf. (APSIPA ASC 2009)　2009/10
Query-by-Humming based Music Information Retrieval System Based on Novel Tonal Feature and Statistical Modeling Peer-reviewed

Motoyuki Suzuki, Takuto Ichikawa, Akinori Ito, Shozo Makino

IPSJ Journal　50　(3)　1100-1110　2009/03
Novel Tonal Feature and Statistical User Modeling for Query-by-Humming

Suzuki Motoyuki, Ichikawa Takuto, Ito Akinori, Makino Shozo

Information and Media Technologies　4　(2)　498-508　2009
Publisher: Information and Media Technologies Editorial Board
DOI： 10.11185/imt.4.498 　

This paper describes a query-by-humming (QbH) music information retrieval (MIR) system based on a novel tonal feature and statistical modeling. Most QbH-MIR systems use a pitch extraction method in order to obtain tonal features of an input humming. In these systems, pitch extraction errors inevitably occur and degrade the performance of the system. In the proposed system, a cross-correlation function between two logarithmic frequency spectra is calculated as a tonal feature instead of a difference of two successive pitch frequencies, and probabilistic models are prepared for all tone intervals existing in the database. The similarity scores between an input humming and musical pieces in a database are calculated using the probabilistic models. The advantages of this system are that it can obtain more appropriate tonal features than the pitch-based method, and it is also robust against inaccurate humming by the user thanks to its statistical approach. From experimental results, the top-1 retrieval accuracy given by the proposed method was 86.8%, which was more than 10 points higher than the conventional single pitch method. Moreover, several integration methods were applied to the proposed method with several conditions. The majority decision method showed the highest accuracy, and 5% reduction of retrieval error was obtained.
Dictation of Japanese Speech Based on Kana and Kanji Character String Peer-reviewed

Ito, Akinori, Kinno, Hiroaki, Katoh, Masaharu, Kosaka, Tetsuo, Kohda, Masaki

International Journal of Computer Processing Of Languages　22　(01)　75-98　2009
Publisher: World Scientific
Fast and Robust Training of a Probabilistic Latent Semantic Analysis Model by the Parallel Learning and Data Segmentation Peer-reviewed

Kato, Masaharu, Kosaka, Tetsuo, Ito, Akinori, Makino, Shozo

Journal of Communication and Computer　6　(5)　28-35　2009
Publisher: David Publishing Company
Evaluation of Robot-Avatar-based User-Familiarity Improvement for Elderly People Peer-reviewed

Yutaka Hiroi, Akinori Ito

Kansei Engineering International　8　(1)　59-66　2009/01

DOI： 10.5057/ER080218-1 　
Effect of the size factor on psychological threat of a mobile robot moving toward human Peer-reviewed

Hiroi, Yutaka, Ito, Akinori

KANSEI Engineering International　8　(1)　51-58　2009/01
Publisher: Japan Society of Kansei Engineering
DOI： 10.5057/ER080206-1 　
Bit rate reduction of mixed excitation linear prediction coder by Lempel-Ziv segment quantization Peer-reviewed

Minoru Kohata, Motoyuki Suzuki, Akinori Ito, Shozo Makino

Acoustical Science and Technology　30　(2)　136-138　2009

DOI： 10.1250/ast.30.136 　

ISSN： 1346-3969

eISSN： 1347-5177
INFORMATION HIDING FOR G.711 SPEECH BASED ON SUBSTITUTION OF LEAST SIGNIFICANT BITS AND ESTIMATION OF TOLERABLE DISTORTION Peer-reviewed

Akinori Ito, Shun'ichiro Abe, Yoiti Suzuki

2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS　1409-+　2009

DOI： 10.1109/ICASSP.2009.4959857 　

ISSN： 1520-6149
Detection of abnormal sound using multi-stage GMM for surveillance microphone Peer-reviewed

Akinori Ito, Akihito Aiba, Masashi Ito, Shozo Makino

5th International Conference on Information Assurance and Security, IAS 2009　1　733-736　2009

DOI： 10.1109/IAS.2009.160 　
A band extension of G.711 speech with low computational cost for data hiding application Peer-reviewed

Akinori Ito, Hironori Handa, Yôiti Suzuki

IIH-MSP 2009 - 2009 5th International Conference on Intelligent Information Hiding and Multimedia Signal Processing　491-494　2009

DOI： 10.1109/IIH-MSP.2009.69 　
Data hiding is a better way for transmitting side information for MP3 bitstream Peer-reviewed

Akinori Ito, Shozo Makino

IIH-MSP 2009 - 2009 5th International Conference on Intelligent Information Hiding and Multimedia Signal Processing　495-498　2009

DOI： 10.1109/IIH-MSP.2009.55 　
Relative importance of formant and whole-spectral cues for vowel perception Peer-reviewed

Masashi Ito, Keiji Ohara, Akinori Ito, Masafumi Yano

INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5　132-+　2009
Evaluation of English Intonation based on Combination of Multiple Evaluation Scores Peer-reviewed

Akinori Ito, Tomoaki Konno, Masashi Ito, Shozo Makino

INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5　596-599　2009
Detailed description of triphone model using SSS-free algorithm Peer-reviewed

Motoyuki Suzuki, Daisuke Honma, Akinori Ito, Shozo Makino

INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5　1403-+　2009
Relevant document retrieval using a spoken document Peer-reviewed

Akinori Ito, Yu Uno, Ryo Masumura, Masashi Ito, Shozo Makino

2009 9th International Symposium on Communications and Information Technology, ISCIT 2009　1483-1488　2009

DOI： 10.1109/ISCIT.2009.5341051 　
Multiple description coding for wideband audio signal transmission Peer-reviewed

Hoseok Wey, Akinori Ito, Yôiti Suzuki

Proceedings of 2009 IEEE International Conference on Network Infrastructure and Digital Content, IEEE IC-NIDC2009　769-773　2009

DOI： 10.1109/ICNIDC.2009.5360882 　
Automatic query generation and query relevance measurement for unsupervised language model adaptation of speech recognition Peer-reviewed

Akinori Ito, Yasutomo Kajiura, Motoyuki Suzuki, Shozo Makino

Eurasip Journal on Audio, Speech, and Music Processing　2009　2009

DOI： 10.1155/2009/140575 　

ISSN： 1687-4714

eISSN： 1687-4722
Are Bigger Robots Scary? -The Relationship between Robot Size and Psychological Threat- Peer-reviewed

Yutaka Hiroi, Akinori Ito

Proceedings of International Conference on Advanced Intelligent Mechatronics　540-545　2008/07
Improvement of user familiarity using robot avatar Peer-reviewed

Yutaka Hiroi, Akinori Ito, Eiji Nakano

Journal of Japan Society of Kansei Engineering　7　(4)　797-805　2008/04
Publisher: Japan Society of Kansei Engineering
DOI： 10.5057/jjske2001.7.797 　

ISSN： 1346-1958

Familiarity is one of the most important requirements for human symbiosis robots such as care service robots. Many studies have sought to make robots more familiar by improving their appearance, facial expressions, and smoothness of movement. This paper presents a new concept called a "robot avatar." A robot avatar is a small robot mounted on a main robot and equipped with the minimum functions needed to perform gestures matching each stage of the main robot's task execution. By looking at the avatar, a user feels as if the avatar is controlling the main robot, and the avatar thereby informs the user of the main robot's next behavior. A prototype avatar named CHIRIS was designed and installed on IRIS, an intelligent service robot developed by the authors. IRIS can execute simple tasks such as serving beverages in response to a user's verbal request. Using CHIRIS, psychological tests of the impression IRIS gives during task execution were carried out using video. The test results showed that CHIRIS is effective in giving users a more familiar impression.
Multiple description coding of an audio stream by optimum recovery transforms Peer-reviewed

Akinori Ito, Shozo Makino

Journal of Digital Information Management　6　(2)　189-195　2008/04
Selection of optimum vocabulary and dialog strategy for noise-robust spoken dialog systems Peer-reviewed

Akinori Ito, Takanobu Oba, Takashi Konashi, Motoyuki Suzuki, Shozo Makino

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E91D　(3)　538-548　2008/03

DOI： 10.1093/ietisy/e91-d.3.538 　

ISSN： 0916-8532
Improvement of Automatic English Prosody Evaluation Based on Word Clustering Using a Decision Tree Peer-reviewed

Akinori Ito, Tatsuki Konno, Motoyuki Suzuki, Shozo Makino

The IEICE Transaction on Information and Systems　J91-D　(2)　358-366　2008/02
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 1880-4535
Suppression of Internal Noise for Speech Recognition of Small Robots Peer-reviewed

Akinori Ito, Takashi Kanayama, Motoyuki Suzuki, Shozo Makino

Journal of Human Interface Society　10　(1)　1-10　2008/02
Automatic evaluation system of English prosody based on word importance factor Peer-reviewed

Suzuki, Motoyuki, Konno, Tatsuki, Ito, Akinori, Makino, Shozo

Journal of Systemics, Cybernetics and Informatics　6　(4)　83-90　2008
An unsupervised language model adaptation based on keyword clustering and query availability estimation Peer-reviewed

Akinori Ito, Yasutomo Kajiura, Shozo Makino, Motoyuki Suzuki

2008 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING, VOLS 1 AND 2, PROCEEDINGS　1412-1418　2008

DOI： 10.1109/ICALIP.2008.4590103 　
Packet loss concealment for MDCT-based audio codec using correlation-based side information Peer-reviewed

Akinori Ito, Kiyoshi Konno, Shozo Makino, Motoyuki Suzuki

2008 FOURTH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING, PROCEEDINGS　612-+　2008

DOI： 10.1109/IIH-MSP.2008.103 　
Discrimination of Task-Related Words for Vocabulary Design of Spoken Dialog Systems Peer-reviewed

Akinori Ito, Toyomi Meguro, Shozo Makino, Motoyuki Suzuki

INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5　207-+　2008
A Fast Speaker Adaptation Method using Aspect Model Peer-reviewed

Seongjun Hahm, Akinori Ito, Shozo Makino, Motoyuki Suzuki

INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5　1221-1224　2008
Recognition of English Utterances with Grammatical and Lexical Mistakes for Dialogue-based CALL System Peer-reviewed

Akinori Ito, Ryohei Tsutsui, Shozo Makino, Motoyuki Suzuki

INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5　2819-2822　2008
Intonation Evaluation of English Utterances using Synthesized Speech for Computer-Assisted Language Learning Peer-reviewed

Tomoaki Konno, Masashi Ito, Motoyuki Suzuki, Akinori Ito, Shozo Makino

IEEE NLP-KE 2008: PROCEEDINGS OF INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING　202-+　2008

DOI： 10.1109/NLPKE.2008.4906807 　
Application of Multiple Description Scalar Quantization to LogPCM and ADPCM Peer-reviewed

Ho-seok Wey, Ryouichi Nishimura, Akinori Ito, Maori Kobayashi, Yoiti Suzuki

The IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences　J90-A　(12)　918-921　2007/12
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5707
Reduction Method of Side Information for Packet Loss Concealment Based on Spectrum Striping Coding Peer-reviewed

Motoyuki Suzuki, Toshiyuki Sakai, Akinori Ito, Shozo Makino

Proceedings of 19th International Congress of Acoustics　2007/09
Detection and Direction Estimation of Calling Voice Peer-reviewed

Akinori Ito, Kota Kitadate, Motoyuki Suzuki, Shozo Makino

Proceedings of 19th International Congress of Acoustics　2007/09
Packet Loss Concealment of an Audio Stream by Time Domain and Frequency Domain Multiple Description Peer-reviewed

Akinori Ito, Toshiyuki Sakai, Motoyuki Suzuki, Shozo Makino

Proceedings of Japan-China Joint Conference on Acoustics　2007/06
Application of Multiple Description (MD) scalar quantization to speech codec Peer-reviewed

Ho seok Wey, Ryouichi Nishimura, Akinori Ito, Maori Kobayashi, Yoiti Suzuki

Proceedings of Japan-China Joint Conference on Acoustics　2007/06
A new segment quantization using Lempel-Ziv algorithm and its application to quantization of line spectral frequencies Peer-reviewed

Minoru Kohata, Motoyuki Suzuki, Akinori Ito, Shozo Makino

IEEE TRANSACTIONS ON COMMUNICATIONS　55　(4)　661-664　2007/04

DOI： 10.1109/TCOMM.2007.894090 　

ISSN： 0090-6778
Music information retrieval from a singing voice using lyrics and melody information Peer-reviewed

Motoyuki Suzuki, Toru Hosoya, Akinori Ito

Eurasip Journal on Advances in Signal Processing　2007　2007

DOI： 10.1155/2007/38727 　

ISSN： 1110-8657

eISSN： 1687-0433
Automatic evaluation system of English prosody for Japanese learner's speech Peer-reviewed

Motoyuki Suzuki, Tatsuki Konno, Akinori Ito, Shozo Makino

IMSCI '07: INTERNATIONAL MULTI-CONFERENCE ON SOCIETY, CYBERNETICS AND INFORMATICS, VOL 1, PROCEEDINGS　48-53　2007
Increasing correlation using a few bits for multiple description coding Invited Peer-reviewed

Akinori Ito, Shozo Makino

2007 THIRD INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING, VOL II, PROCEEDINGS　259-262　2007

DOI： 10.1109/IIHMSP.2007.4457700 　
Music information retrieval from a singing voice using lyrics and melody information Peer-reviewed

Motoyuki Suzuki, Toru Hosoya, Akinori Ito, Shozo Makino

EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING　2007

DOI： 10.1155/2007/38727 　

ISSN： 1687-6180
Pronunciation error detection for computer-assisted language learning system based on error rule clustering using a decision tree Peer-reviewed

Akinori Ito, Yen-Ling Lim, Motoyuki Suzuki, Shozo Makino

Acoustical Science and Technology　28　(2)　131-133　2007

DOI： 10.1250/ast.28.131 　

ISSN： 1346-3969 1347-5177
A Phoneme Duration Model Considering Speaking-rate and Linguistic Features for Speech Recognition Peer-reviewed

大河雄一, 伊藤彰則, 鈴木基之, 牧野正三

Journal of Information Processing Society Japan　47　(12)　3380-3391　2006/12
Music Information Retrieval from a Singing Voice Based on Verification of Recognized Hypotheses Peer-reviewed

Motoyuki Suzuki, Toru Hosoya, Akinori Ito, Shozo Makino

Proceedings of 7th International Conference on Music Information Retrieval　168-171　2006/10
A New Segment Quantization of LSP Parameters with Lempel-Ziv Algorithm Peer-reviewed

Minoru Kohata, Motoyuki Suzuki, Akinori Ito, Shozo Makino

IEICE Transaction on Information and Systems　J89-D　(7)　1504-1513　2006/07
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 1880-4535
Evaluation of multiple PLSA adaptation based on separation of topic and style words Invited Peer-reviewed

Akinori Ito, Naoto Kuriyama, Motoyuki Suzuki, Shozo Makino

Proceedings of 9th Western-Pacific Acoustic Conference　2006/06
Packet loss concealment of audio stream based on multiple description by spectrum striping Invited Peer-reviewed

Motoyuki Suzuki, Toshiyuki Sakai, Jie Liu, Akinori Ito, Shozo Makino

Proceedings of 9th Western-Pacific Acoustic Conference　2006/06
An effective music information retrieval method using three-dimensional continuous DP Peer-reviewed

SP Heo, M Suzuki, A Ito, S Makino

IEEE TRANSACTIONS ON MULTIMEDIA　8　(3)　633-639　2006/06

DOI： 10.1109/TMM.2006.870717 　

ISSN： 1520-9210
Generating search query in unsupervised language model adaptation using WWW

Kajiura, Yasutomo, Suzuki, Motoyuki, Ito, Akinori, Makino, Shozo

The Journal of the Acoustical Society of America　120　(5)　3043-3044　2006
Publisher: ASA
A grammatical error detection method for dialogue-based CALL system

Kweon Oh-pyo, Ito Akinori, Suzuki Motoyuki, Makino Shozo

Information and Media Technologies　1　(1)　391-410　2006
Publisher: Information and Media Technologies Editorial Board
DOI： 10.11185/imt.1.391 　

This paper describes a method for detecting grammatical errors in a non-native speaker's utterances for a dialogue-based CALL (Computer Assisted Language Learning) system. Several dialogue-based CALL systems have been developed for conversation exercises. However, one problem with conventional dialogue-based CALL systems is that the learner is usually assigned a passive role. The goal of our system is to allow a learner to compose his/her own sentences freely in a role-playing situation. One of the biggest problems in realizing the proposed system is that the learner's utterances inevitably contain pronunciation, lexical and grammatical errors. In this paper, we focus on the lexical and grammatical errors and propose two methods to detect them. The conventional approach is to manually write a grammar that accepts the errors. The proposed methods 1 and 2 instead use `error rules' that are independent of the recognition grammar. Method 1 uses only the correct system grammar and extends the recognition results using the `error rules'. Method 2 uses a general grammar (which does not consider the relationship between verb, particle and each noun) to recognize the learner's utterance, checks the acceptance of each N-best result, and searches for the learner's utterance. The grammatical error detection experiment showed that method 2 performs as well as the conventional method.
Unsupervised language model adaptation based on automatic text collection from WWW Peer-reviewed

Motoyuki Suzuki, Yasutomo Kajiura, Akinori Ito, Shozo Makino

INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5　2202-2205　2006
A User Simulator based on VoiceXML for evaluation of spoken dialog systems Peer-reviewed

Akinori Ito, Keisuke Shimada, Motoyuki Suzuki, Shozo Makino

INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5　1045-1048　2006
Multiple description coding of an audio stream by optimum recovery transform Invited Peer-reviewed

Akinori Ito, Shozo Makino

IIH-MSP: 2006 INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING, PROCEEDINGS　19-+　2006

DOI： 10.1109/IIH-MSP.2006.265110 　
Automatic detection of English mispronunciation using speaker adaptation and automatic assessment of English Intonation and rhythm Peer-reviewed

Akinori Ito, Tadao Nagasawa, Hirokazu Ogasawara, Motoyuki Suzuki, Shozo Makino

Educational Technology Research　29　(1)　13-23　2006
Publisher: Japan Society for Educational Technology
DOI： 10.15077/etr.KJ00004963297 　

ISSN： 0387-7434

This paper describes methods for evaluating English utterances by Japanese speakers. The proposed approach consists of two methods: a pronunciation evaluation method and a prosody evaluation method. The pronunciation evaluation method detects phoneme-level mispronunciations, and the prosody evaluation method treats the intonation and rhythm of the speech. The pronunciation evaluation method exploits the VFS speaker adaptation technique to improve the precision of phoneme labeling. For the adaptation, we developed a new scheme that uses Japanese utterances to adapt English acoustic models. This scheme enables speaker adaptation for speakers who are not good at English pronunciation. The prosody evaluation method compares the pitch pattern of native speakers' utterances with that of a learner's utterance, and returns a score that reflects the utterance's naturalness. Besides intonation, the method compares the rhythm of native speakers' speech with that of the learner's speech. Evaluation experiments were carried out to compare native speakers' evaluation scores with the system's scores for Japanese speakers' utterances, and a significant correlation was obtained between the two evaluations.
Pronunciation Error Detection Method Based on Error Rule Clustering Using a Decision Tree Peer-reviewed

Akinori Ito, Yenling Lim, Motoyuki Suzuki, Shozo Makino

Proceeding of European Conference on Speech Communication and Technology　173-176　2005/09
Construction Method of Acoustic Models Dealing with Various Background Noises Based on Combination of HMMs Peer-reviewed

Motoyuki Suzuki, Yusuke Kato, Akinori Ito, Shozo Makino

Proceeding of European Conference on Speech Communication and Technology　973-976　2005/09
Internal Noise Suppression for Speech Recognition by Small Robots Peer-reviewed

Akinori Ito, Takashi Kanayama, Motoyuki Suzuki, Shozo Makino

Proceeding of European Conference on Speech Communication and Technology　2685-2688　2005/09
Lyrics Recognition From A Singing Voice Based On Finite State Automaton For Music Information Retrieval Peer-reviewed

Toru Hosoya, Motoyuki Suzuki, Akinori Ito, Shozo Makino

Proceedings of the 6th International Conference on Music Information Retrieval　532-535　2005/09
A Grammatical Error Detection Method for Dialogue-based CALL system Peer-reviewed

Oh-Pyo Kweon, Akinori Ito, Motoyuki Suzuki, Shozo Makino

Journal of Natural Language Processing　12　(4)　137-156　2005/08
DOI： 10.5715/jnlp.12.4_137 　

ISSN： 1340-7619
Fast optimization of language model weight and insertion penalty from n-best candidates Peer-reviewed

Akinori Ito, Masaki Kohda, Shozo Makino

Acoustical Science and Technology　26　(4)　384-387　2005/07

DOI： 10.1250/ast.26.384 　

ISSN： 1346-3969
A new design concept of robotic interface for the improvement of user familiarity Peer-reviewed

Y Hiroi, E Nakano, T Takahashi, A Ito, K Kotani, N Takatsu

ICMIT 2005: CONTROL SYSTEMS AND ROBOTICS, PTS 1 AND 2　6042　(604230)　1-4　2005

DOI： 10.1117/12.664685 　

ISSN： 0277-786X
Smile and laughter recognition using speech processing and face recognition from conversation video Peer-reviewed

A Ito, XY Wang, M Suzuki, S Makino

2005 INTERNATIONAL CONFERENCE ON CYBERWORLDS, PROCEEDINGS　437-444　2005

DOI： 10.1109/CW.2005.82 　
Noise Adaptive Spoken Dialog System based on Selection of Multiple Dialog Strategies Peer-reviewed

Akinori Ito, Takanobu Oba, Takashi Konashi, Motoyuki Suzuki, Shozo Makino

Proceedings of International Conference on Spoken Language Processing　1　193-196　2004/10
A Japanese dialogue-based CALL system with mispronunciation and grammar error detection Peer-reviewed

Oh Pyo Kweon, Akinori Ito, Motoyuki Suzuki, Shozo Makino

Proceedings of International Conference on Spoken Language Processing　3　1833-1836　2004/10
Speaker Adaptation Method for CALL Systems Using Bilingual Speakers' Utterances Peer-reviewed

Motoyuki Suzuki, Hirokazu Ogasawara, Akinori Ito, Yuichi Ohkawa, Shozo Makino

Proceedings of International Conference on Spoken Language Processing　4　2929-2932　2004/10
Comparison of Features for DP-matching based Query-by-humming System Peer-reviewed

Akinori Ito, Sung-Phil Heo, Motoyuki Suzuki, Shozo Makino

Proceedings of the 5th International Conference on Music Information Retrieval　297-302　2004/10
A spoken dialog system based on automatic grammar generation and template-based weighting for autonomous mobile robots Peer-reviewed

Takashi KONASHI, Motoyuki SUZUKI, Akinori ITO, Shozo MAKINO

Proceedings of International Conference on Spoken Language Processing　1　189-192　2004/10
A dialogue-based CALL system for Japanese conversation Peer-reviewed

Oh-Pyo Kweon, Akinori Ito, Motoyuki Suzuki, Shozo Makino

Proceedings of the 18th International Congress on Acoustics　3　2015-2018　2004/04
Language modeling using stochastic switching N-gram Peer-reviewed

NAGANO, Takeshi, SUZUKI, Motoyuki, ITO, Akinori, MAKINO, Shozo

training　5　(3years)　1991-1993　2004/04
Language Modeling by an Ergodic HMM based on an N-gram Peer-reviewed

Takeshi Nagano, Motoyuki Suzuki, Akinori Ito, Shozo Makino, Masaharu Katoh, Masaki Kohda

Proceedings of the 18th International Congress on Acoustics　5　3701-3704　2004/04
An evaluation method of Japanese pronunciation for Korean native speakers Peer-reviewed

Oh Pyo Kweon, Motoyuki Suzuki, Akinori Ito, Shozo Makino

Educational Technology Research　27　(1)　1-8　2004/01
Publisher: Japan Society for Educational Technology
DOI： 10.15077/etr.KJ00003899214 　

ISSN： 0387-7434

This paper describes an analysis of pronunciation problems in Japanese utterances by Korean speakers, and evaluation methods for a CALL (Computer Assisted Language Learning) system for teaching Japanese pronunciation to Korean speakers. To develop a CALL system, the pronunciation problems of Korean speakers must be understood. First, Japanese utterances by adult Korean speakers were evaluated by native Japanese speakers. Then, the Japanese pronunciation problems of Korean speakers were analyzed. Finally, evaluation methods were developed. Speech recognition technology was used to compare Japanese utterances by a learner with those by a native speaker. With the proposed methods, intelligibility scores, which indicate the similarity between the learner's speech and the native Japanese speaker's speech, are calculated automatically.
A Patient Care Service Robot System Based on a State Transition Architecture Peer-reviewed

Yutaka Hiroi, Eiji Nakano, Takayuki Takahashi, Shozo Makino, Akinori Ito, Koji Kotani, Nobuo Takatsu, Tadahiro Ohmi

Proceedings of the 2nd International Conference on Mechatronics and Information Technology　231-236　2003/12
Three dimensional continuous DP algorithm for multiple pitch candidates in music information retrieval system Peer-reviewed

Heo, Sungphil, Suzuki, Motoyuki, Ito, Akinori, Makino, Shozo

Proceedings of 4th International Symposium on Music Information Retrieval　235-236　2003/10
Publisher: Johns Hopkins University
Multiple pitch candidates based music information retrieval method for query-by-humming Peer-reviewed

Heo, Sung-Phil, Suzuki, M., Ito, A., Makino, S., Chung, HY

Proc. AMR　189-200　2003/09
Analysis of pronunciation errors in Japanese speech uttered by Korean towards development of Japanese CALL system Peer-reviewed

KWEON, OH

Proc. of O-COCOSDA 2003　185-192　2003/06
A Portable spoken dialog system for autonomous robots Peer-reviewed

Takashi Konashi, Motoyuki Suzuki, Akinori Ito, Shozo Makino

Proceeding of 1st International Workshop on Language Understanding and Agents for Real-world Interaction　79-84　2003/05
Construction and evaluation of language models based on stochastic context-free grammar for speech recognition

Chiori Hori, Masaharu Katoh, Akinori Ito, Masaki Kohda

Systems and Computers in Japan　33　(13)　48-59　2002/11/30

DOI： 10.1002/scj.1172 　

ISSN： 0882-1666
A Metric based on Likelihood Difference for N-gram Language Model Evaluation Peer-reviewed

Akinori Ito, Masaki Kohda

IPSJ Journal　43　(7)　2055-2064　2002/07
Construction and evaluation of language models based on stochastic context-free grammar for speech recognition Peer-reviewed

Chiori Hori, Masaharu Katoh, Akinori Ito, Masaki Kohda

IEICE Trans.(D-II)　J83-D-II　(11)　2407-2417　2000/11
Evaluation of Task Adaptation Using N-gram Count Mixture Peer-reviewed

Akinori Ito, Masaki Kohda

IEICE Trans.(D-II)　J83-D-II　(11)　2418-2427　2000/11
Language modeling by stochastic dependency grammar for Japanese speech recognition Peer-reviewed

Akinori Ito, Chiori Hori, Masaharu Katoh, Masaki Kohda

Proceeding of International Conference on Spoken Language Processing　2000/10
Free Software Toolkit for Japanese large vocabulary continuous speech recognition Peer-reviewed

Tatsuya Kawahara, Akinobu Lee, Tetsunori Kobayashi, Kazuya Takeda, Nobuaki Minematsu, Shigeki Sagayama, Katsunobu Itoh, Akinori Ito, Mikio Yamamoto, Atsushi Yamada, Takehito Utsuro, Kiyohiro Shikano

Proceeding of International Conference on Spoken Language Processing　476-479　2000/10
Overview of Japanese Dictation Toolkit

Kawahara, Tatsuya, Lee, Akinobu, Kobayashi, Tetsunori, Takeda, Kazuya, Minematsu, Nobuaki, Sagayama, Shigeki, Itou, Katsunobu, Ito, Akinori, Yamamoto, Mikio, Yamada, Atsushi

2000
A new metric for stochastic language model evaluation Peer-reviewed

Akinori Ito, Masaki Kohda

Proceeding of European Conference on Speech Communication and Technology　4　1591-1594　1999/09
A Study on a Phoneme-graph-based Hypothesis Restriction for Large Vocabulary Continuous Speech Recognition Peer-reviewed

Takaaki Hori, Masaharu Katoh, Akinori Ito, Masaki Kohda

IPSJ Journal　40　(4)　1365-1373　1999/04
A Study on a State Clustering-Based Topology Design Method for HM-Nets Peer-reviewed

Takaaki Hori, Masaharu Katoh, Akinori Ito, Masaki Kohda

IEICE Trans.(D-II)　J81-D-II　(10)　2239-2248　1998/10
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0915-1923
A study on HM-Nets using decision tree-based successive state splitting Peer-reviewed

Takaaki Hori, Masaharu Katoh, Akinori Ito, Masaki Kohda

Proceeding of IEEE International Conference on Speech Processing　1　383-387　1998/05
Common Platform of Japanese Large Vocabulary Continuous Speech Recognizer Assessment -- Proposal and Initial Results -- Peer-reviewed

T.Kawahara, A.Lee, T.Kobayashi, K.Takeda, N.Minematsu, K.Itou, A.Ito, M.Yamamoto, A.Yamada, T.Utsuro, K.Shikano

Proc. Oriental-COCOSDA Workshop　117-122　1998
A Study on HM-Nets Using Phonetic Decision Tree-Based Successive State Splitting Peer-reviewed

Takaaki Hori, Masaharu Katoh, Akinori Ito, Masaki Kohda

IEICE Trans.(D-II)　J80-D-II　(10)　2645-2654　1997/10
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0915-1923
N-gram language model adaptation using small corpus for spoken dialog recognition Peer-reviewed

Akinori Ito, Hideyuki Saitoh, Masaharu Katoh, Masaki Kohda

Proceeding of European Conference on Speech Processing　2735-2738　1997/09
Language Modeling by Kana and Kanji String N-gram Peer-reviewed

Akinori Ito, Masaki Kohda

IEICE Trans.(D-II)　J79-D-II　(12)　2062-2069　1996/12
The performance prediction on sentence recognition using a finite state word automaton Peer-reviewed

T Otsuki, A Ito, S Makino, T Ohtomo

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E79D　(1)　47-53　1996/01

ISSN： 0916-8532
Language modeling by string pattern N-gram for Japanese speech recognition Peer-reviewed

A Ito, M Kohda

ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4　490-493　1996
A NEW HMNET CONSTRUCTION ALGORITHM REQUIRING NO CONTEXTUAL FACTORS Peer-reviewed

M SUZUKI, S MAKINO, A ITO, H ASO, H SHIMODAIRA

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E78D　(6)　662-668　1995/06

ISSN： 0916-8532
Word Pre-Selection Using Extended Redundant Hash Addressing Method for Continuous Speech Recognition Peer-reviewed

Akinori Ito, Shozo Makino

IEICE Trans.(D-II)　J78-D-II　(3)　400-408　1995/03
Performance Prediction of Word Recognition Using the Probability of Word Occurrence Peer-reviewed

Takashi Otsuki, Akinori Ito, Shozo Makino, Teruhiko Otomo

IEICE Trans.(A)　J77-A　(2)　274-281　1994/02
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5707
A continuous speech recognition system using a modified LVQ2 method and a dependency grammar with semantic constraints Peer-reviewed

Shozo Makino, Akinori Ito, Mitsuru Endo, Ken'iti Kido

J. Pattern Recognition and Artificial Intelligence　8　(1)　197-213　1994/01

DOI： 10.1142/S0218001494000097 　
THE PERFORMANCE PREDICTION METHOD ON SENTENCE RECOGNITION SYSTEM USING A FINITE STATE AUTOMATON Peer-reviewed

T OTSUKI, A ITO, S MAKINO, T OTOMO

ICASSP-94 - PROCEEDINGS, VOL 1　397-400　1994
A Fast Word Pre-Selection Based on Speech Fragments for Continuous Speech Recognition

Akinori Ito, Shozo Makino

Proceeding of International Workshop on Speech Processing　107-112　1993/11
Performance Prediction of Word Recognition Using the Transition Information between Phonemes or between Characters Peer-reviewed

Takashi Otsuki, Akinori Ito, Shozo Makino, Toshio Sone

IEICE Trans.(D-II)　J76-D-II　(6)　1090-1096　1993/06
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0915-1923
A NEW WORD PRESELECTION METHOD BASED ON AN EXTENDED REDUNDANT HASH ADDRESSING FOR CONTINUOUS SPEECH RECOGNITION Peer-reviewed

A ITO, S MAKINO

ICASSP-93 : 1993 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5　B299-B302　1993
Word pre-selection using a redundant hash addressing method for continuous speech recognition Peer-reviewed

Akinori Ito, Shozo Makino

Proceeding of the International Conference on Spoken Language Processing　309-312　1992/10
A Functional Word Prediction CYK Method for Parsing Spoken Japanese Sentences Peer-reviewed

Akinori Ito, Shozo Makino, Ken'iti Kido

IEICE Trans.(D-II)　J74-D-II　(9)　1147-1155　1991/09
ISSN： 0915-1923
A JAPANESE TEXT DICTATION SYSTEM BASED ON PHONEME RECOGNITION AND A DEPENDENCY GRAMMAR Peer-reviewed

S MAKINO, A ITO, M ENDO, K KIDO

IEICE TRANSACTIONS ON COMMUNICATIONS ELECTRONICS INFORMATION AND SYSTEMS　74　(7)　1773-1782　1991/07

ISSN： 0917-1673
Parsing of spoken Japanese sentences using the functional word prediction CYK algorithm Peer-reviewed

Akinori Ito, Shozo Makino, Ken'iti Kido

Proc. Korea-Japan Joint Symposium on Acoustics　218-221　1991/07
A JAPANESE TEXT DICTATION SYSTEM BASED ON PHONEME RECOGNITION AND A DEPENDENCY GRAMMAR Peer-reviewed

S MAKINO, A ITO, M ENDO, K KIDO

ICASSP 91, VOLS 1-5　273-276　1991
A Japanese Text Dictation System Based on Phoneme Recognition Using a Modified LVQ2 Method Peer-reviewed

Shozo Makino, Akinori Ito, Mitsuru Endo, Ken'iti Kido

Proceeding of the International Conference on Spoken Language Processing　241-244　1990/11
`Bunsetsu' Spotting-based Japanese Continuous Speech Recognition Peer-reviewed

Michio Okada, Hiroshi Matsuo, Akinori Ito, Yoiti Ogawa, Shozo Makino, Ken'iti Kido

Trans. IEEJ(C)　108-C　(10)　826-833　1988/10

DOI： 10.1541/ieejeiss1987.108.10_826 　
Japanese Conjugate Word Spotting in Continuous Speech Using a Syntactic Driven Continuous DP Matching Algorithm Peer-reviewed

Michio Okada, Akinori Ito, Shozo Makino, Ken'iti Kido

IEICE Trans.(D)　70　(12)　2479-2490　1987/12
ISSN： 0913-5731

Misc. 358

Fundamental investigation of a human-following robot system that moves side-by-side with a person

廣井富, 朝倉大裕, 中田海地, 伊藤彰則

日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM)　2020　2020

ISSN： 2424-3124
Implementation of a method for distinguishing the person to be followed from other people during person following

中田海地, 朝倉大裕, 廣井富, 伊藤彰則

計測自動制御学会システムインテグレーション部門講演会(CD-ROM)　20th　2019
Proposal of a person tracking method using two LRFs: development of a robot that plays tag

池本瑚幸, 廣井富, 伊藤彰則

計測自動制御学会システムインテグレーション部門講演会(CD-ROM)　20th　2019
Development of an operator face presentation function for a telepresence robot

野阪百穂, 廣井富, 伊藤彰則

日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM)　2019　2019

ISSN： 2424-3124
Fundamental study on distinguishing the person to be followed from other people during person following

中田海地, 廣井富, 伊藤彰則

日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM)　2019　2019

ISSN： 2424-3124
Preface

Jeng Shyang Pan, Akinori Ito, Pei Wei Tsai, Lakhmi C. Jain

Smart Innovation, Systems and Technologies　109　V-VI　2019

ISSN： 2190-3018

eISSN： 2190-3026
Proposal of a demonstration-oriented return-to-origin function for a robot: development of a robot that plays "Daruma-san ga koronda"

中森裕子, 廣井富, 伊藤彰則

日本ロボット学会学術講演会予稿集(CD-ROM)　36th　2018
Proposal of a telepresence robot that reproduces the operator's face

野阪百穂, 廣井富, 伊藤彰則

計測自動制御学会システムインテグレーション部門講演会(CD-ROM)　19th　2018
Development of a touch function for a robot playing the "it" role in "Daruma-san ga koronda"

中森裕子, 廣井富, 田中翔吾, 伊藤彰則

計測自動制御学会システムインテグレーション部門講演会(CD-ROM)　19th　2018
Fundamental study on obstacle avoidance using an RGB-D camera and a laser range finder

宮内雄大, 廣井富, 伊藤彰則

計測自動制御学会システムインテグレーション部門講演会(CD-ROM)　19th　2018
Development of an anticipatory avoidance method for a robot facing a pedestrian approaching head-on

廣井富, 宮内雄大, 伊藤彰則

日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM)　2018　2018

ISSN： 2424-3124
Development of a method for detecting a person turning around using OpenPose: development of a robot that plays "Daruma-san ga koronda"

廣井富, 小田垣成伸, 伊藤彰則

日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM)　2018　2018

ISSN： 2424-3124
Foreword

Akinori Ito

IEICE Transactions on Information and Systems　E101D　(1)　1　2018/01

DOI： 10.1587/transinf.2017MUF0001 　

ISSN： 0916-8532

eISSN： 1745-1361
Poster Presentation : A Study on Singer-Independent Singing Voice Conversion Using Read Speech Based on Neural Network

116　(414)　17-22　2017/01/21
Publisher: 電子情報通信学会
ISSN： 0913-5685
An attempt at a crowd avoidance method using OpenPose and an LRF

森下康平, 廣井富, 宮内雄大, 伊藤彰則

計測自動制御学会システムインテグレーション部門講演会(CD-ROM)　18th　2017
Fundamental study on avoiding small objects on the floor using an RGB-D camera

宮内雄大, 廣井富, 今西天希, 伊藤彰則

計測自動制御学会システムインテグレーション部門講演会(CD-ROM)　18th　2017
Development of a person tracking method for passing through a crowd by combining an LRF and vision

宮内雄大, 廣井富, 西口敏司, 伊藤彰則

日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM)　2017　2017

ISSN： 2424-3124
Effect of the "width judgment method" in "Daruma-san ga koronda" using an LRF

中森裕子, 廣井富, 伊藤彰則

日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM)　2017　2017

ISSN： 2424-3124
Foreword. Invited

Akinori Ito

IEICE Transactions　100-D　(1)　1　2017

DOI： 10.1587/transinf.2016MUF0001 　
Improvement of Accent Sandhi Rules Based on Accent Dictionary for Japanese Text-to-Speech Systems

116　(378)　31-36　2016/12/20
Publisher: 電子情報通信学会
ISSN： 0913-5685
Poster Presentation : Development of the Julius-compatible interface for the speech recognition engine of Kaldi toolkit

116　(378)　49-51　2016/12/20
Publisher: 電子情報通信学会
ISSN： 0913-5685
Poster Presentation : F0 control by modeling differential features in DNN-based speech synthesis

116　(378)　37-42　2016/12/20
Publisher: 電子情報通信学会
ISSN： 0913-5685
Discrimination of Level of Willingness to Talk and Analysis of Features by Using Dialog Collected on WOZ basis

78　7-12　2016/10/05
Publisher: 人工知能学会
ISSN： 0918-5682
A Study on Colorization in Photo-Realistic Facial Animation Synthesis from Text Based on HMM and DNN with Animation Unit

116　(220)　67-72　2016/09/15
Publisher: 電子情報通信学会
ISSN： 0913-5685
A Study on Colorization in Photo-Realistic Facial Animation Synthesis from Text Based on HMM and DNN with Animation Unit

40　(31)　67-72　2016/09
Publisher: 映像情報メディア学会
ISSN： 1342-6893
Study of Photo-realistic Face Moving Image Generation from the Text Using the Facial Feature

116　(33)　43-48　2016/05/19
Publisher: 電子情報通信学会
ISSN： 0913-5685
Proposal of a crowd avoidance method using circular avoidance regions

森下康平, 廣井富, 伊藤彰則

日本ロボット学会学術講演会予稿集(CD-ROM)　34th　2016
A study on pointing gesture recognition using an RGB-D sensor: a consideration of position error

津田剛志, 廣井富, 伊藤彰則

日本ロボット学会学術講演会予稿集(CD-ROM)　34th　2016
Proposal of a method for handing over person position information among multiple guide robots

田中佑季, 廣井富, 伊藤彰則

日本ロボット学会学術講演会予稿集(CD-ROM)　34th　2016
Implementation of a method for handing over person position information among multiple guide robots that move along handrails

田中佑季, 廣井富, 伊藤彰則

日本感性工学会大会予稿集(CD-ROM)　18th　2016
Proposal of a telepresence robot that plays outdoors with children

廣井富, 中森裕子, 森下康平, 伊藤彰則

計測自動制御学会システムインテグレーション部門講演会(CD-ROM)　17th　2016
A study on reducing fear using advance notice of motion when a mobile robot approaches

廣井富, 前田彰大, 田中佑季, 松丸隆文, 伊藤彰則

日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM)　2016　2016

ISSN： 2424-3124
A study on a fear reduction method using augmented reality

廣井富, 前田彰大, 田中佑季, 伊藤彰則

日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM)　2016　2016

ISSN： 2424-3124
Analyzing the human-human dialog and examining to build WOZ system for estimating the user's willingness to talk

115　(346)　117-122　2015/12/02
Publisher: 電子情報通信学会
ISSN： 0913-5685
A study on quick model training in HMM-based speech synthesis

115　(253)　27-32　2015/10/15
Publisher: 電子情報通信学会
ISSN： 0913-5685
Multiple Description Vector Quantizer Based on Bit-Error-Tolerant Vector Quantizer Design

115　(219)　33-38　2015/09/10
Publisher: 電子情報通信学会
ISSN： 0913-5685
Multiple Description Vector Quantizer Based on Bit-Error-Tolerant Vector Quantizer Design

39　(32)　33-38　2015/09
Publisher: 映像情報メディア学会
ISSN： 1342-6893
Automatic generation of abbreviated named entities for localized speech recognition

115　(184)　7-12　2015/08/21
Publisher: 電子情報通信学会
ISSN： 0913-5685
Analysis of the effect of accent labeling criteria on synthetic speech in HMM-based speech synthesis

高橋遼太, 能勢隆, 伊藤彰則

情報処理学会研究報告. SLP, 音声言語情報処理　2015　(1)　1-6　2015/05/18
Publisher: 一般社団法人情報処理学会

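This paper examines accent labeling criteria, which have been ambiguous in conventional HMM-based speech synthesis, and investigates their effect on synthetic speech. Specifically, we examine the representation of accent types and the criteria for accent phrase boundaries. For accent types, we focus on the fact that the odaka (final-accented) type can be represented in two ways, as type 0 or as the mora-length type, and objectively evaluate how each representation affects the F0 of synthetic speech. We also verify the effect of using two-stage clustering. For accent phrase boundaries, some accent phrases can be represented either as two accent phrases of type 0 and type 1 or as a single combined accent phrase, and we investigate how this difference affects synthetic speech. In these evaluations, we introduce errors in the high/low pattern of Japanese accent as an objective measure and analyze the effectiveness of this measure.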
An English conversation learning system based on spoken dialogue for Japanese learners

伊藤彰則

情報処理学会研究報告. SLP, 音声言語情報処理　2015　(12)　1-6　2015/05/18
Publisher: 一般社団法人情報処理学会

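This paper describes technologies for a spoken-dialogue-based CALL system for English conversation that the author's group has been studying. Most current CALL systems that use speech recognition technology score elements contained in a single utterance, such as pronunciation and intonation. While that is important, learning English conversation also requires repeatedly practicing expressions that are actually used. Based on this idea, the author's group has been studying "dialogue-based CALL systems". This paper describes systems for prosody evaluation from dialogue speech, grammatical error detection, and practice of response timing control.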
Evaluation of a dialogue system using emotional speech synthesis in scenario dialogues and a study of methods for assigning emotions

加瀬嵩人, 能勢隆, 千葉祐弥, 伊藤彰則

情報処理学会研究報告. SLP, 音声言語情報処理　2015　(9)　1-7　2015/05/18
Publisher: 一般社団法人情報処理学会

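In recent years, demand for non-task-oriented spoken dialogue systems has been growing, and various studies have been conducted. Most of these studies aim to generate appropriate responses from a linguistic point of view. In human-to-human conversation, on the other hand, effective use of paralinguistic information such as emotional expression and speaking style is considered to help a dialogue proceed smoothly. We therefore focus on how the system responds rather than on the content of its responses, and attempt to use emotional speech synthesis in a dialogue system. In this study, we first create multiple scenarios and evaluate, using subjective criteria, whether the quality of the dialogue system actually improves when appropriate emotions are assigned manually. Next, to automate emotion assignment, we examine two methods: assignment according to the system utterance, and assignment in harmony with the user utterance. The evaluation results show that automatically assigning emotions improves users' subjective evaluation scores for the dialogue, and that emotion assignment in harmony with the user utterance is more effective.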
Analysis of dialogue data and a study of audio-visual features toward automatic estimation of the user's willingness to talk

千葉祐弥, 能勢隆, 伊藤彰則

研究報告音声言語情報処理（SLP）　2015　(10)　1-6　2015/02/20
Publisher: 一般社団法人情報処理学会

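For a dialogue system to adapt to the user when providing topics or recommending information, it is desirable that it can acquire information about the user efficiently. In this study, we assume an interview-style spoken dialogue system that actively asks the user questions. In dialogue with such a system, more detailed information may be obtained on topics the user wants to talk about, whereas useful information is unlikely to be obtained on topics the user does not want to talk about; the system therefore needs to select questions and topics with the user's willingness to talk in mind. In this paper, as an initial study toward automatically estimating the user's willingness to talk, we analyzed human-to-human interview dialogues and performed automatic classification. The analysis suggested that when the interviewees themselves are aware of whether their willingness to talk is high or low, third-party evaluators can judge the willingness to talk with an accuracy of about 70 to 80%. It was also shown that, by using the multimodal information cited in the evaluators' questionnaires, automatic classification can achieve accuracy comparable to that of humans.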
Evaluation of a wavelet-based feature extraction method and a technique for improving its accuracy

松井清彰, 能勢隆, 伊藤彰則

研究報告音声言語情報処理（SLP）　2015　(5)　1-6　2015/02/20
Publisher: 一般社団法人情報処理学会

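Making speech recognition more widespread requires less expensive speech recognition systems. Although various prior studies have addressed reducing the computational cost of speech recognition, feature extraction has not been studied sufficiently. We have therefore proposed a new low-cost feature extraction method based on the wavelet transform, together with a technique for improving its accuracy. In this paper, we performed feature extraction using two kinds of wavelets, the Haar wavelet and the Daubechies wavelet, and compared their performance with MFCC. As a result, the accuracy improvement technique yielded a slight improvement in recognition rate. In addition, the recognition rate could be improved further by using Δ features, which are dynamic inter-frame features, and, as with MFCC, by truncating the higher-order DCT outputs. Regarding computation time, we found that using the simplest wavelet gives a computation speed more than five times that of MFCC.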
A study on changes in learners' turn-taking latency over repeated use of an English conversation learning system

鈴木直人, 廣井富, 藤原祐磨, 千葉祐弥, 能勢隆, 伊藤彰則

日本音響学会研究発表会講演論文集(CD-ROM)　2015　2015

ISSN： 1880-7658
Verification of the effectiveness of a response timing practice method in an English conversation learning system

鈴木直人, 廣井富, 藤原祐磨, 千葉祐弥, 能勢隆, 伊藤彰則

情報処理学会研究報告(Web)　2015　(SLP-105)　2015
A robot that picks up empty cans: a method for estimating object tilt

二上啓大, 廣井富, 西口敏司, 伊藤彰則

日本ロボット学会学術講演会予稿集(CD-ROM)　33rd　2015
Development of a cart for assisting load carrying: a detachable drive unit that makes the cart self-propelled

坂井奎亮, 廣井富, 伊藤克明, 伊藤彰則

日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM)　2015　2015

ISSN： 2424-3124
Proposal of playing "Daruma-san ga koronda" with a robot

廣井富, 坂井奎亮, 立田裕記, 伊藤彰則

日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM)　2015　2015

ISSN： 2424-3124
Evaluation of a fear reduction method for a daily-life support robot using augmented reality: an experiment on robot size

廣井富, 森奨平, 藤原祐磨, 伊藤彰則

日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM)　2015　2015

ISSN： 2424-3124
Evaluation of a communication robot that moves slightly ahead of a person: development of a communication robot that moves along a handrail

田中佑季, 廣井富, 藤原祐磨, 伊藤彰則

日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM)　2015　2015

ISSN： 2424-3124
Evaluation of a fear reduction method for a daily-life support robot using augmented reality: an experiment on robot color

廣井富, 森奨平, 藤原祐磨, 伊藤彰則

日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM)　2015　2015

ISSN： 2424-3124
Drawing the current and future figures of ASJ from viewpoint of number of members

Ito Akinori

The Journal of the Acoustical Society of Japan　71　(1)　5-6　2014/12/25
Publisher: The Acoustical Society of Japan (ASJ)
ISSN： 0369-4232
Bit-error-tolerant Quantizer Based on Self-Organizing Map

ITO Akinori

Technical report of IEICE. EA　114　(315)　19-24　2014/11/20
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

Bit errors cannot be avoided when communicating over a digital channel. Packet-based communication abandons packets with bit errors. However, in multimedia communication such as speech or image transmission, a small number of bit errors is not necessarily fatal. With this kind of multimedia communication in mind, the effect of bit errors on the quality of multimedia data was investigated. The results suggested that vector quantization is more fragile than scalar quantization with respect to bit errors. A new vector quantization method that is robust against bit errors is therefore proposed. The proposed method is based on the self-organizing map (SOM), and the codebook is designed so that the Hamming distance between two codes and the Euclidean distance between the corresponding centroids are correlated. The results of simulation experiments showed that the proposed method was less affected by bit errors than the conventional k-means method.
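The codebook property named in this abstract can be checked numerically. The sketch below is only an illustration, not the paper's SOM training procedure: it measures how closely the Hamming distance between code indices tracks the Euclidean distance between the corresponding centroids. The function name, the toy 2-bit codebooks, and the choice of Pearson correlation as the measure are all assumptions made for this example.

```python
import itertools
import math


def hamming_euclid_correlation(codebook):
    """Pearson correlation between the Hamming distance of code indices
    and the Euclidean distance of the corresponding centroids."""
    ham, euc = [], []
    for (i, ci), (j, cj) in itertools.combinations(enumerate(codebook), 2):
        ham.append(bin(i ^ j).count("1"))  # number of bits that differ between the two codes
        euc.append(math.dist(ci, cj))      # distance between the two centroids
    mean_h = sum(ham) / len(ham)
    mean_e = sum(euc) / len(euc)
    cov = sum((h - mean_h) * (e - mean_e) for h, e in zip(ham, euc))
    var_h = sum((h - mean_h) ** 2 for h in ham)
    var_e = sum((e - mean_e) ** 2 for e in euc)
    return cov / math.sqrt(var_h * var_e)


# Hypothetical toy codebooks: the same four centroids with two index assignments.
square = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]    # index bits equal the coordinates
shuffled = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]  # neighbouring codes are far apart

print(hamming_euclid_correlation(square))
print(hamming_euclid_correlation(shuffled))
```

With the first assignment a single flipped bit always moves the decoded value to an adjacent centroid, so the correlation is high; with the second assignment a one-bit error can jump to the opposite corner and the correlation turns negative.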
A study on pronunciation evaluation methods for English singing voices of Japanese speakers

吉田一道, 能勢隆, 伊藤彰則

研究報告音楽情報科学（MUS）　2014　(9)　1-6　2014/11/13

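We aim to automatically evaluate the English pronunciation of English singing by Japanese speakers. In this study, we built a database of read English lyrics and singing voices by Japanese speakers, and conducted subjective evaluations by native English speakers and native Japanese speakers. We also compared the evaluations by native English and native Japanese speakers of the read lyrics and the singing voices; the comparison suggested that, compared with spoken speech, singing voices are more prone to pronunciation errors in prolonged phrases. Furthermore, we examined an HMM-based method for automatic pronunciation evaluation of English singing, and conducted a simple pronunciation error detection experiment using HMMs trained on spoken speech by native speakers of the two languages, Japanese and English. Based on the results, we argue that detection accuracy may be improved further by examining the likelihood-difference threshold used in pronunciation error detection and the pronunciation errors in phrases that are prolonged when singing.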
A study on pronunciation evaluation methods for English singing voices of Japanese speakers

吉田一道, 能勢隆, 伊藤彰則

研究報告デジタルコンテンツクリエーション（DCC）　2014　(9)　1-6　2014/11/13

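We aim to automatically evaluate the English pronunciation of English singing by Japanese speakers. In this study, we built a database of read English lyrics and singing voices by Japanese speakers, and conducted subjective evaluations by native English speakers and native Japanese speakers. We also compared the evaluations by native English and native Japanese speakers of the read lyrics and the singing voices; the comparison suggested that, compared with spoken speech, singing voices are more prone to pronunciation errors in prolonged phrases. Furthermore, we examined an HMM-based method for automatic pronunciation evaluation of English singing, and conducted a simple pronunciation error detection experiment using HMMs trained on spoken speech by native speakers of the two languages, Japanese and English. Based on the results, we argue that detection accuracy may be improved further by examining the likelihood-difference threshold used in pronunciation error detection and the pronunciation errors in phrases that are prolonged when singing.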
A Study on Intuitive Control of Emotional Expressions and Speaking Styles Using Facial Features by Kinect

BI Yu, NOSE Takashi, ITO Akinori

IEICE technical report. Speech　114　(303)　25-30　2014/11/13
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

This paper proposes a style control technique for synthetic speech based on multiple regression HSMM (MRHSMM) using facial features. In the proposed technique, styles and their intensities are represented by Animation Unit (AU) parameters and are modeled under the assumption that the mean parameters of the acoustic models are given as multiple regressions of the AU parameters. Since correlation among AU parameters is problematic in the modeling, we conducted orthogonalization and dimensionality reduction in advance. When synthesizing speech, we can generate synthetic speech with an intended style by inputting the corresponding facial expression. In this study, we examine the appropriate number of AU parameters and discuss how performance differs depending on the user.
Analysis of interview dialog for building user-profiling dialog system considering motivation of conversation

CHIBA Yuya, ITO Akinori

Technical report of IEICE. HCS　114　(273)　43-48　2014/10/23
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

A dialog system has to obtain the user's profile appropriately in order to provide dialog topics or recommend information adapted to the user. In this research, we assume an interview-based user profiling system that actively asks the user about his/her personal information. Such a system can obtain detailed information if the user wants to talk about the provided topic, but it cannot obtain useful information if the user is not interested in talking about that topic. As an initial study toward estimating the user's motivation for conversation, this paper analyzes interview dialogs between humans. As a result, evaluators could judge the user's motivation for conversation in relatively long single-topic dialogs with an accuracy of 70% to 80%. In addition, when they evaluated subdivided dialog data, we observed some correlation between the evaluators' judgments, although the agreement of the evaluations decreased. Finally, the results indicated that several kinds of multi-modal information are effective for estimating the user's motivation for conversation, such as the user's prosodic information, linguistic information, gestures, and gaze activity.
Computers Listen to Voices: Machine Recognition of Speech (Special Feature: Listening)

伊藤彰則

高翔 : 自動車技術会関東支部報　(62)　16-19　2014/07
Publisher: 自動車技術会関東支部
20 years of SIG-SLP ―Review by successive chairs―

Tsuneo Nitta, Tetsunori Kobayashi, Satoshi Nakamura, Kazuya Takeda, Tatsuya Kawahara, Akinori Ito

IPSJ SIG Notes　2014　(5)　1-6　2014/01/24
Publisher: Information Processing Society of Japan (IPSJ)

This report reviews the research presented over 20 years of SIG-SLP meetings and surveys the trends in spoken language processing research. First, facts about the papers presented at SIG-SLP are described. Then we present chair-by-chair trends in spoken language research, and finally we make suggestions for promoting spoken language research in the next decade.
Subjective evaluation of latency and speech degradation of VoIP communication with packet loss concealment under severe packet loss

389-392　2014
Publisher: 日本音響学会
ISSN： 1880-7658
A sinusoidal model for voiced speech based on a complex analysis window

319-322　2014
Publisher: 日本音響学会
ISSN： 1880-7658
Applying singing voice analysis to entertainment : from music information retrieval to karaoke

1033-1036　2014
Publisher: 日本音響学会
ISSN： 1880-7658
Proposal of an Obstacle Avoidance Method Considering Person Following with an LRF

坂井奎亮, 廣井富, 伊藤彰則

日本ロボット学会学術講演会予稿集(CD-ROM)　32nd　2014
Development of a Communication Robot That Moves along a Handrail: Evaluation of Route Guidance Using an Extendable Hand

藤原祐磨, 廣井富, 鈴木直人, 伊藤彰則

日本ロボット学会学術講演会予稿集(CD-ROM)　32nd　2014
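A Study on the Effect of a CG Character in an English Conversation Learning System and on Additional Expressions for Controlling the Learner's Utterance Timing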

鈴木直人, 廣井富, 藤原祐磨, 千葉祐弥, 能勢隆, 伊藤彰則

日本音響学会研究発表会講演論文集(CD-ROM)　2014　2014

ISSN： 1880-7658
Controlling Turn-Taking Latency by Time Pressure during English Conversation Practice with an AR Character

鈴木直人, 廣井富, 藤原祐磨, 黒田尚孝, 戸塚典子, 千葉祐弥, 能勢隆, 伊藤彰則

日本音響学会研究発表会講演論文集(CD-ROM)　2014　2014

ISSN： 1880-7658
Proposal of a Method for Retrieving Objects on the Floor by Combining Pointing and Spoken Dialog

二上啓大, 廣井富, 黒田尚孝, 鈴木直人, 伊藤彰則

日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM)　2014　2014

ISSN： 2424-3124
A Basic Study on Recording the Movement Trajectory during Person Following with an LRF and on Trajectory Following

坂井奎亮, 廣井富, 伊藤彰則

日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM)　2014　2014

ISSN： 2424-3124
Development of a Communication Robot That Moves along a Handrail: Proposal of Route Guidance Using an Extendable Hand

藤原祐磨, 廣井富, 川崎成人, 黒田尚孝, 鈴木直人, 伊藤彰則

日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM)　2014　2014

ISSN： 2424-3124
Development of ASAHI2013, a Mobile Robot for Daily Life Support

廣井富, 坂井奎亮, 二上啓大, 藤原祐磨, 伊藤彰則

日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM)　2014　2014

ISSN： 2424-3124
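Analysis of Paralinguistic Features of User Utterances in Response to Unintended Behavior of a Voice-Operated Robot (Spoken Dialog, 15th Spoken Language Symposium)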

戸塚典子, 伊藤彰則

電子情報通信学会技術研究報告. SP, 音声　113　(366)　59-64　2013/12/12
Publisher: 一般社団法人電子情報通信学会

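When a voice-operated interface is installed in a device with a locomotion mechanism, such as a robot, a situation arises in which the user operates the robot by voice in real time. In such cases, however, the robot may act against the user's intention because of the user's slips of the tongue or the system's recognition errors. As a way to correct such behavior quickly when it occurs, we focus on the paralinguistic features of user utterances produced when the robot behaves in an unintended way, and propose applying them to robot control. In this study, we collected speech from users actually operating a robot in a subject experiment, and analyzed whether speaking rate, fundamental frequency (F0), and intensity change between times when the robot is behaving as the user intends and times when it is not.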
Analysis of Paralinguistic Features of User Utterances in Response to Unintended Behavior of a Voice-Operated Robot

戸塚典子, 伊藤彰則

研究報告音声言語情報処理（SLP）　2013　(10)　1-6　2013/12/12
Publisher: 一般社団法人情報処理学会
ISSN： 0913-5685

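When a voice-operated interface is installed in a device with a locomotion mechanism, such as a robot, a situation arises in which the user operates the robot by voice in real time. In such cases, however, the robot may act against the user's intention because of the user's slips of the tongue or the system's recognition errors. As a way to correct such behavior quickly when it occurs, we focus on the paralinguistic features of user utterances produced when the robot behaves in an unintended way, and propose applying them to robot control. In this study, we collected speech from users actually operating a robot in a subject experiment, and analyzed whether speaking rate, fundamental frequency (F0), and intensity change between times when the robot is behaving as the user intends and times when it is not.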
Controlling Turn-Taking Latency by Time Pressure during English Conversation Practice with an AR Character

鈴木直人, 廣井富, 藤原祐磨, 黒田尚孝, 戸塚典子, 千葉祐弥, 伊藤彰則

研究報告音声言語情報処理（SLP）　2013　(9)　1-6　2013/12/12
Publisher: 一般社団法人情報処理学会

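Practicing English conversation requires a conversation partner, and learners need practice that lets them converse with the partner at a good tempo. At present, CALL (Computer-Assisted Language Learning) systems have no framework for improving the timing of the learner's responses. During English conversation practice, the learner bears a double cognitive load of recalling what to say and expressing it in English, so the turn-taking latency tends to become long; to shorten the latency consciously from the start of the dialog, an explicit method should be used with the learner. In this study, we therefore set up an AR (Augmented Reality) character as the conversation partner and attempted to verify experimentally whether applying a time-pressure expression is effective as practice for response timing. Participants carried out two dialogs, one with and one without time pressure, and finally answered a subjective evaluation questionnaire. This paper reports these results and a discussion based on the subjective evaluation.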
A study of the user's state's estimation by using multi-modal information of the local segment

CHIBA Yuya, ITO Masashi, ITO Akinori

IEICE technical report. Speech　113　(220)　27-32　2013/09/18
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

Most conventional research on spoken dialog systems has focused on natural language processing, because a dialog system decides its response by processing the speech recognition result of the user's utterance. However, in actual environments the user is sometimes confused by the system's interface and cannot make any input utterance. To help such users appropriately, the system should consider the user's state before his/her input utterance, which conventional research has ignored. To solve this problem, we defined two user states and studied a method for estimating them. Our previous experimental analysis of human evaluation suggested that these internal states can be estimated by observing the user's non-verbal behavior. Based on this result, this report proposes an estimation method using multi-modal features. The proposed method clusters the feature sequences and uses them as a Bag-of-Words. We confirmed that the proposed method achieves an accuracy of over 70.0%.
An acoustical analysis for mixed speech signals using a complex window function

43　(6)　473-478　2013/08/09
Publisher: 日本音響学会
ISSN： 1346-1109
An acoustical analysis for mixed speech signals using a complex window function

ITO Masashi, ITO Akinori

Technical report of IEICE. EA　113　(177)　1-6　2013/08/09
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

A sinusoidal representation of voiced speech is one of the promising methods for speech analysis and synthesis; it approximates the input signal as a sum of sinusoidal components whose frequencies and amplitudes vary continuously with time. The difficulties in estimating sinusoidal parameters from the input can be classified into two types: one is spectral distortion induced by non-stationarity of the signal, and the other is interference among neighboring components in the spectrum. To overcome these difficulties, we propose a new analysis method that integrates the local vector transform and a complex analysis window. The results of an experiment in which sinusoidal parameters were estimated for a single speech signal or a musical instrument tone supported the effectiveness of the proposed method. Furthermore, the method could provide an important basis for analyzing mixtures of these signals.
Noise reduction based on fragmentary measurement of environmental noise

MACHIDA Kohei, ITO Akinori

IEICE technical report. Speech　113　(161)　1-6　2013/07/25
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

We propose a speech recognition method for noisy environments that uses multiple microphones based on asynchronous and intermittent observations. In this method, microphones placed at various locations in the room observe sounds intermittently, and GMM-based clustering is performed to model the noise in the environment. Each clustered noise spectrum is subtracted from the input signal, and the noise-reduced signals are then decoded in parallel. The final recognition result is determined by integrating all of the recognition results.
Analysis of Acoustic Feature of Command Speech Towards a Mobile Robot Under the Robot's Unintended Behavior

TOTSUKA Noriko, ITO Akinori

IEICE technical report. Speech　113　(161)　57-62　2013/07/25
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

In recent years, many devices with speech-based interfaces have been developed. Speech commands may be used to operate a device that moves in real time, such as a mobile robot. In this case, the robot might behave in an unintended way because of misrecognitions or wrong commands. When the robot moves against the user's intention, the behavior should be corrected quickly. But how can the robot know that its behavior does not conform to the operator's intention? We are investigating the possibility of using acoustic features of the user's utterances to estimate whether the robot's behavior complies with the user's intention. To this end, we collected utterances from operators controlling robots by voice and analyzed their acoustic features. In this paper, we show the results of analyzing four features: speaking rate, fundamental frequency (F0), intensity, and speaking interval. We found that the speaking rate tended to be faster and the speaking interval shorter when the robot was behaving against the user's intention, but we did not find any differences in F0 or intensity.
A Task Development Experiment for the Multi-task Spoken Dialog System based on QA Database

MIYAKE Shinji, ITO Akinori

IEICE technical report. Speech　113　(161)　31-36　2013/07/25
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

We are developing a spoken dialog system for daily life support tasks, such as a smart home or a human-symbiotic robot. The system uses a dialog control strategy based on a Q-A database, which makes it easier for developers to write a new task description. Even a novice developer can create a task, because the task creation procedure is just making a list of expected user utterances and the responses to those utterances. Moreover, the system can handle multiple tasks simply by merging multiple task descriptions in parallel. In this paper, we conduct an experiment to confirm whether novice developers can really create task descriptions that are as good as those written by an experienced developer. As a result, the task descriptions created by novice developers were similar to those by the experienced developer, and the impressions of the system users were similar for both task descriptions. However, the task completion rate and the discrimination rate of ambiguous utterances were higher for the task description by the experienced developer than for those by the novice developers.
Consideration of the relation between auditory impression and acoustic features of death growl and scream singing voice

KATO Keizo, ITO Akinori

IEICE technical report. Speech　112　(422)　43-48　2013/01/30
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

In the contemporary music scene, the death growl and scream singing styles are often used in extreme metal and have become indispensable singing styles. In this study, we attempt to clarify the essential acoustic features of death growl and scream singing voices by considering the relationship between auditory impressions and acoustic features.
Multi-modal Information Processing by Embedding Image Features into Speech Signal

ABE Yohei, ITO Akinori

112　(420)　1-5　2013/01/29
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

Lip movement is closely related to speech because the lips move when we talk. The idea of this work is to extract lip movement features from facial video and embed them into the speech signal using a data hiding technique. In this paper, we show the basic framework of the method and apply the proposed method to multi-modal voice activity detection (VAD). In a detection experiment using an SVM, we obtained higher accuracy than audio-only VAD in noisy environments. In addition, we investigated the effects of embedding data into the speech signal on sound quality and detection accuracy.
A Study on a Multi-Stage Classification Method for Sequential Estimation of the User's State during Dialog

千葉祐弥, 伊藤仁, 伊藤彰則

研究報告ヒューマンコンピュータインタラクション（HCI）　2013　(21)　1-6　2013/01/25

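Because conventional spoken dialog systems base their processing on the speech recognition result of the user's input utterance, they could not estimate the user's state while waiting for the user's input. In real environments, however, it often happens that the user cannot make an input, for example because the user is confused by the system's prompt. To respond appropriately to such users, the system needs to consider the "user state before the utterance", which conventional spoken dialog systems have ignored. We have defined two kinds of pre-utterance user states and have been studying methods for estimating them. From the analyses so far, we concluded that the target user states can be estimated to some extent by using multi-modal information. Based on this result, this report examines a method that integrates information obtained from video and speech and estimates the user's state sequentially.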
A Study on a Multi-Stage Classification Method for Sequential Estimation of the User's State during Dialog

千葉祐弥, 伊藤仁, 伊藤彰則

研究報告音声言語情報処理（SLP）　2013　(21)　1-6　2013/01/25

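Because conventional spoken dialog systems base their processing on the speech recognition result of the user's input utterance, they could not estimate the user's state while waiting for the user's input. In real environments, however, it often happens that the user cannot make an input, for example because the user is confused by the system's prompt. To respond appropriately to such users, the system needs to consider the "user state before the utterance", which conventional spoken dialog systems have ignored. We have defined two kinds of pre-utterance user states and have been studying methods for estimating them. From the analyses so far, we concluded that the target user states can be estimated to some extent by using multi-modal information. Based on this result, this report examines a method that integrates information obtained from video and speech and estimates the user's state sequentially.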
A Communication Robot That Moves along a Handrail: Comparison of Route Guidance Methods

廣井富, 黒田尚孝, 藤原祐磨, 戸塚典子, 伊藤彰則

日本ロボット学会学術講演会予稿集(CD-ROM)　31st　2013
Implementation of Pointing Gestures Using a Robot Avatar: Investigation of Human Recognition of Pointing

黒田尚孝, 廣井富, 伊藤彰則

日本ロボット学会学術講演会予稿集(CD-ROM)　31st　2013
A Study of an English Conversation Learning System Based on Spoken Dialog with an AR Character: Effect of Introducing Time Pressure

鈴木直人, 廣井富, 藤原祐磨, 黒田尚孝, 戸塚典子, 千葉祐弥, 伊藤彰則

日本バーチャルリアリティ学会大会論文集(CD-ROM)　18th　2013

ISSN： 1349-5062
Controlling Turn-Taking Latency by Time Pressure during English Conversation Practice with an AR Character

鈴木直人, 廣井富, 藤原祐磨, 黒田尚孝, 戸塚典子, 千葉祐弥, 伊藤彰則

電子情報通信学会技術研究報告　113　(366(SP2013 82-95))　2013

ISSN： 0913-5685
Analysis of Modalities Useful for Estimating the User's State during a Dialog Turn (Speech, 14th Spoken Language Symposium)

千葉祐弥, 伊藤仁, 伊藤彰則

電子情報通信学会技術研究報告 : 信学技報　112　(369)　35-40　2012/12/20
Publisher: 一般社団法人電子情報通信学会
ISSN： 0913-5685

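Because conventional spoken dialog systems decide their processing based on the user's input utterance, they cannot estimate the user's state while waiting for input. In real environments, however, users are often confused by the system's prompt and cannot make an input. In such cases, it is common to present a prompt with the same content at fixed intervals, but this assistance is very annoying for users who are thinking about what to input. To respond appropriately to these users, the system needs to be able to estimate the user's state before the utterance. In our previous study, we attempted automatic estimation without an analysis that separated the various influences, so it was unclear which information is needed to estimate the user's state. In this paper, we therefore collected data anew, conducted an evaluation experiment with human subjects, and carried out a more detailed analysis.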
Evaluation of Unknown Word Estimation by Topic-Related Word Estimation and STD (Speech, 14th Spoken Language Symposium)

佐藤壮一, 伊藤彰則

電子情報通信学会技術研究報告 : 信学技報　112　(369)　143-147　2012/12/20
Publisher: 一般社団法人電子情報通信学会
ISSN： 0913-5685

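In this paper, we estimated unknown words in speech recognition using topic-related word estimation, which estimates related words from speech recognition results, and spoken term detection (STD), which checks whether a given word is contained in an utterance. We compared three cases: topic-related word estimation only, STD only, and both combined. The results showed that the best recall was obtained by using both when the number of estimated words is large, and by using topic-related word estimation only when the number of estimated words is small. We also found that applying STD when the recall of topic-related word estimation is high yields higher precision than topic-related word estimation alone.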
Analysis of Modalities Useful for Estimating the User's State during a Dialog Turn

千葉祐弥, 伊藤仁, 伊藤彰則

研究報告音声言語情報処理（SLP）　2012　(7)　1-6　2012/12/13

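Because conventional spoken dialog systems decide their processing based on the user's input utterance, they cannot estimate the user's state while waiting for input. In real environments, however, users are often confused by the system's prompt and cannot make an input. In such cases, it is common to present a prompt with the same content at fixed intervals, but this assistance is very annoying for users who are thinking about what to input. To respond appropriately to these users, the system needs to be able to estimate the user's state before the utterance. In our previous study, we attempted automatic estimation without an analysis that separated the various influences, so it was unclear which information is needed to estimate the user's state. In this paper, we therefore collected data anew, conducted an evaluation experiment with human subjects, and carried out a more detailed analysis.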
Enrichment of audio signal using side information

ITO Akinori

Technical report of IEICE. EA　112　(292)　87-92　2012/11/09
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

This paper describes methods that add value to audio signals using side information. Many acoustic signal processing methods have been proposed for estimating lost information from the original signal. With appropriate side information, the estimation can be enhanced easily. In this paper, the principle of audio signal processing using side information is described first, and then three applications are described: packet loss concealment for audio signals, manipulation of mixed music signals, and frequency band extension of telephone speech.
The Available Telecommunications Services at Serious Disaster

SHOJI Sadao, AOKI Takafumi, ITO Akinori, OMACHI Shinichiro, ITO Koichi

IEICE technical report　112　(208)　71-72　2012/09/13
Publisher: The Institute of Electronics, Information and Communication Engineers

Hitachi East Japan Solutions, Ltd. and Tohoku University study the telecommunications services that remain available, as well as security and information sharing, when mobile communication networks are congested during a serious disaster.
Estimation of a User's Internal State before the First Input Utterance Using HMM with Non-verbal Information

CHIBA Yuya, ITO Masashi, ITO Akinori

Technical report of IEICE. PRMU　111　(430)　7-12　2012/02/02
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

This paper describes a method for estimating the internal state of the user of a spoken dialog system before the user's input utterance. In practical use of a dialog-based system, the user is often perplexed by the prompt. An ordinary system provides more detailed information to a user who takes time to input, but such help is meddlesome for a user who is considering the answer to the prompt. To respond appropriately, the spoken dialog system has to be able to consider the user's internal state before the user's input. Conventional research on user modeling has focused on the linguistic information of the utterance. One problem with these approaches is that they cannot estimate the user's state until the end of the user's first utterance. Therefore, our study focuses on the user's non-verbal output, such as fillers, silence, or head movement, up to the occurrence of the user's input utterance. This paper describes a user modeling method based on HMMs. We conducted a discrimination experiment and obtained an accuracy of 79.6%.
The SEES; Singing Enthusiasm Evaluation System for Amateur Singing Entertainment

Ryunosuke Daido, Masashi Ito, Shozo Makino, Akinori Ito

IPSJ SIG Notes　2012　(2)　1-7　2012/01/27
Publisher: Information Processing Society of Japan (IPSJ)

The goal of our research is to develop a system for evaluating singing enthusiasm. As karaoke scoring systems exemplify, many researchers have worked on automatic evaluation methods for singing voices to add value to amateur singing entertainment. However, most of this research tries to evaluate only singing skill. In our research, the point of interest is not singing skill but singing enthusiasm. In this paper, we describe our attempt to develop an automatic evaluation system for singing enthusiasm through analyses of how humans perceive it. Moreover, we propose a new style of amateur singing entertainment using our system.
Acoustic analysis towards extreme voice synthesis of death growl and scream singing voices

Keizo Kato, Akinori Ito

IPSJ SIG Notes　2012　(14)　1-6　2012/01/27
Publisher: Information Processing Society of Japan (IPSJ)

In this study, we analyzed the acoustic features of growl and scream singing voices used in extreme metal music, such as death metal and metalcore. We observed sub-harmonics and macro pulse structures, which have been reported as acoustic features of rough voices. We also measured jitter, shimmer, and HNR values.
patissier-A Lyrics Writing Support System for Amateur Lyricists-

Chihiro Abe, Akinori Ito

IPSJ SIG Notes　2012　(17)　1-6　2012/01/27
Publisher: Information Processing Society of Japan (IPSJ)

In this paper, we propose a lyrics writing support system that focuses on the number of syllables, rhyme, and word accent. Based on an N-gram model, the system generates candidate sentences that satisfy user-specified conditions and presents them. Users can use the system like a dictionary and write lyrics by choosing among the presented sentences. In our subjective evaluations, we investigated how the system is actually used for writing lyrics. The usage logs and the questionnaires showed that users want the system to present words suited to the images they have in mind, and that they used the presented words as keywords for lyrics rather than using them as they are.
On short essays carried in the acoustical science and technology

Ito, A.

Acoustical Science and Technology　33　(1)　72-72　2012

DOI： 10.1250/ast.33.72 　
A Communication Robot That Moves along a Handrail: Overall Concept

廣井富, 内田裕二, 西村駿宏, 中山貴之, 黒田尚孝, 三宅真司, 戸塚典子, 伊藤彰則

ヒューマンインタフェースシンポジウム論文集(CD-ROM)　2012　2012

ISSN： 1345-0794
Realization of Pointing Gestures Using a Robot Avatar: Implementation on the Robot Avatar

黒田尚孝, 廣井富, 三宅真司, 伊藤彰則

日本感性工学会大会予稿集(CD-ROM)　14th　2012
Implementation of Pointing Gestures Using a Robot Avatar on a Mobile Robot

黒田尚孝, 廣井富, 三宅真司, 伊藤彰則

日本ロボット学会学術講演会予稿集(CD-ROM)　30th　2012
Detection of Utterances that Need Clarification Using a Question and Answer Database

三宅真司, 廣井富, 伊藤彰則

情報処理学会研究報告(CD-ROM)　2012　(2)　2012

ISSN： 2186-2583
Controlling the start time of human utterance by behavior of the robot

中山貴之, 廣井富, 黒田尚孝, 三宅真司, 伊藤彰則

情報処理学会研究報告(CD-ROM)　2012　(2)　2012

ISSN： 2186-2583
Development of ASAHI, a Mobile Robot for Daily Life Support: Overall Concept and Hardware Configuration

廣井富, 黒田尚孝, 内藤圭祐, 高田晶太, 松井一馬, 井上駿, 林和孝, 中山貴之, 松中翔平, 伊藤彰則

日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM)　2012　2012

ISSN： 2424-3124
A Study on Person Tracking Using a Single LRF

松中翔平, 廣井富, 内藤圭祐, 井上駿, 伊藤彰則

日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM)　2012　2012

ISSN： 2424-3124
Realization of Pointing Gestures Using a Robot Avatar: Basic Concept and Preliminary Experiment

黒田尚孝, 廣井富, 松井一馬, 三宅真司, 伊藤彰則

日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM)　2012　2012

ISSN： 2424-3124
A Comparison of Side Information Expressions Incorporating Background Music Signals for Manipulating Mixed Music Sounds

SASAKI Yuto, HAHM Seong-Jun, ITO Akinori

111　(287)　47-52　2011/11/14
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

In this paper, we propose a method for manipulating vocal sound in mixed music signals using side information. In the proposed method, fundamental frequency (pitch) of a vocal sound signal and the backing sound information are used as side information. After receiving the mixed music signal, vocal sound manipulation is performed using a comb filter with harmonic structure using pitch information. The performance was evaluated using signal-to-noise ratio (SNR). We designed three filters using different backing sound information, and compared those filters.
Crisis Responses to the Great East Japan Earthquake : 12. Emergency Activity for Information Systems of the Graduate School of Engineering, Tohoku University, under the Great East Japan Earthquake

ITO A.

IPSJ MAGAZINE　52　(9)　1084-1085　2011/08/15
A Lyrics Writing Support System Using a Statistical Language Model

2011　(9)　1-6　2011/07/20
Discrimination of User's Internal State using Non-verbal Information before the First User Utterance

CHIBA Yuya, HAHM Seongjun, ITO Akinori

IEICE technical report　111　(153)　23-28　2011/07/14
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

A dialogue system is expected to respond flexibly to various user behaviors. Because a speech-based interface can be used hands-free and without training, this requirement is crucial. Although there has been much conventional research on adapting responses based on the linguistic information in the user's input, there have been few attempts to decide the system's dialogue strategy before the user makes the first input utterance. In this research, we focus on the user's non-verbal information in order to build a system that can help users before the input utterance. Here, we investigate the length of non-linguistic utterances such as fillers or silence, and three angles of face orientation. Finally, we conducted a discrimination experiment using an SVM.
Implementation and Evaluation of a Motion Preannouncement Method Using a Robot Avatar during Deceleration of a Mobile Robot

中山貴之, 廣井富, 伊藤彰則

日本ロボット学会学術講演会予稿集(CD-ROM)　29th　2011
A Robot Spoken Dialog System Built in 10 Days

三宅真司, 廣井富, 伊藤彰則

ヒューマンインタフェースシンポジウム論文集(CD-ROM)　2011　2011

ISSN： 1345-0794
Subjective evaluation of a robot: a real body or augmented reality?

廣井富, 伊藤彰則

電子情報通信学会技術研究報告　110　(459(HCS2010 56-69))　2011

ISSN： 0913-5685
Improving the Familiarity of a Daily Life Support Robot Using a Robot Avatar: Application to a Non-Humanoid Robot

廣井富, 伊藤彰則

日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM)　2011　2011

ISSN： 2424-3124
Development of Goyane, a Mobile Robot for Daily Life Support: Proposal of a Height-Adjustable Mechanism

廣井富, 篠原達也, 兼次一喜, 岩本昂, 中山貴之, 伊藤彰則

日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM)　2011　2011

ISSN： 2424-3124
Modeling expansion using Web for spoken document retrieval based on probabilistic language model

IEICE technical report　110　(357)　109-114　2010/12/20
Publisher: 電子情報通信学会
ISSN： 0913-5685
Modeling Expansion using Web for Spoken Document Retrieval based on Probabilistic Language Model

MASUMURA RYO, HAHM SEONGJUN, ITO AKINORI

2010　(20)　1-6　2010/12/13
Publisher: 情報処理学会
ISSN： 0919-6072
An abnormal sound detection method using multi-stage GMM for surveillance microphone

ITO Akinori, AIBA Akihito, ITO Masashi, MAKINO Shozo

IEICE technical report　110　(220)　1-6　2010/10/01
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

We have been developing a method for detecting abnormal sound events in audio signals recorded in real environments; it uses multi-stage Gaussian Mixture Models (GMMs), which learn rare sounds using multiple GMMs. In this paper, we investigate the relationship between the sound environment and detection performance, and find that the performance deteriorates in noisy environments. The performance depended largely on the signal-to-noise ratio of the abnormal sounds. Next, we investigated methods for determining the hyperparameters of the multi-stage GMM, namely the intermediate thresholds, the numbers of GMM mixture components, and the detection threshold. In the experimental results, the combination of percentile-based threshold determination and Bayesian information criterion (BIC)-based mixture determination was the most effective. However, with the automatically determined parameters, the detection performance deteriorated by around 20%.
Sinusoidal Modeling for Voiced Speech Signals Based on Local Vector Transform and Time-Warping

ITO Masashi, ITO Akinori

The IEICE transactions on information and systems　93　(9)　1745-1754　2010/09/01
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 1880-4535
Topic Expression of Words using Web Documents for Unsupervised Language Model Adaptation

2010　(18)　1-6　2010/07/15
Publisher: 情報処理学会
ISSN： 1884-0930
Lecture Speech Recognition Based on Word Graph Combination by Using Quinphone HM-Net

KATO Masaharu, KOSAKA Tetsuo, ITO Akinori, MAKINO Shozo

IEICE technical report　110　(81)　37-42　2010/06/10
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

While high recognition performance has been achieved for read speech, rather poor performance has been reported for spontaneous speech recognition because it has various problems, such as hesitations, filled pauses, unclear pronunciation, and so on. In particular, acoustic variation caused by coarticulation has become a serious problem. In order to solve the problem, context-dependent models such as triphone or quinphone are used for recognition. However, the strength of coarticulatory effect varies widely in spontaneous speech. In this study, we attempt to improve the recognition performance by using a technique of word graph combination in which various acoustic models are combined.
Measuring "enthusiasm" of singing voice

DAIDO RYUNOSUKE, ITO MASASHI, ITO AKINORI, MAKINO SHOZO

2010　(10)　1-6　2010/05/20
Publisher: 情報処理学会
ISSN： 0919-6072
Towards development of practical life-support robots

廣井富, 伊藤彰則

電子情報通信学会技術研究報告　109　(457(HCS2009 64-88))　2010

ISSN： 0913-5685
Proposal of a Method for Indicating Positions to a Daily Life Support Mobile Robot Using Augmented Reality

去来川勇樹, 廣井富, 榊洋祐, 二神龍平, 中山貴之, 伊藤彰則

バイオメカニズム学術講演会予稿集　31st　2010
Development of Goyane, a Mobile Robot for Daily Life Support

廣井富, 後藤基允, 山本祐三, 山根佑介, 稲田遥一, 大原達哉, 木村昭太, 久野修平, 伊藤彰則

日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM)　2010　2010

ISSN： 2424-3124
Comparison of Motion Preannouncement Methods Using a Robot Avatar for a Daily Life Support Mobile Robot

廣井富, 大原達哉, 木村昭太, 久野修平, 伊藤彰則

日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM)　2010　2010

ISSN： 2424-3124
Language Models in Speech Recognition

Akinori Ito

The Journal of the Acoustical Society of Japan　66　(1)　32-35　2010/01

DOI： 10.20697/jasj.66.1_32 　
Utterance discrimination for dialog control on multi-task spoken dialog system

Awano Kentaro, Ito Masashi, Ito Akinori, Makino Shozo

IEICE technical report　109　(355)　37-42　2009/12/21
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685
Evaluation of unsupervised language model adaptation based on topic-related word estimation using WWW

Masumura Ryo, Ito Masashi, Ito Akinori, Makino Shozo

IEICE technical report　109　(355)　183-188　2009/12/21
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685
Evaluation of Unsupervised Language Model Adaptation based on Topic-related Word Estimation using WWW

MASUMURA RYO, ITO MASASHI, ITO AKINORI, MAKINO SHOZO

2009　(32)　1-6　2009/12/14
Publisher: 情報処理学会
ISSN： 0919-6072
Utterance Discrimination for dialog control on Multi-task Spoken Dialog System

AWANO KENTARO, ITO MASASHI, ITO AKINORI, MAKINO SHOZO

2009　(7)　1-6　2009/12/14
Publisher: 情報処理学会
ISSN： 0919-6072
Bit Rate Reduction of Vocoder-Type Speech Coder by Reducing Temporal Redundancy

KOHATA Minoru, SUZUKI Motoyuki, ITO Akinori, MAKINO Syouzou

IEICE technical report　109　(308)　7-12　2009/11/19
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

We previously proposed a new segment quantization method named LZSQ, a modified version of LZ coding that can be applied to a continuous information source. In this paper, LZSQ is applied to a vocoder-type speech coder to reduce its bit rate by removing temporal redundancy in the coding parameters while preserving the quality of the coded speech. Specifically, LZSQ is applied to six coding parameters of the MELP coder, one of the standardized vocoder-type speech coders operating at 2.4 kbit/s, to reduce its bit rate as much as possible. As a result, the total bit rate was reduced to about 1.57 kbit/s.
What Was This Song Again? "Singing Voice Retrieval": Searching for Music by Singing

伊藤彰則, 鈴木基之, 牧野正三

DTM Magazine　16　(11)　100-101　2009/11
Publisher: 寺島情報企画
An algorithm for fast calculation of back-off n-gram probabilities with unigram rescaling

Kato, M., Kosaka, T., Ito, A., Makino, S.

IAENG International Journal of Computer Science　36　(4)　2009/11/01

ISSN： 1819-656X
RE-005 Sinusoidal Modeling for Voiced Speech Based on a Local Vector Transform

Ito Masashi, Ito Akinori

8　(2)　43-48　2009/08/20
Publisher: Forum on Information Technology
Detection of abnormal sound using multi-stage GMM and segment model

39　(5)　401-405　2009/08/03
Publisher: 日本音響学会聴覚研究委員会
ISSN： 1346-1109
A study on objective evaluation of MP3 packet loss concealment

39　(5)　367-372　2009/08/03
Publisher: 日本音響学会聴覚研究委員会
ISSN： 1346-1109
A Study on Objective Evaluation of MP3 Packet Loss Concealment

KONNO Kiyoshi, ITO Masashi, ITO Akinori, MAKINO Shozo

IEICE technical report　109　(166)　37-42　2009/07/27
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

In this paper, we analyze objective evaluation of MP3 audio with packet loss concealment. For objective evaluation of wideband audio, PEAQ is recommended in ITU-R BS.1387. However, PEAQ is not designed to evaluate audio with packet losses, and its accuracy is not sufficient. We therefore applied multiple linear regression analysis using PEAQ's Model Output Variables. In addition, we improved the correlation by taking into account the variance of the subband SNR, which may reflect degradation in a specific frequency band. In cross-validation, the mean correlation was about 0.84.
Detection of Abnormal Sound Using Multi-stage GMM and Segment Model

AIBA Akihito, ITO Masashi, ITO Akinori, MAKINO Shozo

IEICE technical report　109　(166)　71-75　2009/07/27
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

We propose an abnormal sound detection system for surveillance microphones. The system uses models of normal sounds actually produced at the surveillance site and detects sounds that have not been learned as abnormal sounds. Therefore, the system does not limit detection targets to particular events and can cope with any abnormal event. The detection performance of the proposed system was examined using actual environmental sounds. The performance was improved by a multi-stage GMM that models normal sounds that rarely occur. Furthermore, we examined incorporating the dynamic variation of acoustic features by using segment features.
Panel Discussion Featuring Newly Honored Doctors (III) "Research for me, research for new values"

ITO Akinori, ANDO Daichi, LE ROUX Jonathan, NAKANO Tomoyasu, YOSHII Kazuyoshi

2009　(7)　1-5　2009/07/22
Publisher: 情報処理学会
ISSN： 0919-6072
Utterance Discrimination for using Multiple Spoken Dialog Systems

AWANO KENTARO, ITO MASASHI, ITO AKINORI, MAKINO SHOZO

2009　(15)　1-6　2009/05/14
Publisher: 情報処理学会
ISSN： 0919-6072
Utterance Discrimination for using Multiple Spoken Dialog Systems

2009　(15)　1-6　2009/05/14
Publisher: 情報処理学会
ISSN： 1884-0930
Composition Search Query for Language Model Adaptation using WWW

MASUMURA RYO, ITO MASASHI, ITO AKINORI, MAKINO SHOZO

2009　(10)　1-8　2009/05/14
Publisher: 情報処理学会
ISSN： 0919-6072
Music Information Retrieval using database with multiple F0 candidates

KOSUGI YU, ITO MASASHI, ITO AKINORI, MAKINO SHOZO

2009　(6)　1-6　2009/05/14
Publisher: 情報処理学会
ISSN： 0919-6072
Adaptive Multiple Description Coding for Flash Video based on Bitstream Pattern Reconstruction

KURAISHI Takuya, ITO Masashi, ITO Akinori, MAKINO Shozo

71　275-276　2009/03/10
Database generation from acoustic signal for music information retrieval system with Query-by-Humming

KOSUGI Yu, ITO Masashi, ITO Akinori, MAKINO Shozo

71　237-238　2009/03/10
DS-3-8 Bit-rate control of payload by information hiding based on ADPCM

HANDA Hironori, ITO Akinori, SUZUKI Yoiti

Proceedings of the IEICE General Conference　2009　(2)　"S-33"-"S-34"　2009/03/04
Publisher: The Institute of Electronics, Information and Communication Engineers
Implementation of preliminary-announcement for a life-support mobile robot using a robot avatar

廣井富, 後藤基允, 山本祐三, 大原達哉, 木村昭太, 伊藤彰則

日本ロボット学会学術講演会予稿集(CD-ROM)　27th　2009
Novel tonal feature and statistical user modeling for query-by-humming

Motoyuki Suzuki, Takuto Ichikawa, Akinori Ito, Shozo Makino

Journal of Information Processing　17　95-105　2009
Publisher: Information Processing Society of Japan
DOI： 10.2197/ipsjjip.17.95 　

ISSN： 1882-6652 0387-5806
Evaluation of English Intonation based on Combination of Multiple Evaluation Scores

Akinori Ito, Tomoaki Konno, Masashi Ito, Shozo Makino

INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5　596-599　2009
Relative importance of formant and whole-spectral cues for vowel perception

Masashi Ito, Keiji Ohara, Akinori Ito, Masafumi Yano

INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5　132-+　2009
Detailed description of triphone model using SSS-free algorithm

Motoyuki Suzuki, Daisuke Honma, Akinori Ito, Shozo Makino

INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5　1403-+　2009
Multiple description coding of flash video based on adaptive allocation of DCT coefficients Peer-reviewed

Akinori Ito, Takuya Kuraishi, Masashi Ito, Shozo Makino

APSIPA ASC 2009 - Asia-Pacific Signal and Information Processing Association 2009 Annual Summit and Conference　453-456　2009
Evaluation of annealing schedule for PLSA language model adaptation

KATO Masaharu, KOSAKA Tetsuo, ITO Akinori, MAKINO Shozo

IPSJ SIG Notes　2008　(123)　49-53　2008/12/02
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

Probabilistic Latent Semantic Analysis (PLSA) is a powerful statistical language model. However, PLSA suffers from the local maxima problem. To overcome this problem, the EM annealing algorithm has been proposed. In this paper, we designed the annealing schedule β using several continuous functions. As a result, we found that increasing functions and square root functions are the best for the annealing schedule. In the experiment, we obtained a 28.7% perplexity reduction and a 5.3% word error rate reduction.
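To illustrate what an annealing schedule β given by a continuous function could look like, here is a minimal sketch with a linear and a square-root schedule that rise to 1.0. The function names, the starting value 0.5, and the posterior-tempering form (raising unnormalized posteriors to the power β, as in deterministic-annealing EM) are assumptions for illustration. The abstract does not give the functions or parameters actually used.

```python
import math

def linear_schedule(t, total, beta_start=0.5):
    """Hypothetical schedule: beta rises linearly from beta_start to 1.0."""
    return beta_start + (1.0 - beta_start) * (t / total)

def sqrt_schedule(t, total, beta_start=0.5):
    """Hypothetical schedule: beta rises as the square root of progress."""
    return beta_start + (1.0 - beta_start) * math.sqrt(t / total)

def tempered_posterior(joint, beta):
    """Assumed E-step tempering: raise each unnormalized posterior
    to the power beta, then renormalize so the values sum to 1."""
    powered = [p ** beta for p in joint]
    norm = sum(powered)
    return [p / norm for p in powered]

if __name__ == "__main__":
    total_iterations = 4  # illustrative value only
    for t in range(total_iterations + 1):
        print(t, linear_schedule(t, total_iterations),
              sqrt_schedule(t, total_iterations))
```

With β below 1 the posteriors are flattened, which is the usual motivation for annealing against local maxima. At β = 1 the ordinary EM posterior is recovered.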
Estimation of Spoken Dialog System using Automatically-generated question-and-answer database

MORIMOTO Takahiro, ITO Masashi, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

IEICE technical report　108　(337)　267-272　2008/12/02
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

A question-and-answer style spoken dialog system based on example-based answer generation is known to be robust against variation in user utterances. However, it is costly to create a QA database for a new task. In this paper, we propose a method to reduce the cost of preparing the database by generating it automatically from templates. As a result, we obtained almost the same performance using the automatically generated QA database as with the manually prepared database. In addition, we propose a new scoring method for choosing an answer based on F-measure, which improved the accuracy of answer selection.
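As a rough illustration of F-measure-based answer selection, the sketch below scores each example question in a QA database against the user utterance and returns the answer paired with the best match. Computing the F-measure over sets of words, the function names, and the toy database are all assumptions. The abstract does not say which units or weighting the authors used.

```python
def f_measure(user_words, example_words):
    """F-measure between two word sets (assumed unit: distinct words)."""
    user, example = set(user_words), set(example_words)
    common = len(user & example)
    if common == 0:
        return 0.0
    precision = common / len(user)      # share of user words found in the example
    recall = common / len(example)      # share of example words covered by the user
    return 2 * precision * recall / (precision + recall)

def select_answer(user_words, qa_database):
    """Return the answer whose example question has the highest F-measure."""
    best = max(qa_database, key=lambda qa: f_measure(user_words, qa[0]))
    return best[1]

if __name__ == "__main__":
    # Hypothetical toy database of (example question words, answer) pairs.
    toy_db = [
        (["where", "is", "the", "station"], "answer about the station"),
        (["what", "time", "is", "it"], "answer about the time"),
    ]
    print(select_answer(["where", "station"], toy_db))
```

The F-measure penalizes both missing and extra words, so a short example question that merely shares one common word with the utterance does not win automatically.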
Evaluation of annealing schedule for PLSA language model adaptation

KATO Masaharu, KOSAKA Tetsuo, ITO Akinori, MAKINO Shozo

IEICE technical report　108　(337)　49-53　2008/12/02
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

Probabilistic Latent Semantic Analysis (PLSA) is a powerful statistical language model. However, PLSA suffers from the local maxima problem. To overcome this problem, the EM annealing algorithm has been proposed. In this paper, we designed the annealing schedule β using several continuous functions. As a result, we found that increasing functions and square root functions are the best for the annealing schedule. In the experiment, we obtained a 28.7% perplexity reduction and a 5.3% word error rate reduction.
Estimation of Spoken Dialog System using Automatically-generated question-and-answer database

MORIMOTO Takahiro, ITO Masashi, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

IPSJ SIG Notes　2008　(123)　267-272　2008/12/02
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

A question-and-answer style spoken dialog system based on example-based answer generation is known to be robust against variation in user utterances. However, it is costly to create a QA database for a new task. In this paper, we propose a method to reduce the cost of preparing the database by generating it automatically from templates. As a result, we obtained almost the same performance using the automatically generated QA database as with the manually prepared database. In addition, we propose a new scoring method for choosing an answer based on F-measure, which improved the accuracy of answer selection.
Multiple description coding of an audio stream by optimum recovery transforms

Ito, A., Makino, S.

Journal of Digital Information Management　6　(2)　189-195　2008/12/01

ISSN： 0972-7272
I-021 Multiple Description Coding of Flash Video by Bitstream Pattern Estimation Using Motion Information (Graphics and Images, General Papers)

倉石卓也, 伊藤仁, 伊藤彰則, 牧野正三, 鈴木基之

情報科学技術フォーラム講演論文集　7　(3)　241-242　2008/08/20
Publisher: FIT(電子情報通信学会・情報処理学会)運営委員会
A study of high-quality speech modification based on sinusoidal representation

38　(5)　513-518　2008/08/04
Publisher: 日本音響学会聴覚研究委員会
ISSN： 1346-1109
A study of high-quality speech modification based on sinusoidal representation

ITO Masashi, OHARA Keiji, ITO Akinori, YANO Masafumi

IEICE technical report　108　(179)　41-46　2008/07/28
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

One of the crucial problems in speech analysis is to separate the acoustical characteristics caused by the source signal and by the vocal-tract filter from the input speech signal. To address this problem, a method is proposed to estimate the fundamental frequency and the vocal-tract filter response on the basis of a sinusoidal representation of speech. Three psycho-acoustical experiments were carried out to evaluate the accuracy of the estimation for natural utterances. The results indicated that the proposed algorithm could estimate sinusoidal parameters and fundamental frequency with high accuracy. However, they also indicated that non-negligible errors remained in interpolating the vocal-tract filter response.
Intonation Evaluation by Combination of Multiple Evaluation Scores using Synthesized Voice

KONNO Tomoaki, ITO Masashi, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

IEICE technical report　108　(142)　37-42　2008/07/12
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

In this paper, we describe a method, intended for a CALL system, for evaluating the intonation of English utterances by Japanese learners using synthesized speech. To evaluate the intonation of a learner's utterance, we need reference utterances, for which utterances by native English speakers should be used. However, it is costly to gather native speakers' utterances for all sentences in the system. Therefore, we examined an intonation evaluation method using synthesized speech. The intonation evaluation system calculates scores between a learner's utterance and the corresponding utterances by the teachers. We investigated a method of combining multiple scores. In addition, we incorporated a feature for rhythm evaluation into intonation evaluation. As a result, we improved the correlation between the scores given by human evaluators and those given by the system. Furthermore, we analyzed tendencies in the system's intonation evaluation by limiting the utterances evaluated, in order to find out what degrades the system performance.
Statistical Language Modeling and Its Problems

ITO Akinori

IPSJ SIG Notes　2008　(68)　43-46　2008/07/11
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

Statistical language models are widely used as language models for large vocabulary continuous speech recognition. Above all, the back-off n-gram is the de facto standard language model for speech recognition. A number of models have been proposed so far to surpass the back-off n-gram, but none of them has achieved a large improvement over the back-off trigram. In this paper, various language models are briefly reviewed, and I give some suggestions on what is needed for current language models and discuss possibilities for improving them.
Packet Loss Concealment for Flash Video Streaming Using Multiple Description Coding

KURAISHI Takuya, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

全国大会講演論文集　70　(0)　107-108　2008/03/13
DS-4-3 A New Lower Bits Substitution Method for log-PCM using ADPCM

ABE Shun-ichiro, ITO Akinori, SUZUKI Yoiti

Proceedings of the IEICE General Conference　2008　(2)　"S-23"-"S-24"　2008/03/05
Publisher: The Institute of Electronics, Information and Communication Engineers
Improvement of a Query-by-Humming Music Information Retrieval System using Multiple Musical Interval Features

ICHIKAWA Takuto, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

IPSJ SIG Notes　2008　(12)　7-12　2008/02/08
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

This paper describes a query-by-humming (QbH) music information retrieval (MIR) system without F0 extraction. In a system based on F0 extraction, F0 extraction errors inevitably occur and degrade the performance of the system. Furthermore, pitch errors in the sung data also degrade performance. To address these problems, we have proposed an MIR system that uses a musical interval feature and probabilistic models. The performance of the proposed system exceeded that of the system based on F0 extraction. In this paper, we use the peak interval of the cross-correlation function as a tonal feature to improve the performance of the system. In addition, we integrated multiple retrieval results to obtain a better result. In the experiment, the top retrieval accuracy given by the proposed method exceeded that of the system based on F0 extraction by 13.2%.
A Study of High-Quality Speech Modification Based on a Sinusoidal Model

伊藤仁, 小原桂二, 伊藤彰則, 矢野雅文

信学技報　EA2008-52　(15067)　2008
Analysis and Modification of Non-Stationary Speech Based on a Sinusoidal Model

伊藤仁, 小原桂二, 伊藤彰則, 矢野雅文

日本音響学会秋季研究発表会講演論文集　3-4-5.　2008
Are Bigger Robots Scary? - The Relationship Between Robot Size and Psychological Threat -

Yutaka Hiroi, Akinori Ito

2008 IEEE/ASME INTERNATIONAL CONFERENCE ON ADVANCED INTELLIGENT MECHATRONICS, VOLS 1-3　546-551　2008

DOI： 10.1109/AIM.2008.4601719 　

ISSN： 2159-6255
A Fast Speaker Adaptation Method using Aspect Model

Seongjun Hahm, Akinori Ito, Shozo Makino, Motoyuki Suzuki

INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5　1221-1224　2008
Recognition of English Utterances with Grammatical and Lexical Mistakes for Dialogue-based CALL System

Akinori Ito, Ryohei Tsutsui, Shozo Makino, Motoyuki Suzuki

INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5　2819-2822　2008
Discrimination of Task-Related Words for Vocabulary Design of Spoken Dialog Systems

Akinori Ito, Toyomi Meguro, Shozo Makino, Motoyuki Suzuki

INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5　207-+　2008
Automatic Clustering of Part-of-speech for Vocabulary Divided PLSA Language Model

Motoyuki Suzuki, Naoto Kuriyama, Akinori Ito, Shozo Makino

IEEE NLP-KE 2008: PROCEEDINGS OF INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING　289-+　2008

DOI： 10.1109/NLPKE.2008.4906747 　
Examination of judgment method of utterance outside task in voice conversation system

MEGURO Toyomi, SUZUKI Motoyuki, ITO Akinori, MAKINO Syozo

IPSJ SIG Notes　2007　(129)　283-287　2007/12/21
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

In a small task, to allow more flexible processing, utterances related to the task are recognized using a hand-written grammar, and utterances not related to the task are recognized by large vocabulary speech recognition. In this paper, we examine a technique for distinguishing sentences that do not relate to the task from sentences that do, using the semantic distance between nouns.
A Study on the Environment and Speaker Adaptation System using Aspect model

HAHM Seongjun, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

IPSJ SIG Notes　2007　(129)　115-118　2007/12/20
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

One of the key issues for adaptation algorithms is to modify a large number of parameters with only a small amount of adaptation data. Speaker adaptation techniques try to obtain near speaker-dependent (SD) performance with only small amounts of speaker-specific data and are often based on initial speaker-independent (SI) recognition systems. In this paper, we introduce an aspect model into an acoustic model for rapid speaker and environment adaptation. A formulation of probabilistic latent semantic analysis (PLSA) is extended to the continuous density HMM. We carried out an isolated word recognition experiment, and the results were compared with those of MAP and MLLR.
Speech recognition of English spoken by Japanese native speakers using N-gram trained from generated text

TSUTSUI Ryohei, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

IPSJ SIG Notes　2007　(129)　125-130　2007/12/20
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

Our goal is to develop a voice-interactive CALL system that enables language learners to practice words, phrases, and grammar interactively. To develop such a system, it is necessary to recognize learners' utterances correctly. We found that 4- or 5-state HMMs work better than 3-state HMMs for recognizing English spoken by native speakers of Japanese. An N-gram language model trained on generated text achieves higher speech recognition accuracy than an FSA (finite state automaton) language model.
Phoneme Recognition with SSS-free HMnet using Cutting number of paths Method and Smoothing Method

HONMA Daisuke, OHKAWA YUICHI, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

IPSJ SIG Notes　2007　(129)　131-135　2007/12/20
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

When phoneme recognition is carried out using the path connection probabilities of an SSS-free HMnet, phoneme accuracy does not improve because the probabilities are over-specialized to the training data. In this paper, we propose a smoothing method and a method of cutting the number of paths. In speaker-dependent phoneme recognition, both methods prevented over-specialization of the connection probabilities, and phoneme accuracy improved over the conventional method.
Phoneme Recognition with SSS-free HMnet using Cutting number of paths Method and Smoothing Method

HONMA Daisuke, OHKAWA YUICHI, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

IEICE technical report　107　(406)　131-135　2007/12/13
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

When phoneme recognition is carried out using the path connection probabilities of an SSS-free HMnet, phoneme accuracy does not improve because the probabilities are over-specialized to the training data. In this paper, we propose a smoothing method and a method of cutting the number of paths. In speaker-dependent phoneme recognition, both methods prevented over-specialization of the connection probabilities, and phoneme accuracy improved over the conventional method.
A Study on the Environment and Speaker Adaptation System using Aspect model

HAHM Seongjun, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

IEICE technical report　107　(406)　115-118　2007/12/13
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

One of the key issues for adaptation algorithms is to modify a large number of parameters with only a small amount of adaptation data. Speaker adaptation techniques try to obtain near speaker-dependent (SD) performance with only small amounts of speaker-specific data and are often based on initial speaker-independent (SI) recognition systems. In this paper, we introduce an aspect model into an acoustic model for rapid speaker and environment adaptation. A formulation of probabilistic latent semantic analysis (PLSA) is extended to the continuous density HMM. We carried out an isolated word recognition experiment, and the results were compared with those of MAP and MLLR.
Examination of judgment method of utterance outside task in voice conversation system

MEGURO Toyomi, SUZUKI Motoyuki, ITO Akinori, MAKINO Syozo

IEICE technical report　107　(406)　283-287　2007/12/13
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

In a small task, to allow more flexible processing, utterances related to the task are recognized using a hand-written grammar, and utterances not related to the task are recognized by large vocabulary speech recognition. In this paper, we examine a technique for distinguishing sentences that do not relate to the task from sentences that do, using the semantic distance between nouns.
Speech recognition of English spoken by Japanese native speakers using N-gram trained from generated text

TSUTSUI Ryohei, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

IEICE technical report　107　(406)　125-130　2007/12/13
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

Our goal is to develop a voice-interactive CALL system that enables language learners to practice words, phrases, and grammar interactively. To develop such a system, it is necessary to recognize learners' utterances correctly. We found that 4- or 5-state HMMs work better than 3-state HMMs for recognizing English spoken by native speakers of Japanese. An N-gram language model trained on generated text achieves higher speech recognition accuracy than an FSA (finite state automaton) language model.
The Pleasure of "Strange Languages" (Short Essay, Coffee Break)

伊藤彰則

日本音響学会誌　63　(11)　696-696　2007/11/01
Publisher: 一般社団法人日本音響学会
ISSN： 0369-4232
Increasing correlation in one or two bits

37　(7)　509-514　2007/08/09
Publisher: 日本音響学会聴覚研究委員会
ISSN： 1346-1109
Increasing correlation in one or two bits

ITO Akinori, MAKINO Shozo

IEICE technical report　107　(186)　1-6　2007/08/02
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

In this paper, we investigated methods that increase the correlation between two values using one or two bits of extra information. For methods that use one bit, we investigated the '1-bit quantization,' 'sign correction,' and 'difference quantization' methods. For those that use two bits, we investigated the '2-bit quantization' and 'sign correction + difference quantization' methods. From theoretical analysis and numerical experiments, we found that the quantization-based method is best when the correlation of the original data is weak, while 'difference quantization' or its combination with sign correction is better when the original data are strongly correlated. We then applied the methods to multiple description coding of speech signals.
Query-by-Humming Music Information Retrieval System using Probabilistic Distribution for Tone Interval Features

ICHIKAWA Takuto, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

IPSJ SIG Notes　2007　(81)　33-38　2007/08/01
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

This paper describes a query-by-humming (QbH) music information retrieval (MIR) system without pitch extraction. In a system based on pitch extraction, pitch extraction errors inevitably occur and degrade the performance of the system. In this system, a cross-correlation function between two logarithmic frequency spectra is extracted as a tonal feature instead of deltaPitch, and probabilistic models are prepared for all tone intervals assumed to exist in the music. When two signals corresponding to two contiguous notes are given, likelihoods are calculated for all possible tone intervals. The advantage of this system is that fatal errors such as pitch extraction errors are unlikely to occur, because the extracted features are modeled stochastically. In the experiment, the top retrieval accuracy given by the proposed method exceeded that of the system based on pitch extraction by 4.9%.
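The following sketch shows why a cross-correlation between two logarithmic frequency spectra carries tone-interval information: on a log-frequency axis, transposing a note shifts its harmonic pattern by a constant number of bins, so the correlation peaks at the lag equal to the interval. The toy spectra, the resolution of one bin per semitone, and the function names are assumptions for illustration. The system in the abstract models the correlation feature probabilistically rather than reading off the peak as done here.

```python
import numpy as np

def toy_log_spectrum(base_bin, n_bins=64):
    """Hypothetical log-frequency spectrum with one bin per semitone.
    Harmonics 1..5 fall about 0, 12, 19, 24 and 28 semitones above the base."""
    harmonic_offsets = [0, 12, 19, 24, 28]
    spec = np.zeros(n_bins)
    for offset in harmonic_offsets:
        spec[base_bin + offset] = 1.0
    return spec

def peak_lag(spec_a, spec_b):
    """Return the lag (in bins) that maximizes the cross-correlation
    of spec_b against spec_a, together with the full correlation."""
    xcorr = np.correlate(spec_b, spec_a, mode="full")
    lags = np.arange(-(len(spec_a) - 1), len(spec_b))
    return int(lags[np.argmax(xcorr)]), xcorr

if __name__ == "__main__":
    first_note = toy_log_spectrum(base_bin=10)
    second_note = toy_log_spectrum(base_bin=13)  # three semitones higher
    lag, _ = peak_lag(first_note, second_note)
    print("estimated interval in semitones:", lag)
```

Because the whole harmonic pattern contributes to the correlation, no single fundamental-frequency decision is needed, which matches the motivation given in the abstract.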
Automatic detection and estimation of the direction of calling speech under noisy environment

SUZUKI Motoyuki, KITADATE Kota, ITO Akinori, MAKINO Shozo

IEICE technical report　107　(116)　67-72　2007/06/21
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

If a robot understands a user's calling voice, it can approach the user to hear the user's commands. In this paper, we developed a method to detect a user's calling voice and estimate the direction of arrival (DoA) of the voice in a real environment. Many methods have been proposed for DoA estimation, but most of them do not assume more than one voice source. Our method detects a pre-registered voice even when other voices and heavy noise exist. The method combines two distinct technologies. The first is multi-channel spectrum subtraction (MSS). Using MSS, we record sound from directions at five-degree intervals. The second is word spotting using continuous DP (CDP) matching. We perform CDP for all 72 directions in parallel. When a registered word is detected, the word is verified using the frame relation matrix, which expresses word-internal similarities. Finally, the CDP scores are combined with the power of each direction to determine the DoA. We carried out experiments and obtained 95% accuracy under SNR conditions from 0 to 20 dB.
The evaluation of vocabulary divided PLSA language model using information criterion

栗山直人, 鈴木基之, 伊藤彰則

Proceedings of the Spoken Document Processing Workshop　1　103-108　2007/02/26
Publisher: [豊橋技術科学大学メディア科学リサーチセンター]
Unsupervised iterative language model adaptation using WWW

梶浦泰智, 鈴木基之, 伊藤彰則

Proceedings of the Spoken Document Processing Workshop　1　109-114　2007/02/26
Publisher: [豊橋技術科学大学メディア科学リサーチセンター]
B-6-82 Enhanced secret and high quality audio communication system by disjoint path routing

ENOMOTO Nobuyuki, KITAMURA Tsuyoshi, IWATA Atsushi, TANI Hideaki, ABE Shunichiro, NISHIMURA Ryouichi, SUZUKI Yoiti, SAKAI Toshiyuki, ITO Akinori, MAKINO Shozo

Proceedings of the IEICE General Conference　82-82　2007
Publisher: The Institute of Electronics, Information and Communication Engineers
A Basic Study on Applying MD Quantization to Speech Coding

WEY H., 西村竜一, 伊藤彰則, 小林まおり, 鈴木陽一

日本音響学会研究発表会講演論文集(CD-ROM)　2007　2007

ISSN： 1880-7658
Automatic evaluation system of English prosody for Japanese learner's speech

Motoyuki Suzuki, Tatsuki Konno, Akinori Ito, Shozo Makino

IMSCI '07: INTERNATIONAL MULTI-CONFERENCE ON SOCIETY, CYBERNETICS AND INFORMATICS, VOL 1, PROCEEDINGS　1　48-53　2007
Analysis of cell wall polysaccharides during storage of a local melon accession 'Wasada-uri' compared to the melon cultivar 'Prince'

T. Nishizawa, A. Ito

Journal of Horticultural Science and Biotechnology　82　(2)　227-234　2007
Publisher: Headley Brothers Ltd
DOI： 10.1080/14620316.2007.11512224 　

ISSN： 1462-0316
Topic and style adaptation using vocabulary divided PLSA language model by criterion of information

KURIYAMA Naoto, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

IPSJ SIG Notes　2006　(136)　233-238　2006/12/22
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

PLSA (Probabilistic Latent Semantic Analysis) is one of the promising language model adaptation methods. We propose a new way to combine PLSA and N-gram models by separating the vocabulary into three classes: 'topic'-related, 'style'-related, and 'general'-related words. This method trains a topic-vocabulary PLSA model, a style-vocabulary PLSA model, and a general-vocabulary unigram model independently, and combines the three models. We also propose a method for automatically composing the vocabulary division criterion, using the pattern of word-class occurrence in newspaper text and CSJ. The experimental results showed that the proposed method achieves a 15.48% perplexity reduction over the conventional PLSA model on a test set whose topic and style features do not occur together in the training data.
Deciding Search Query for Unsupervised Language Model Adaptation using WWW

KAJIURA Yasutomo, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

IPSJ SIG Notes　2006　(136)　131-135　2006/12/21
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

To improve the accuracy of an LVCSR system, it is effective to gather text data related to the topic of the input speech and adapt the language model using the text data. However, collecting topic-related text manually requires much effort. To automate the text collection, we have proposed a method to create an adapted language model by collecting topic-related text from the World Wide Web. In this paper, we propose a method of deciding on available search queries using similarities between words and calculating each query's availability using a small amount of WWW text. This method reaches the same performance as queries selected by a human.
Topic and style adaptation using vocabulary divided PLSA language model by criterion of information

KURIYAMA Naoto, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

IEICE technical report　106　(444)　55-60　2006/12/15
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

PLSA (Probabilistic Latent Semantic Analysis) is one of the promising language model adaptation methods. We propose a new way to combine PLSA and N-gram models by separating the vocabulary into three classes: 'topic'-related, 'style'-related, and 'general'-related words. This method trains a topic-vocabulary PLSA model, a style-vocabulary PLSA model, and a general-vocabulary unigram model independently, and combines the three models. We also propose a method for automatically composing the vocabulary division criterion, using the pattern of word-class occurrence in newspaper text and CSJ. The experimental results showed that the proposed method achieves a 15.48% perplexity reduction over the conventional PLSA model on a test set whose topic and style features do not occur together in the training data.
Deciding Search Query for Unsupervised Language Model Adaptation using WWW

KAJIURA Yasutomo, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

IEICE technical report　106　(443)　131-135　2006/12/14
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

To improve the accuracy of an LVCSR system, it is effective to gather text data related to the topic of the input speech and adapt the language model using the text data. However, collecting topic-related text manually requires much effort. To automate the text collection, we have proposed a method to create an adapted language model by collecting topic-related text from the World Wide Web. In this paper, we propose a method of deciding on available search queries using similarities between words and calculating each query's availability using a small amount of WWW text. This method reaches the same performance as queries selected by a human.
Music information retrieval from a singing voice based on verification of recognized hypotheses

Motoyuki Suzuki, Toru Hosoya, Akinori Ito, Shozo Makino

ISMIR 2006 - 7th International Conference on Music Information Retrieval　168-171　2006/12/01
A new construction method of a context-dependent HMnet considering phonetic variations

SUZUKI Motoyuki, SAKAMOTO Hajime, ITO Akinori, MAKINO Shozo

IEICE technical report　106　(123)　37-41　2006/06/16
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

Most speech recognition systems use a context-dependent HMM (such as a triphone) as the acoustic model. It can represent phonetic variations that depend on the phoneme context; however, other factors such as the speaker and speaking rate cannot be considered. In this paper, a new construction algorithm for the HMnet is proposed. It can construct an HMnet that considers various phonetic variations by combining the SSS and SSS-free algorithms. In the experiments, the proposed algorithm gave higher recognition accuracy than conventional SSS and SSS-free.
Unsupervised Language Model Adaptation using Web Text

KAJIURA Yasutomo, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

IEICE technical report　106　(123)　43-47　2006/06/16
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

To improve the accuracy of an LVCSR system, it is effective to gather text data related to the topic of the input speech and adapt the language model using the text data. However, collecting topic-related text manually requires much effort. To automate the text collection, we have proposed a method to create an adapted language model by collecting topic-related text from the World Wide Web. In this paper, we propose two new methods. The first composes the search query using multiple words extracted from the preliminary recognition result. This method achieved 2.2 points higher accuracy than the previous method when 1000 documents were gathered. The other method excludes misrecognized words from the query words. Using the proposed method, the ratio of misrecognized words among all words was reduced to only 4%.
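"Why Do People Treat Computers as Humans? The Psychology of 'The Media Equation'", by Byron Reeves and Clifford Nass, translated by Hiromichi Hosoma, Shoeisha, 2001 (A Book I Recommend, Coffee Break)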

伊藤彰則

日本音響学会誌　62　(6)　473-474　2006/06/01
Publisher: 一般社団法人日本音響学会
ISSN： 0369-4232
A-19-15 An Interpolation Method of the Feature Vector for Finger Character Recognition

Osato Muneyuki, Suzuki Motoyuki, Ito Akinori, Makino Shozo

Proceedings of the IEICE General Conference　2006　333-333　2006/03/08
Publisher: The Institute of Electronics, Information and Communication Engineers
Training optimization and vocabulary division of PLSA language model

KURIYAMA Naoto, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

IPSJ SIG Notes　2006　(12)　37-42　2006/02/04
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

PLSA is a method of composing a language model that can reflect the global characteristics of the linguistic context as a "topic". We propose further extensions of the PLSA language model. First, we compare the conventional training methods for the PLSA language model and examine the optimization of the EM annealing schedule. As a result, we found that the best method is to reduce β from 1.0 to a certain value. Next, we compose a PLSA language model whose vocabulary set is divided into content words and function words. Training and adaptation to topic or style are then performed separately for each. In the experiment, we achieved 82.23% perplexity reduction, against 83.90% for the conventional method.
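Item 2: Acoustical Engineering Research Meeting (Section 3: Engineering Research Meetings, Chapter 5: International Conferences, Symposia, etc.)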

鈴木陽一, 坂本修一, 伊藤彰則

東北大学電気通信研究所研究活動報告　13　278-278　2006/01/01
Evaluation by elderly people of a method for improving user affinity using a robot avatar

廣井富, 伊藤彰則, 高津宣夫, 中野栄二

情報科学技術フォーラム　FIT 2006　2006
Unsupervised language model adaptation based on automatic text collection from WWW

Motoyuki Suzuki, Yasutomo Kajiura, Akinori Ito, Shozo Makino

INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5　5　2202-2205　2006
A User Simulator based on VoiceXML for evaluation of spoken dialog systems

Akinori Ito, Keisuke Shimada, Motoyuki Suzuki, Shozo Makino

INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5　2　1045-1048　2006
Lyrics recognition from a singing voice based on finite state automaton for music information retrieval

Toru Hosoya, Motoyuki Suzuki, Akinori Ito, Shozo Makino

ISMIR 2005 - 6th International Conference on Music Information Retrieval　532-535　2005/12/01
Construction method of acoustic models dealing with various background noises based on combination of HMMs

Motoyuki Suzuki, Yusuke Kato, Akinori Ito, Shozo Makino

9th European Conference on Speech Communication and Technology　973-976　2005/12/01
Pronunciation error detection method based on error rule clustering using a decision tree

Akinori Ito, Yen Ling Lim, Motoyuki Suzuki, Shozo Makino

9th European Conference on Speech Communication and Technology　173-176　2005/12/01
Internal noise suppression for speech recognition by small robots

Akinori Ito, Takashi Kanayama, Motoyuki Suzuki, Shozo Makino

9th European Conference on Speech Communication and Technology　2685-2688　2005/12/01
Feature Value Combination for Finger Character Recognition Using a Color Glove

OSATO Muneyuki, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

IEICE technical report　105　(375)　73-78　2005/10/28
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

Several finger character recognition systems have been developed to support communication between hearing-impaired people and hearing people. Systems that use color information from the hand image employ various feature values. In this paper, several effective feature values for finger character recognition are examined through comparison experiments. In addition, we try to recover from errors caused by a single feature value by combining multiple feature values. Using feature value combination and combination by posterior probability, an 8% improvement in recognition rate was obtained.
Performance evaluation for the multi-mixture HMMs in various kinds of noise with various SNRs conditions

SUZUKI Motoyuki, KATO Yusuke, ITO Akinori, MAKINO Shozo

IEICE technical report. Speech　105　(133)　25-30　2005/06/17
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

Background noise is one of the biggest problems for speech recognition systems in real environments. In order to achieve high recognition performance for corrupted speech, we proposed a new construction method for HMMs that deals with various kinds of background noise. First, an HMM is trained for each background noise, and then all Gaussian components of those HMMs are combined into a "multi-mixture HMM". In the experimental results, the multi-mixture HMM gave the highest recognition performance for any kind of noise and any SNR. Although the multi-mixture HMM has high performance, it has a huge number of Gaussian components, which makes speech recognition slower. To solve this problem, we also proposed a method for reducing the Gaussian components. It can decrease the number of Gaussian components with only slight deterioration of recognition performance.
Internal noise suppression for speech recognition by small robots based on the noise spectrum prediction

ITO Akinori, KANAYAMA Takashi, SUZUKI Motoyuki, MAKINO Shozo

IEICE technical report. Speech　105　(133)　43-48　2005/06/17
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

Speech recognition by a small robot is difficult because the robot itself makes noise. In this paper, two new methods are proposed that suppress the internal noise of small robots. These methods are based on spectral subtraction (SS). The proposed methods differ from the original SS in that they use an estimated noise spectrum that depends on the motion of the robot. One method, called MDSS, prepares noise spectra for all motions. The other method, called NPSS, predicts the noise spectrum from the angular velocities of all joints of the robot using a neural network. In a comparison between the original SS and the proposed methods, the proposed methods outperformed the conventional SS. The MDSS method gave good results when the noise within one motion was stable, while NPSS worked well even when the noise of the motion was unstable.
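
The idea of subtracting a motion-dependent noise spectrum can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the per-motion noise table, and the `alpha` and `floor` parameters are all hypothetical, and a single frame is represented as a plain list of magnitude values.

```python
# Minimal sketch of spectral subtraction with a motion-dependent noise
# estimate, in the spirit of MDSS. All names and values are illustrative.

def spectral_subtraction(noisy_mag, noise_mag, alpha=1.0, floor=0.01):
    """Subtract an estimated noise magnitude spectrum from one frame.

    noisy_mag: magnitude spectrum of the observed frame (one value per bin).
    noise_mag: estimated noise magnitude spectrum (same length).
    alpha:     subtraction factor (assumed, not taken from the abstract).
    floor:     spectral floor, as a fraction of the noisy magnitude, that
               keeps the result positive when the noise is overestimated.
    """
    return [max(x - alpha * n, floor * x) for x, n in zip(noisy_mag, noise_mag)]


# Hypothetical table of noise spectra prepared in advance for each motion.
NOISE_BY_MOTION = {
    "walk": [0.5, 0.2, 0.1],
    "turn": [0.1, 0.4, 0.3],
}


def mdss(noisy_mag, motion):
    """Pick the noise spectrum for the current motion, then subtract it."""
    return spectral_subtraction(noisy_mag, NOISE_BY_MOTION[motion])


if __name__ == "__main__":
    print(mdss([1.0, 1.0, 1.0], "walk"))
    print(mdss([1.0, 1.0, 1.0], "turn"))
```

An NPSS-style variant would replace the table lookup with a predictor that maps joint angular velocities to a noise spectrum; that predictor is outside the scope of this sketch.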
HMnet training method combining SSS and SSS-free

SAKAMOTO Hajime, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

日本音響学会研究発表会講演論文集　2005　(1)　31-32　2005/03/08
Publisher: 日本音響学会
ISSN： 1340-3168
A construction of a dialogue agent for evaluation of spoken dialogue system

SHIMADA Keisuke, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

日本音響学会研究発表会講演論文集　2005　(1)　69-70　2005/03/08

ISSN： 1340-3168
An HMM robust to multiple noise conditions and change in Signal-Noise ratio by combining multiple noise-adapted HMMs

KATO Yusuke, ITO Akinori, SUZUKI Motoyuki, MAKINO Shozo

日本音響学会研究発表会講演論文集　2005　(1)　83-84　2005/03/08

ISSN： 1340-3168
Improvement of audio signal dimension compression using KL expansion

HARADA Shoji, ITO Akinori, SUZUKI Motoyuki, KOHATA Minoru, MAKINO Shozo

日本音響学会研究発表会講演論文集　2005　(1)　199-200　2005/03/08

ISSN： 1340-3168
Laughter Recognition from Natural Conversation Video Using Facial Expression Recognition

WANG Xinyue, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

日本音響学会研究発表会講演論文集　2005　(1)　217-218　2005/03/08

ISSN： 1340-3168
A grammar error detection method for interactive CALL system

KWEON O.-P, ITO A, SUZUKI M, MAKINO S

日本音響学会研究発表会講演論文集　2005　(1)　303-304　2005/03/08

ISSN： 1340-3168
Lyrics recognition based on Deterministic Finite State Automaton for song retrieval system

HOSOYA Toru, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

日本音響学会研究発表会講演論文集　2005　(1)　603-604　2005/03/08

ISSN： 1340-3168
Speech recognition robust to the internal noise for small robots

KANAYAMA Takashi, ITO Akinori, SUZUKI Motoyuki, MAKINO Shozo

日本音響学会研究発表会講演論文集　2005　(1)　659-660　2005/03/08

ISSN： 1340-3168
A-19-13 An Examination of Finger Character Recognition Using Color Information

Osato Muneyuki, Suzuki Motoyuki, Ito Akinori, Makino Shozo

Proceedings of the IEICE General Conference　342-342　2005
Publisher: The Institute of Electronics, Information and Communication Engineers
Frame-Based Spoken Dialog System for Autonomous Robots

MAKINO Shozo, KONASHI Takashi, ITO Akinori, SUZUKI Motoyuki

IPSJ SIG Notes　2004　(108)　141-146　2004/11/05
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

We have been developing a spoken dialog system. Conventional spoken dialog systems need grammar descriptions and dialog scripts, which are difficult to develop. The system proposed in this paper is based on semantic frames, and it generates the recognition grammar from the frames automatically. As the system requires only a frame-based description of a dialog task, it can be easily applied to different kinds of tasks. Moreover, the recognition accuracy is improved by sentence weighting based on phrase class templates. We evaluated the system experimentally. The system reached the goal with 2.44 user utterances on average.
Frame-Based Spoken Dialog System for Autonomous Robots

MAKINO Shozo, KONASHI Takashi, ITO Akinori, SUZUKI Motoyuki

IEICE technical report. Natural language understanding and models of communication　104　(417)　65-70　2004/10/29
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

We have been developing a spoken dialog system. Conventional spoken dialog systems need grammar descriptions and dialog scripts, which are difficult to develop. The system proposed in this paper is based on semantic frames, and it generates the recognition grammar from the frames automatically. As the system requires only a frame-based description of a dialog task, it can be easily applied to different kinds of tasks. Moreover, the recognition accuracy is improved by sentence weighting based on phrase class templates. We evaluated the system experimentally. The system reached the goal with 2.44 user utterances on average.
I-069 Smile and Laugh Recognition from Natural Conversation Video

Xinyue Wang, Suzuki Motoyuki, Ito Akinori, Makino Shozo

3　(3)　163-164　2004/08/20
Publisher: Forum on Information Technology
G-014 Comparison of features for Query-by-Humming MIR

Ito Akinori, Heo Sung-Phil, Suzuki Motoyuki, Makino Shozo

情報科学技術フォーラム一般講演論文集　3　(2)　373-374　2004/08/20
Publisher: Forum on Information Technology
I-009 Environmental Map Generation by Omnidirectional Stereo

Goto Nozomu, Suzuki Motoyuki, Ito Akinori, Makino Shozo

情報科学技術フォーラム一般講演論文集　3　(3)　19-20　2004/08/20
Publisher: Forum on Information Technology
An HMM robust to multiple noise conditions by combining multiple noise-adapted HMMs

KATO Yusuke, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

IPSJ SIG Notes　2004　(57)　1-6　2004/05/27
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

This paper describes methods to compose an HMM that is robust under multiple noise conditions. The methods are based on the combination of several HMMs trained under different noise conditions. We propose two combination methods. The first combines multiple HMMs into a multi-path HMM. The second combines the corresponding states of each HMM into one state by mixing the output probability distributions into one mixture distribution. The recognition experiment revealed that HMMs composed by the proposed methods show similar or better results than the conventional multi-condition model. One drawback of the models composed by the proposed methods is that they have a large number of distributions. To reduce the number of distributions, we examined several methods to unify distributions.
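
The second combination method, mixing the output distributions of corresponding states into one mixture, can be sketched as follows. This is a hedged illustration only: the abstract does not say how the models are weighted, so the equal model priors, the function name `merge_state_mixtures`, and the one-dimensional `(weight, mean, variance)` representation of a Gaussian component are assumptions.

```python
# Minimal sketch: merge the output mixtures of one corresponding state taken
# from several noise-specific HMMs into a single mixture distribution.
# Each component is a hypothetical (weight, mean, variance) tuple.

def merge_state_mixtures(state_mixtures, model_weights=None):
    """Return one mixture whose components come from all input mixtures.

    state_mixtures: list of mixtures, one per source HMM; each mixture is a
                    list of (weight, mean, variance) with weights summing to 1.
    model_weights:  prior weight of each source HMM (assumed equal if omitted).
    """
    count = len(state_mixtures)
    if model_weights is None:
        model_weights = [1.0 / count] * count
    merged = []
    for prior, mixture in zip(model_weights, state_mixtures):
        for weight, mean, variance in mixture:
            # Rescale so that the merged weights still sum to 1.
            merged.append((prior * weight, mean, variance))
    return merged


if __name__ == "__main__":
    clean_state = [(0.6, 0.0, 1.0), (0.4, 2.0, 0.5)]
    noisy_state = [(1.0, 1.0, 2.0)]
    for component in merge_state_mixtures([clean_state, noisy_state]):
        print(component)
```

The merged state has as many components as all source states together, which is the growth in the number of distributions that the abstract identifies as a drawback.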
An HMM robust to multiple noise conditions by combining multiple noise-adapted HMMs

KATO Yusuke, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

IEICE technical report. Speech　104　(86)　1-6　2004/05/20
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

This paper describes methods to compose an HMM that is robust under multiple noise conditions. The methods are based on the combination of several HMMs trained under different noise conditions. We propose two combination methods. The first combines multiple HMMs into a multi-path HMM. The second combines the corresponding states of each HMM into one state by mixing the output probability distributions into one mixture distribution. The recognition experiment revealed that HMMs composed by the proposed methods show similar or better results than the conventional multi-condition model. One drawback of the models composed by the proposed methods is that they have a large number of distributions. To reduce the number of distributions, we examined several methods to unify distributions.
Recent Topics on Speech Recognition

Akinori Ito

IEICE Information and Systems Society Journal　9　(1)　14-21　2004/05/01
Publisher: The Institute of Electronics, Information and Communication Engineers
A study on dialogue-based CALL system

KWEON Oh-pyo, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

IEICE technical report. Speech　103　(633)　19-24　2004/01/23
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

This paper describes a dialogue-based CALL (Computer Assisted Language Learning) system. One of the major problems in CALL systems is that learners are usually assigned a passive role and get no practice in composing their own utterances. The other major problem is that most of the proposed CALL systems are pronunciation exercise systems, such as those using minimal pairs. Pronunciation exercise is an unrealistic task if the goal of the learner is to obtain the ability to participate actively in a conversation. We propose a dialogue-based CALL system with which learners can practice holding a conversation and composing utterances actively. The path of the conversation changes depending on the learners' utterances. The system also checks for pronunciation and grammatical errors and returns the proper expression. Therefore, learners can obtain the ability to participate actively in a conversation.
Noise adaptive spoken dialog system based on selection of multiple dialog strategies

Akinori Ito, Takanobu Oba, Takashi Konashi, Motoyuki Suzuki, Shozo Makino

8th International Conference on Spoken Language Processing, ICSLP 2004　193-196　2004/01/01
A Japanese dialogue-based CALL system with mispronunciation and grammar error detection

Oh Pyo Kweon, Akinori Ito, Motoyuki Suzuki, Shozo Makino

8th International Conference on Spoken Language Processing, ICSLP 2004　1833-1836　2004/01/01
A spoken dialog system based on automatic grammar generation and template-based weighting for autonomous mobile robots

Takashi Konashi, Motoyuki Suzuki, Akinori Ito, Shozo Makino

8th International Conference on Spoken Language Processing, ICSLP 2004　189-192　2004/01/01
Speaker adaptation method for call systems using bilingual speakers' utterances

Motoyuki Suzuki, Hirokazu Ogasawara, Akinori Ito, Yuichi Ohkawa, Shozo Makino

8th International Conference on Spoken Language Processing, ICSLP 2004　2929-2932　2004/01/01
Error tolerant melody matching method in music information retrieval

SP Heo, M Suzuki, A Ito, S Makino, HY Chung

ADAPTIVE MULTIMEDIA RETRIEVAL　3094　212-227　2004

ISSN： 0302-9743
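Analysis of the relationship between grammar and recognition accuracy in spoken dialog under various noise environments (5th Spoken Language Symposium)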

大庭隆伸, 鈴木基之, 伊藤彰則, 牧野正三

電子情報通信学会技術研究報告　103　(517)　133-138　2003/12/18
Publisher: 一般社団法人電子情報通信学会
ISSN： 0913-5685

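In speech recognition, improving recognition accuracy under noise is one of the important issues. Various studies have addressed it, for example by improving acoustic models and noise suppression methods; in this paper, we attempt to improve accuracy from the standpoint of dialog. Specifically, as the noise environment becomes unfavorable for speech recognition, the system switches to a grammar with a reduced vocabulary and number of candidates before performing recognition. This makes it possible to maintain a certain level of recognition accuracy under noise while retaining a framework that can recognize the user's free utterances when the influence of noise is small. To realize this, two issues must be addressed. First, reducing the vocabulary and the number of candidates increases the number of user utterances that contain vocabulary or grammar not covered by the recognizer, so a countermeasure is needed. Second, to change the recognition grammar according to the environment, it is necessary to estimate what recognition accuracy will be obtained when a dialog is carried out under a given noise, and how to realize this is an open problem. For the former, we devise the way the system presents its questions; for the latter, we examine whether the relationship among noise, grammar, and recognition accuracy can be estimated by neural network training.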
Speaker Adaptation of Bilingual Phone Models using Bilingual Speakers' Speech

OGASAWARA Hirokazu, OHKAWA Yuichi, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

IEICE technical report. Natural language understanding and models of communication　103　(517)　85-90　2003/12/18
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

In this paper, we investigate a method of speaker adaptation of bilingual phone models to improve the precision of a non-native speech recognition system. Non-native speakers tend to substitute phones of their native language for non-native phones, so the recognition system must use bilingual phone models consisting of all phones in the non-native and native languages. Speaker adaptation generally uses utterances in the same language as the phone model. However, non-native speakers cannot speak the language well enough for their utterances to be used for speaker adaptation. In order to adapt bilingual phone models, we propose a speaker adaptation method for bilingual phone models using a native speaker's utterances. To improve the bilingual phone models, we propose a method using bilingual speakers' speech. Experiments showed that the bilingual phone models adapted by the proposed method outperformed the models adapted by conventional methods.
Speaker Adaptation of Bilingual Phone Models using Bilingual Speakers' Speech

OGASAWARA Hirokazu, OHKAWA Yuichi, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

IPSJ SIG Notes　2003　(124)　85-90　2003/12/18
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

In this paper, we investigate a method of speaker adaptation of bilingual phone models to improve the precision of a non-native speech recognition system. Non-native speakers tend to substitute phones of their native language for non-native phones, so the recognition system must use bilingual phone models consisting of all phones in the non-native and native languages. Speaker adaptation generally uses utterances in the same language as the phone model. However, non-native speakers cannot speak the language well enough for their utterances to be used for speaker adaptation. In order to adapt bilingual phone models, we propose a speaker adaptation method for bilingual phone models using a native speaker's utterances. To improve the bilingual phone models, we propose a method using bilingual speakers' speech. Experiments showed that the bilingual phone models adapted by the proposed method outperformed the models adapted by conventional methods.
Analysis of the relationship between grammar and recognition accuracy in spoken dialog under various noise environments

大庭隆伸, 鈴木基之, 伊藤彰則, 牧野正三

情報処理学会研究報告音声言語情報処理（SLP）　2003　(124)　133-138　2003/12/18
Publisher: 一般社団法人情報処理学会
ISSN： 0919-6072

Speech recognition in noisy environments is one of the hottest topics in speech recognition research. Noise-tolerant acoustic models or noise reduction techniques are often used to improve recognition accuracy. In this paper, we propose a method to improve the accuracy of a spoken dialog system from a dialog strategy point of view. In the proposed method, the dialog system automatically changes its dialog strategy according to the estimated recognition accuracy in a noisy environment in order to keep the performance of the system constant. In a noise-free environment, the system accepts any utterance from a user. In a noisy environment, on the other hand, the system restricts its grammar and vocabulary. To realize this strategy, we investigated a method to avoid users' out-of-grammar utterances through an instruction given by the system to the user. Furthermore, we developed a method to estimate recognition accuracy from features extracted from the noise signal.
Face Detection for Gesture Recognition System

ONODERA Mieko, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

電子情報通信学会技術研究報告. PRMU, パターン認識・メディア理解　103　(453)　25-30　2003/11/21
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

In this paper, we investigate a face detection method for a gesture recognition system. In a gesture recognition system, when the distance from the person to the camera is large, a face in the image might be so small that its parts (eyes, mouth, and so on) cannot be identified and its outline is not clear. In order to detect a small face, we focus on a method based on the HMM (Hidden Markov Model), a statistical model used to characterize the statistical properties of a signal. We examine a face detection method using HMMs to investigate the effect of feature vectors and the HMM topology on the detection of small faces. In addition, the effect of the size difference between training faces and faces in the evaluation data is investigated.
Product Software of Continuous Speech Recognition Consortium -2002 version-

KAWAHARA T, SUMIYOSHI T, LEE A, BANNO H, TAKEDA K, MIMURA M, ITOU K, ITO A, SHIKANO K

IPSJ SIG Notes　2003　(104)　1-6　2003/10/17
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

The Continuous Speech Recognition Consortium (CSRC) was founded under IPSJ SIG-SLP for further enhancement of the Japanese Dictation Toolkit that had been developed by the IPA project. An overview of the software developed in the third year (Oct. 2002-Sep. 2003) is given in this report. The LVCSR (large vocabulary continuous speech recognition) engine Julius has been improved in both functionality and stability, and ported to Windows in compliance with SAPI (Speech API). A variety of acoustic and language models are set up to realize wider coverage of input speech. The software package is currently available by contacting the address below.
Examination of the method of learning HSn-gram

NAGANO Takeshi, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

IPSJ SIG Notes　2003　(104)　35-40　2003/10/17
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

HSn-gram is a language model that extends an N-gram to an ergodic HMM. HSn-gram regards an N-gram as a deterministic finite-state automaton and extends it into a non-deterministic finite-state automaton by dividing each state into two or more states. A problem in training an HSn-gram is that estimation of the model is difficult, because the numbers of states and state transitions become large. In this paper, we propose a training method for an HSn-gram that uses a set of parameters obtained from an SSn-gram (another HMM-based language model) as the initial parameter set. This method reduces the number of parameters in order to cope with this problem. Consequently, the perplexity is reduced by 5% compared to that of a conventionally trained HSn-gram.
Speech Recognition in Unstable Noise Environment using Multipath HMM

ITO Akinori, KISHIMA Tomonori, SUZUKI Motoyuki, MAKINO Shozo

IEICE technical report. Speech　103　(93)　1-6　2003/05/29
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

This paper describes a multi-path HMM for speech recognition in unstable noise environments (multi-noise-path HMM). This method connects multiple HMMs in parallel, each trained on speech data from a different noise environment. During decoding, the decoder chooses the most likely path among the possible paths in the HMM. The multi-noise-path HMM can recognize speech in an unstable noise environment, in which the noise changes within one utterance. In the experiment, we used white-noise-based unstable noises. A multi-noise-path HMM trained on speech with several levels of added white noise was used for recognition. The experimental results revealed that the performance of the multi-noise-path HMM was almost equivalent to that of the matched model in a stable noise environment, while the proposed model gave better results than other single-path models in an unstable noise environment.
Validating significance of decoder parameters

ITO A., MAKINO S.

2003　(1)　147-148　2003/03/18

ISSN： 1340-3168
An investigation on multi-path HMM with duration control

OHKAWA Yuichi, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

日本音響学会研究発表会講演論文集　2003　(1)　1-2　2003/03/18

ISSN： 1340-3168
Evaluation and Analysis of Japanese Pronunciation uttered by Korean

KWEON Oh-Pyo, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

日本音響学会研究発表会講演論文集　2003　(1)　361-362　2003/03/18

ISSN： 1340-3168
Performance Evaluation of the Music Retrieval System using Plural Pitch Candidates

HEO S-P, SUZUKI M, ITO A, MAKINO S

日本音響学会研究発表会講演論文集　2003　(1)　847-848　2003/03/18

ISSN： 1340-3168
Construction of the Music Retrieval System using the Multiple Pitch Candidates

HEO Sungphil, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

IPSJ SIG Notes　2003　(16)　85-90　2003/02/21
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

Users do not sing accurately, especially if they are inexperienced or unaccompanied; even skilled musicians have difficulty in maintaining the correct pitch of a song. Moreover, errors may occur when a music retrieval system extracts pitch from humming. Considering these problems, we propose extracting multiple pitch candidates. Our results show that multiple pitch candidates are important features in determining melodic similarity, and that reliability information obtained from power is important as well. In the experiment, we compared the search efficiency with that of a similar system. The proposed method showed good retrieval results compared with the similar system.
A Study on Japanese Pronunciation Learning System for Korean Using Speech Recognition

KWEON Oh-pyo, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

IEICE technical report. Speech　102　(618)　19-24　2003/01/23
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

This paper describes a CALL (Computer Assisted Language Learning) system for teaching Japanese pronunciation to Korean speakers. First, Japanese sentences uttered by adult Korean speakers were evaluated by native Japanese speakers. Then, a Japanese learning system based on the evaluation results was developed. Our CALL system asks the learner to read a sentence including minimal pairs. Speech recognition technology is used to compare the Japanese speech uttered by a learner with the utterance of a native speaker, and the system automatically calculates intelligibility scores that indicate the similarity between the learner's speech and standard native Japanese speech. Furthermore, when the learner makes a pronunciation mistake, the learner can confirm his/her mispronunciation. The system also gives proper instruction on the pronunciation.
Development of the Interactive and Robust Intelligent Patient Care System

HIROI Yutaka, SHOJI Michihiko, JEONG Seong Hee, KUDO Masaya, TAKAHASHI Ryosuke, KONASHI Takashi, TAJIMA Makoto, OBA Takanobu, CHEN Qiu, NAKANO Eiji, TAKAHASHI Takayuki, MAKINO Shozo, ITO Akinori, OHMI Tadahiro, KOTANI Koji, TAKATSU Nobuo, SUZUKI Motoyuki

The proceedings of the JSME annual meeting　2003　(0)　231-232　2003
Publisher: The Japan Society of Mechanical Engineers
DOI： 10.1299/jsmemecjo.2003.5.0_231 　

An intelligent service robot named IRIS (Interactive, Robust and Intelligent Patient Care System) has been developed, mainly for use in hospital sickrooms. IRIS is composed of a speaker direction identification system, a dialog system for talking with the patient, a face recognition system, a safety manipulator, and an omni-directional vehicle (ODV). It is able to recognize the patient's face, to hold a dialog, and to execute simple tasks such as safely serving a drink on request. This paper mainly presents the hardware system of IRIS.
An optimized multi-duration HMM for spontaneous speech recognition

Yuichi Ohkawa, Akihiro Yoshida, Motoyuki Suzuki, Akinori Ito, Shozo Makino

EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology　485-488　2003/01/01
A study on language model based on kana and kanji string

KINNO Hiroaki, KATOH Masaharu, KOSAKA Tetsuo, KOHDA Masaki, ITO Akinori

IPSJ SIG Notes　2002　(121)　165-170　2002/12/16
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

This paper describes a character-based n-gram language model. The proposed model is based on Kanji and Kana characters instead of words or morphemes determined by morphological analysis. To exploit stronger constraints, character strings are used in addition to single characters as basic units of the model. We examined two methods to choose character strings. One method is based on frequency in the training corpus, and the other is based on mutual information as well as frequency. We carried out experiments to compare perplexities and character error rates (CER) between the proposed model and conventional (word-based or character-based) n-gram models. The results showed that the mutual-information-based method gave the better performance. Although the proposed model was not superior to the word-based model, it was better than the character-based one. The vocabulary size of the proposed model was about 50% smaller than that of the word-based model.
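
Choosing character strings by frequency and mutual information can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure used in the paper: it only scores adjacent character pairs, it uses pointwise mutual information, and the function name and the `min_count` and `min_pmi` thresholds are hypothetical.

```python
# Minimal sketch: pick two-character strings to add as language model units,
# using both frequency and pointwise mutual information (PMI).
import math
from collections import Counter


def select_character_strings(corpus, min_count=2, min_pmi=1.0):
    """Return {pair: PMI} for adjacent character pairs that pass both tests.

    corpus:    iterable of sentences (strings of kana/kanji or any characters).
    min_count: frequency threshold (assumed value).
    min_pmi:   PMI threshold in bits (assumed value).
    """
    chars = Counter()
    pairs = Counter()
    for sentence in corpus:
        chars.update(sentence)
        pairs.update(sentence[i:i + 2] for i in range(len(sentence) - 1))
    total_chars = sum(chars.values())
    total_pairs = sum(pairs.values())

    selected = {}
    for pair, count in pairs.items():
        if count < min_count:
            continue  # frequency criterion
        p_xy = count / total_pairs
        p_x = chars[pair[0]] / total_chars
        p_y = chars[pair[1]] / total_chars
        pmi = math.log2(p_xy / (p_x * p_y))
        if pmi >= min_pmi:  # mutual information criterion
            selected[pair] = pmi
    return selected


if __name__ == "__main__":
    print(select_character_strings(["abab", "abcd"]))
```

Longer strings could be built by treating each selected pair as a new unit and repeating the selection, although whether the paper does so is not stated in the abstract.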
A study on language model based on kana and kanji string

KINNO Hiroaki, KATOH Masaharu, KOSAKA Tetsuo, KOHDA Masaki, ITO Akinori

IEICE technical report. Natural language understanding and models of communication　102　(528)　1-6　2002/12/13
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

This paper describes a character-based n-gram language model. The proposed model is based on Kanji and Kana characters instead of words or morphemes determined by morphological analysis. To exploit stronger constraints, character strings are used in addition to single characters as basic units of the model. We examined two methods to choose character strings. One method is based on frequency in the training corpus, and the other is based on mutual information as well as frequency. We carried out experiments to compare perplexities and character error rates (CER) between the proposed model and conventional (word-based or character-based) n-gram models. The results showed that the mutual-information-based method gave the better performance. Although the proposed model was not superior to the word-based model, it was better than the character-based one. The vocabulary size of the proposed model was about 50% smaller than that of the word-based model.
Product Software of Continuous Speech Recognition Consortium -2001 version-

KAWAHARA T, SUMIYOSHI T, LEE A, BANNO H, TAKEDA K, MIMURA M, YAMADA T, NISHIURA T, ITOU K, ITO A, SHIKANO K

IPSJ SIG Notes　2002　(98)　13-18　2002/10/25
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

The Continuous Speech Recognition Consortium (CSRC) was founded under IPSJ SIG-SLP for further enhancement of the Japanese Dictation Toolkit that had been developed by the IPA project. An overview of the software developed in the second year (Oct. 2001 - Sep. 2002) is given in this report. The LVCSR (large vocabulary continuous speech recognition) engine Julius has been ported to Windows in compliance with SAPI (Speech API). A variety of acoustic models are set up to cover a wider range of user generations and speech-input environments. The software is currently available by contacting the address below.
A Speech Coding Method Using LZ Algorithm

KOHATA Minoru, MITSUYA Ikuya, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

IEICE technical report. Speech　102　(335)　7-12　2002/09/17
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

Most speech coding parameters have temporal redundancy, which can potentially be removed. This article presents a new speech coding method using the Lempel-Ziv algorithm. The proposed method was first applied to quantize LP coefficients, where it performed better than Split-VQ, MSVQ, and MA prediction VQ in terms of the rate-distortion criterion. The proposed method was then also used to quantize F0 and gain, and a coder operating at 1.9 kbit/s was designed. According to subjective tests, the quality of the coded speech was almost comparable to that of FS-MELP at 2.4 kbit/s.
The utterance direction identification system using multiple microphones

TAJIMA Makoto, SUZUKI Motoyuki, ITO Akinori, MAKINO Shozo

IEICE technical report. Speech　102　(335)　19-24　2002/09/17
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

This paper describes a system that identifies the direction of a user's keyword utterance for an autonomous mobile robot. The robot is activated by the user's keyword utterance and identifies the speaker by face recognition. To capture the speaker's face within the camera's view angle, the robot first has to identify the approximate direction of the utterance using acoustic information. To achieve this task, the system identifies the direction of the keyword utterance with multiple microphones to within a range of 45 degrees. As the system is built into the mobile robot, the hardware requirements are very tight due to battery and space restrictions. We therefore developed a system that does not need expensive computation. The system was evaluated by recall and precision using several thresholds. The experimental results show that the length of the keyword dominates the absolute threshold value. Using a mora-by-mora threshold, more than 80% recall and precision were obtained.
I-41 Extraction of The Motion Vector in The Motion Picture Using The Two-Dimensional Warping

Saito Atsuko, Suzuki Motoyuki, Ito Akinori, Makino Shozo

情報科学技術フォーラム一般講演論文集　2002　(3)　81-82　2002/09/13
Publisher: Forum on Information Technology
I-43 Corresponding Point Detection from Stereo Images by DP Matching with Region Segmentation (Stereo and Optical Flow, I. Image Recognition and Media Understanding)

倉本健介, 伊藤彰則, 鈴木基之, 牧野正三

情報科学技術フォーラム一般講演論文集　2002　(3)　85-86　2002/09/13
Publisher: FIT (IEICE and IPSJ) Steering Committee
English pronunciation learning system utilizing speaker adaptation by Japanese speech

ITO Akinori, NAGASAWA Tadao, SUZUKI Motoyuki, MAKINO Shozo

IEICE technical report. Speech　102　(159)　19-24　2002/06/20
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

This paper describes a computer-aided English learning system for Japanese speakers. The proposed system is composed of two subsystems: a pronunciation tutor that detects phoneme-level mispronunciations, and a prosody tutor that handles the intonation and rhythm of the speech. The pronunciation tutor exploits the VFS speaker adaptation technique to improve the precision of phoneme labeling. For the adaptation, we developed a new scheme that uses Japanese utterances to adapt English acoustic models. This method enables speaker adaptation for speakers who are not good at English pronunciation. The prosody tutor compares the pitch patterns of native speakers' utterances with that of a student's utterance, and suggests how to improve intonation. In addition to intonation tutoring, the system compares the duration of phrases between native speakers and a student. Evaluation experiments were carried out to compare native speakers' evaluations with the system's evaluations of Japanese speakers' speech. We obtained good correlation between the two evaluations, which shows that the proposed system can be as good a teacher as a native English speaker.
Evaluation of the maximum entropy based trigger language model

KISHIMOTO Yukinobu, KATOH Masaharu, ITO Akinori, KOHDA Masaki

2002　(1)　157-158　2002/03/18

ISSN： 1340-3168
Speech recognition based on kana and kanji string

KINNO H., KATOH M., ITO A., KOHDA M.

2002　(1)　155-156　2002/03/18

ISSN： 1340-3168
Evaluation of MLLR Adaptation for Dialog Speech Recognition

KATO Masaharu, ITO Akinori, KOHDA Masaki

2002　(1)　135-136　2002/03/18

ISSN： 1340-3168
Erratum: Language modeling by stochastic dependency grammar for Japanese speech recognition (Systems and Computers in Japan (November 15, 2001) 32:12 (10-15))

Ito, A., Hori, C., Katoh, M., Kohda, M.

Systems and Computers in Japan　33　(3)　74-74　2002/03/01

DOI： 10.1002/scj.1115 　

ISSN： 0882-1666
Continuous speech recognition consortium -An open repository for CSR tools and models

Akinobu Lee, Tatsuya Kawahara, Kazuya Takeda, Masato Mimura, Atsushi Yamada, Akinori Ito, Katsunobu Itou, Kiyohiro Shikano

Proceedings of the 3rd International Conference on Language Resources and Evaluation, LREC 2002　1438-1441　2002/01/01
Piecewise linear two-dimensional warping

Akinori Ito, Chiori Hori, Masaharu Katoh, Masaki Kohda

Systems and Computers in Japan　32　(12)　1-9　2001/11/15

DOI： 10.1002/scj.1072 　

ISSN： 0882-1666
Product Software of Continuous Speech Recognition Consortium -2000 version-

KAWAHARA T, SUMIYOSHI T, LEE A, TAKEDA K, MIMURA M, ITO A, ITOU K, SHIKANO K

IPSJ SIG Notes　2001　(100)　37-42　2001/10/19
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

The Continuous Speech Recognition Consortium (CSRC) was founded last year under IPSJ SIG-SLP to further enhance the Japanese Dictation Toolkit that had been developed by the IPA project. This report gives an overview of the software developed in the first year (Oct. 2000 - Sep. 2001). We have revised the LVCSR (large vocabulary continuous speech recognition) engine Julius, and constructed new acoustic models using very large speech corpora. Moreover, a variety of acoustic and language models as well as toolkits are being set up. The software is currently available by contacting the address below.
Performance improvement of LVCSR using vocal tract length normalization

FUJITA Daisuke, KATOH Masaharu, ITO Akinori, KOHDA Masaki

2001　(2)　3-4　2001/10/01

ISSN： 1340-3168
A Statistical Language Modeling Toolkit for word and class n-gram.

ITO A., KOHDA M.

2001　(1)　77-78　2001/03/01

ISSN： 1340-3168
Japanese Dictation Toolkit -1999 version-

Tatsuya Kawahara, Akinobu Lee, Tetsunori Kobayashi, Kazuya Takeda, Nobuaki Minematsu, Shigeki Sagayama, Katsunobu Itoh, Akinori Ito, Mikio Yamamoto, Atsushi Yamada, Takehito Utsuro, Kiyohiro Shikano

J. Acoustical Society of Japan　57　(3)　210-214　2001/03/01
Publisher: The Acoustical Society of Japan
DOI： 10.20697/jasj.57.3_210 　

ISSN： 0369-4232
New state clustering of hidden markov network with Korean phonological rules for speech recognition

SJ Oh, HY Chung, CJ Hwang, BK Kim, A Ito

2001 IEEE FOURTH WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING　39-44　2001
Optimization of the Parameter Set for Word Graph Generation

KATOH Masaharu, SAIIN Toshinori, ITO Akinori, KOHDA Masaki

IPSJ SIG Notes　2000　(119)　107-112　2000/12/21
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

Language model weight and insertion penalty greatly affect the recognition performance of an LVCSR system. In a multi-pass LVCSR system that uses a word graph as an intermediate data structure, these decoder parameters should be optimized in order to generate a good word graph. We propose a rescoring-based method that uses a bigram LM instead of generating many word graphs for each parameter setting. As rescoring is much faster than re-generating a word graph, the optimization time of the proposed method is much shorter. In this paper, we tested the proposed method on Japanese News Article Sentences (ASJ-JNAS). When enough development data were available, the recognition performance improved.
Statistical Language Model Toolkit for Word and Class N-gram

ITO Akinori, KOHDA Masaki

IEICE technical report. Natural language understanding and models of communication　100　(521)　67-72　2000/12/15
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

This paper describes a statistical language model toolkit for word and class-based n-grams. The toolkit has command-level compatibility with the CMU-Cambridge SLM Toolkit, and supports ARPA-style language models. Furthermore, the toolkit supports class n-grams and n-gram count mixtures as well as combined language models using linear interpolation. As language model combination is supported at the API level, the SLM library in this toolkit enables any tool to exploit the LM combination. To demonstrate the potential of the toolkit, several language models were created from a six-year Mainichi Shimbun database. We evaluated various combinations of word n-grams and POS n-grams, and found that the combination of a word trigram and a POS trigram reasonably improves the perplexity.
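As a rough illustration of the linear interpolation mentioned in this abstract, the sketch below mixes the per-word probabilities of two models and compares perplexities. It is not the toolkit's API: the function names, the interpolation weight of 0.7, and the probability values are all hypothetical and chosen only to make the arithmetic visible.

```python
import math

def interpolate(p_word, p_pos, lam):
    """Linearly interpolate two probabilities assigned to the same word."""
    return lam * p_word + (1.0 - lam) * p_pos

def perplexity(probs):
    """Perplexity of a text given the per-word probabilities from a model."""
    log_sum = sum(math.log(p) for p in probs)
    return math.exp(-log_sum / len(probs))

# Hypothetical per-word probabilities from two models on the same 4-word text.
word_trigram = [0.20, 0.01, 0.30, 0.05]
pos_trigram = [0.10, 0.08, 0.10, 0.10]

# lam = 0.7 is an assumed weight; in practice it would be tuned on held-out data.
mixed = [interpolate(pw, pp, lam=0.7) for pw, pp in zip(word_trigram, pos_trigram)]

ppl_word = perplexity(word_trigram)
ppl_mixed = perplexity(mixed)
print(f"word trigram: {ppl_word:.2f}, interpolated: {ppl_mixed:.2f}")
```

With these made-up numbers the interpolated model has the lower perplexity because the second model assigns more mass to the word that the first model nearly rules out; real gains depend entirely on the data.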
Optimization of the Parameter Set for Word Graph Generation

KATOH Masaharu, SAIIN Toshinori, ITO Akinori, KOHDA Masaki

IEICE technical report. Natural language understanding and models of communication　100　(520)　107-112　2000/12/14
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

Language model weight and insertion penalty greatly affect the recognition performance of an LVCSR system. In a multi-pass LVCSR system that uses a word graph as an intermediate data structure, these decoder parameters should be optimized in order to generate a good word graph. We propose a rescoring-based method that uses a bigram LM instead of generating many word graphs for each parameter setting. As rescoring is much faster than re-generating a word graph, the optimization time of the proposed method is much shorter. In this paper, we tested the proposed method on Japanese News Article Sentences (ASJ-JNAS). When enough development data were available, the recognition performance improved.
Changes in fruit quality as influenced by shading of netted melon plants (Cucumis melo L. 'Andesu' and 'Luster')

Nishizawa, T., Ito, A., Motomura, Y., Ito, M., Togashi, M.

Journal of the Japanese Society for Horticultural Science　69　(5)　563-569　2000/10/26

DOI： 10.2503/jjshs.69.563 　

ISSN： 1882-3351
Optimization of the parameter set for word graph generation

KATOH Masaharu, SAIIN Toshinori, ITO Akinori, KOHDA Masaki

2000　(2)　33-34　2000/09/01

ISSN： 1340-3168
w3m: a pager/text-based WWW browser

Akinori Ito

bit　32　(9)　28-33　2000/09
Publisher: Kyoritsu Shuppan Co. Ltd.
ISSN： 0385-6984
A Study on MLLR-Based Speaker Models Using for Speaker Verification

KATOH Masaharu, KANOU Junya, ITO Akinori, KOHDA Masaki

IEICE technical report. Speech　100　(137)　25-32　2000/06/16
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

In this paper, we describe a text-prompted speaker verification system. In this system, speaker-specific models are trained by Maximum Likelihood Linear Regression (MLLR) based adaptation, which is used in speech recognition. Regression classes are designed as a tree structure and selected automatically based on the size of the training data. We compared two criteria for cluster selection: the number of frames and the Minimum Description Length (MDL) principle. We also investigated MAP adaptation in combination with them. Experimental results show that applying MAP adaptation after MLLR-MDL adaptation significantly improves verification performance. We also applied SAT compact models instead of SI models. The SAT compact model is better when training data and testing data are recorded in different sessions.
Language Modeling by an Ergodic HMM based on an N-gram

ITO Akinori, SAITO Hideki, KATOH Masaharu, KOHDA Masaki

IEICE technical report. Speech　100　(137)　67-74　2000/06/16
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

This paper proposes a new language model based on an ergodic HMM. This model is created by extending a deterministic finite state automaton equivalent to an n-gram into a nondeterministic one. We call the proposed model the "Hidden State N-gram (HS-ngram)." We carried out experiments to compare the perplexity of the n-gram with that of the HS-ngram. The results showed that the proposed models (HS-bigram and HS-trigram) gave lower perplexity than the original models. In a rescoring experiment on the recognition results of an LVCSR system, the HS-trigram slightly outperformed the trigram model.
Optimization of language model weight and insertion penalty for word graph generation

SAIIN Toshinori, KATOH Masaharu, ITO Akinori, KOHDA Masaki

IEICE technical report. Speech　100　(137)　75-82　2000/06/16
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

Language model weight and insertion penalty greatly affect the recognition performance of an LVCSR system. In a multi-pass LVCSR system that uses a word graph as an intermediate data structure, these decoder parameters should be optimized in order to generate a good word graph. In this paper, a new method to optimize these parameters is proposed. The method rescores the word graph using a bigram LM instead of generating many word graphs for each parameter setting. As rescoring is much faster than re-generating a word graph, the optimization time of the proposed method is much shorter than that of the re-generation-based one. However, as the method minimizes the first-pass WER, improvement of the second-pass WER is not guaranteed. The experimental results for the newspaper task show that the proposed method improves not only the first-pass WER but also the second-pass WER in most cases.
Evaluation of Japanese Dictation Toolkit : 1999 version

Kawahara T, Lee A, Kobayashi T, Takeda K, Minematsu N, Sagayama S, Itou K, Ito A, Yamamoto M, Yamada A, Utsuro T, Shikano K

IPSJ SIG Notes　2000　(54)　9-16　2000/06/02
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

A sharable software repository for Japanese LVCSR (Large Vocabulary Continuous Speech Recognition) is introduced. It has been developed through the collaboration of researchers at different academic institutes in Japan. The platform consists of a standard recognition engine, Japanese phone models, and Japanese statistical language models, as well as Japanese morphological analysis tools. As an integrated system of these modules, we have implemented baseline 20,000-word and 60,000-word dictation systems and evaluated various components. The software repository is freely available to the public.
Language modeling using ergodic HMM based on trigram

SAITOH Hideki, KATOH Masaharu, ITO Akinori, KOHDA Masaki

2000　(1)　51-52　2000/03/01

ISSN： 1340-3168
Optimization of language model weight and insertion penalty for word graph generation

SAIIN Toshinori, OKA Naoki, KATOH Masaharu, ITO Akinori, KOHDA Masaki

2000　(1)　47-48　2000/03/01

ISSN： 1340-3168
Task adaptation using part-of-speech tag and high frequency word N-gram

OGASAWARA Norimitsu, KATOH Masaharu, ITO Akinori, KOHDA Masaki

2000　(1)　75-76　2000/03/01

ISSN： 1340-3168
A study on MDL criterion based regression cluster setting for MLLR adaptation

KANOU Junya, KATOH Masaharu, ITO Akinori, KOHDA Masaki

2000　(1)　103-104　2000/03/01

ISSN： 1340-3168
Language modeling by stochastic dependency grammar for Japanese speech recognition

Akinori Ito, Chiori Hori, Masaharu Kotow, Masaki Kohda

6th International Conference on Spoken Language Processing, ICSLP 2000　2000/01/01
IPA Japanese dictation free software project

Katsunobu Itou, Kiyohiro Shikano, Tatsuya Kawahara, Kazuya Takeda, Atsushi Yamada, Akinori Ito, Takehito Utsuro, Tetsunori Kobayashi, Nobuaki Minematsu, Mikio Yamamoto, Shigeki Sagayama, Akinobu Lee

2nd International Conference on Language Resources and Evaluation, LREC 2000　2000/01/01
Free software toolkit for Japanese large vocabulary continuous speech recognition

Tatsuya Kawahara, Akinobu Lee, Tetsunori Kobayashi, Kazuya Takeda, Nobuaki Minematsu, Shigeki Sagayama, Katsunobu Itou, Akinori Ito, Mikio Yamamoto, Atsushi Yamada, Takehito Utsuro, Kiyohiro Shikano

6th International Conference on Spoken Language Processing, ICSLP 2000　2000/01/01
Study on Large Vocabulary Continuous Speech Recognition with a phoneme graph based hypothesis restriction

OKA Naoki, KATOH Masaharu, ITO Akinori, KOHDA Masaki

IEICE technical report. Natural language understanding and models of communication　99　(524)　67-72　1999/12/21
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685
Study on Large Vocabulary Continuous Speech Recognition with a phoneme graph based hypothesis restriction

OKA Naoki, KATOH Masaharu, ITO Akinori, KOHDA Masaki

IPSJ SIG Notes　1999　(108)　199-204　1999/12/20
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

In this paper, we study fast search strategies for large vocabulary continuous speech recognition (LVCSR). Many fast search strategies have been proposed until now. In [2], we proposed a new search strategy with a phoneme-graph-based hypothesis restriction, which efficiently reduces the search space. For a 5,000-word task, experimental results showed that the method can reduce the elapsed time by 70% without any increase in errors. To make the search even faster, we incorporated a 1-phoneme look-ahead technique into phoneme graph generation. We evaluated the proposed method on a 20,000-word Japanese newspaper task. Experimental results show that the method can reduce the elapsed time by about 60% without increasing the error rate.
A study on MLLR adapted speaker model for speaker verification

KANOU Junya, KATOH Masaharu, ITO Akinori, KOHDA Masaki

IPSJ SIG Notes　1999　(108)　55-60　1999/12/20
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

In this paper, we propose a method that uses the MDL criterion to automatically construct regression clusters according to the amount of adaptation data. Claimant speaker models are made by MLLR adaptation. To increase the number of regression clusters, we use a tree structure, which is built by top-down clustering based on acoustic distance. The MDL criterion is compared with the frame threshold criterion and the fixed regression cluster criterion. In an experiment on text-prompted speaker verification, the MDL criterion restrains cluster division, and the most suitable number of clusters for the amount of adaptation data is chosen.
A study on MLLR adapted speaker model for speaker verification

KANOU Junya, KATOH Masaharu, ITO Akinori, KOHDA Masaki

IEICE technical report. Natural language understanding and models of communication　99　(523)　55-60　1999/12/20
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685
Fast and Robust Optimization of Language Model Weight and Insertion Penalty from N-best Candidates

ITO Akinori, KOHDA Masaki

IPSJ SIG Notes　1999　(91)　35-40　1999/10/29
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

An LVCSR system has many parameters to be optimized. In this paper, we investigate several issues concerning language model weight and word insertion penalty. From recognition results obtained by changing these parameters, we made three important observations. The first was that the optimum point of these parameter values depended on the test set used for the optimization. The second was that the parameter space had many local optima, which meant that one had to try all points in the parameter space to find the global optimum point. The third was that the potential increase of WER in the suboptimum region of the parameter space was about 2%. Based on these observations, we propose three new methods to optimize language model weight and insertion penalty. Firstly, a new method is proposed to preselect n-best candidates for n-best rescoring based parameter optimization. Secondly, a method to choose a robust parameter setting is proposed. This method splits a development test set into several sets. According to the optimization results for each set, the method chooses the optimum point by considering the average of the WER as well as its variance. Finally, a method to find a sub-optimum parameter setting is proposed. This optimization is based on neighborhood search, and it finds a parameter setting rapidly.
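To make the setting of this abstract concrete, the sketch below shows the plain exhaustive baseline that n-best rescoring based optimization starts from: every (weight, penalty) pair on a grid is tried, each n-best list is rescored, and the pair with the lowest WER is kept. It does not implement the three proposed methods. The data layout (`ac`, `lm`, `words`), the scoring formula with an additive per-word term, and the toy scores are assumptions made for illustration.

```python
from itertools import product

def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]

def rescore(nbest, lm_weight, ins_penalty):
    """Pick the hypothesis with the best combined log score (assumed form)."""
    return max(nbest, key=lambda h: h["ac"] + lm_weight * h["lm"]
                                    + ins_penalty * len(h["words"]))

def grid_search(utterances, weights, penalties):
    """Return (wer, lm_weight, ins_penalty) with the lowest WER on the set."""
    total_ref = sum(len(u["ref"]) for u in utterances)
    best = None
    for w, p in product(weights, penalties):
        errors = sum(edit_distance(u["ref"], rescore(u["nbest"], w, p)["words"])
                     for u in utterances)
        cand = (errors / total_ref, w, p)
        if best is None or cand < best:
            best = cand
    return best

# Hypothetical development set: one utterance with a 2-best list of log scores.
dev = [{
    "ref": ["a", "b", "c"],
    "nbest": [
        {"words": ["a", "b", "c"], "ac": -10.0, "lm": -6.0},
        {"words": ["a", "b"], "ac": -9.0, "lm": -5.0},
    ],
}]

best = grid_search(dev, weights=[0, 1], penalties=[0, 2])
print(best)
```

Because rescoring only re-ranks stored hypotheses, each grid point costs far less than a full decoding pass, which is what makes trying every point feasible.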
A Report on Eurospeech99 and IEEE Multimedia Signal Processing Workshop

NAKAMURA Satoshi, OKAWA Shigeki, ITOH Akinori, TAMOTO Masafumi, MIZUNO Hideyuki, UNOKI Masashi, TOKUDA Keiichi, KABURAGI Tokihiko, HATAOKA Nobuo

IPSJ SIG Notes　1999　(91)　21-28　1999/10/29
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

This paper summarizes the topics at ESCA Eurospeech99, held in Budapest, Hungary, from Sep. 5 to Sep. 9, 1999, and at the IEEE Multimedia Signal Processing Workshop, held in Helsinger, Denmark, from Sep. 13 to Sep. 15, 1999.
A study on MLLR adapted speaker model for speaker verification

KANOU Junya, KATOH Masaharu, ITO Akinori, KOHDA Masaki

1999　(2)　49-50　1999/09/01

ISSN： 1340-3168
Fast optimization of language model weight and insertion penalty using n-best candidate

ITO A., KOHDA M.

1999　(2)　65-66　1999/09/01

ISSN： 1340-3168
A Study on Increase of Performance Based on Combine Multiple Recognizer Output

KATOH Masaharu, ITO Akinori, KOHDA Masaki

1999　(2)　85-86　1999/09/01

ISSN： 1340-3168
A new metric language model evaluation based on likelihood gain

ITO A., KOHDA M.

1999　(2)　73-74　1999/09/01

ISSN： 1340-3168
Language modeling using ergodic HMM based on bigram

SAITOH Hideki, ITO Akinori, KATOH Masaharu, KOHDA Masaki

1999　(2)　101-102　1999/09/01

ISSN： 1340-3168
A metric based on likelihood difference for n-gram language model evaluation

ITO Akinori, KOHDA Masaki, OSTENDORF Mari

IEICE technical report. Speech　99　(121)　95-102　1999/06/18
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

Perplexity has been widely used as an evaluation metric for stochastic language models. Recently, several papers have reported that the correlation between perplexity and word error rate is poor when complicated language models, such as mixture models, are used. In this paper, a new metric for n-gram language models is proposed, which is intended to substitute for perplexity. The major difference between the proposed metric and perplexity is that, while perplexity uses the probabilities of word occurrences in the evaluation text, the proposed metric accumulates the differences in linguistic score between a word in the evaluation text and the maximum score available in that context. A sigmoid-like nonlinear function is applied to the score difference, and the average of those values is calculated. Applying the nonlinear function suppresses the effect of language score differences that do not affect word error rate improvement. The correlation between the proposed metric and word accuracy was investigated for a speech recognition simulator and a real speech recognizer. The results showed that the proposed metric had a higher correlation with word accuracy than perplexity did.
Construction and Evaluation of Language Models Based on Stochastic Context Free Grammar for Speech Recognition

HORI Chiori, KATOH Masaharu, ITO Akinori, KOHDA Masaki

IEICE technical report. Speech　99　(121)　79-86　1999/06/18
Publisher: The Institute of Electronics, Information and Communication Engineers
ISSN： 0913-5685

It is well known that a Stochastic Context Free Grammar (SCFG) is a very effective language model, since it can express not only local constraints like an N-gram but also global constraints over a whole sentence. However, to estimate the parameters of an SCFG, the Inside-Outside algorithm has to be used, which requires a huge amount of computation, proportional to the cube of the number of non-terminal symbols and the length of the input sequences. Therefore, the SCFG has rarely been used for speech recognition. In this paper, we propose a new SCFG to which a phrase-based dependency grammar is applied in order to reduce this computation. In a test using the EDR corpus, we compared the proposed model with other types of SCFGs in terms of perplexity and amount of computation. We constructed a large-scale SCFG using the Mainichi news corpus, and compared it with a trigram on a 5,000-word Japanese newspaper reading task.
Japanese Dictation Toolkit -1997 version-

Tatsuya Kawahara, Akinobu Lee, Tetsunori Kobayashi, Kazuya Takeda, Nobuaki Minematsu, Katsunobu Itoh, Akinori Ito, Mikio Yamamoto, Atsushi Yamada, Takehito Utsuro, Kiyohiro Shikano

J. Acoustical Society of Japan　55　(3)　175-180　1999/03/01
Publisher: The Acoustical Society of Japan
DOI： 10.20697/jasj.55.3_175 　

ISSN： 0369-4232
Japanese Dictation Toolkit -1997 version

Tatsuya Kawahara, Akinobu Lee, Tetsunori Kobayashi, Kazuya Takeda, Nobuaki Minematsu, Katsunobu Itou, Akinori Ito, Mikio Yamamoto, Atsushi Yamada, Takehito Utsuro, Kiyohiro Shikano

Journal of the Acoustical Society of Japan (E) (English translation of Nippon Onkyo Gakkaishi)　20　(3)　233-239　1999

DOI： 10.1250/ast.20.233 　

ISSN： 0388-2861
A Study on A Phoneme-Graph-based Hypothesis Restriction for Large Vocabulary Continuous Speech Recognition

HORI Takaaki, OKA Naoki, KATOH Masaharu, ITO Akinori, KOHDA Masaki

IEICE technical report. Natural language understanding and models of communication　98　(461)　25-32　1998/12/11
Publisher: The Institute of Electronics, Information and Communication Engineers

In this paper, we study fast search strategies for Large Vocabulary Continuous Speech Recognition (LVCSR), and propose a new method, phoneme-graph-based hypothesis restriction, which effectively prunes the search space. In the proposed method, a phoneme graph is generated at the pre-processing stage, and then, at the main recognition stage, the best word sequence is searched for while the expansion of hypotheses is restricted using the information in the phoneme graph. The phoneme-graph-based restriction consists of the limitation of phoneme boundaries and Forward-Backward Pruning, which together reduce the search space dramatically. The proposed method was tested on a 5,000-word Japanese newspaper reading task. The experimental results show that this method can reduce the elapsed time by about 70% without any increase in errors.
A Study on A Phoneme-Graph-based Hypothesis Restriction for Large Vocabulary Continuous Speech Recognition

HORI Takaaki, OKA Naoki, KATOH Masaharu, ITO Akinori, KOHDA Masaki

IPSJ SIG Notes　1998　(114)　113-120　1998/12/10
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

In this paper, we study fast search strategies for Large Vocabulary Continuous Speech Recognition (LVCSR), and propose a new method, phoneme-graph-based hypothesis restriction, which effectively prunes the search space. In the proposed method, a phoneme graph is generated at the pre-processing stage, and then, at the main recognition stage, the best word sequence is searched for while the expansion of hypotheses is restricted using the information in the phoneme graph. The phoneme-graph-based restriction consists of the limitation of phoneme boundaries and Forward-Backward Pruning, which together reduce the search space dramatically. The proposed method was tested on a 5,000-word Japanese newspaper reading task. The experimental results show that this method can reduce the elapsed time by about 70% without any increase in errors.
A study on a large vocabulary continuous speech recognition system with a state clustering-based HM-Net

HORI Takaaki, OKA Naoki, KATOH Masaharu, ITO Akinori, KOHDA Masaki

1998　(2)　95-96　1998/09/01

ISSN： 1340-3168
Evaluation of N-gram language models trained on newspaper corpus by speech recognition experiments

KAMEYAMA Yoshihiro, KATOH Masaharu, ITO Akinori, KOHDA Masaki

1998　(2)　73-74　1998/09/01

ISSN： 1340-3168
How Far Speech/Language Processing Technology Has Come: Speech Edition

新田恒雄, 小林哲則, 鹿野清宏, 武田一哉, 河原達也, 伊藤克亘, 峯松信昭, 伊藤彰則, 宇津呂武仁, 山本幹雄, 山田篤, 西村雅史, 甲斐充彦, 中川聖一, 服部浩明, 阿部匡伸, 松浦博

情報処理学会研究報告. SLP, 音声言語情報処理　98　(49)　9-16　1998/05/28
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

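The multimedia era has arrived, and a variety of services have begun to be offered. This report focuses on speech interface technology, which will become increasingly important, and introduces the latest technologies centered on speech recognition and speech synthesis. It covers Japanese dictation software, Web search software, and a large-vocabulary speech recognition chip as speech recognition technologies, and a speech content production support tool and text-to-speech conversion software as speech synthesis technologies.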
SIG-SLP/SIG-NL Joint Session "Recent Advances in Speech and Language Processing Technologies" -Speech Processing Technologies-

NITTA Tsuneo, KOBAYASHI Tetsunori, SHIKANO Kiyohiro, TAKEDA Kazuya, KAWAHARA Tatsuya, ITOU Katunobu, MINEMATSU Nobuaki, ITO Akinori, UTSURO Takehito, YAMAMOTO Mikio, YAMADA Atsushi, NISHIMURA Masafumi, KAI Mitsuhiko, NAKAGAWA Seiichi, HATTORI Hiroaki, ABE Masanobu, MATSU'URA Hiroshi

IPSJ SIG Notes　1998　(48)　9-16　1998/05/28
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

Computer-human interaction by voice is one of the most important technologies in the coming multimedia era. In this report, we introduce recent advances in speech processing technologies, focusing on both speech recognition and speech synthesis. The contents are: Japanese dictation software, Web-page retrieval software, large-vocabulary speech recognition chips, a speech editing tool for designing multimedia applications, and TTS (Text-To-Speech) software for PCs.
Evaluation of Japanese Dictation ToolKit -1997 version-

KAWAHARA Tatsuya, LEE Akinobu, KOBAYASHI Tetsunori, TAKEDA Kazuya, MINEMATSU Nobuaki, ITOU Katsunobu, ITO Akinori, YAMAMOTO Mikio, YAMADA Atsushi, UTSURO Takehito, SHIKANO Kiyohiro

IPSJ SIG Notes　1998　(48)　109-114　1998/05/28
Publisher: Information Processing Society of Japan (IPSJ)

A project to develop an LVCSR (Large Vocabulary Continuous Speech Recognition) platform is introduced. It is a collaboration of researchers at different academic institutes and is intended to develop a sharable software repository of not only databases but also models and programs. The platform consists of a standard recognition engine, Japanese phone models, and Japanese statistical language models. As an integrated system of these modules, we have implemented a baseline 500-word dictation system and evaluated various components. The software repository is available to the public.
Evaluation of N-gram task adaptation by speech recognition simulation

ITO Akinori, KOHDA Masaki

1998　(1)　43-44　1998/03/01

ISSN： 1340-3168
Effect of Cut-off and Learning Text on The Language Model from The Newspaper Corpus

KAMEYAMA Yoshihiro, KATOH Masaharu, ITO Akinori, KOHDA Masaki

1998　(1)　49-50　1998/03/01

ISSN： 1340-3168
A Study on Word Spotting based on Likelihood Normalization Using Phoneme HMMs

KATOH Masaharu, HORI Takaaki, ITO Akinori, KOHDA Masaki

IEICE technical report. Natural language understanding and models of communication　97　(440)　9-14　1997/12/12
Publisher: The Institute of Electronics, Information and Communication Engineers

Hidden Markov models (HMMs) have proven useful in recent speech recognition. We consider the likelihood scores of HMMs from the viewpoint of probability theory. In continuous speech recognition, each hypothesis has a different speech segment length and position, so comparing the HMMs' scores directly affects system performance. In this paper, we describe a normalization of likelihood based on Bayes' theorem. To normalize the likelihood, we use connected phoneme HMMs that follow Japanese syllable rules. With this method, no additional calculation is needed to obtain the scores, and the system needs no models other than phoneme HMMs. We applied it to word spotting and obtained a significant improvement in system performance.
A Study on a State Clustering-Based Topology Design Method for HM-Nets

HORI Takaaki, KATOH Masaharu, ITO Akinori, KOHDA Masaki

IPSJ SIG Notes　1997　(120)　47-52　1997/12/11
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

A Hidden Markov Network (HM-Net) is a highly accurate and robust acoustic model that represents the tied-state structure of context-dependent Hidden Markov Models as a network. The Successive State Splitting (SSS) method and improved variants of it have already been proposed for generating HM-Nets. However, these algorithms share a common problem: they require a large amount of computation when a large amount of training data is used, because state splitting and parameter estimation are repeated over the training data. Although HM-Net topologies are usually designed with part of the training data, after which only their output density distributions are estimated with all of the data, HM-Nets with large-scale topologies for large vocabulary continuous speech recognition (LVCSR) cannot be derived this way. In this paper, we propose a rapid topology design method based on state clustering to generate highly accurate HM-Nets for LVCSR. Continuous phoneme recognition experiments show that the proposed method is fast and can generate HM-Nets equivalent to those designed by conventional methods when the same training data is used.
Common Platform of Japanese Large Vocabulary Continuous Speech Recognition Research -Speech Recognizer Design-

KAWAHARA Tatsuya, LEE Akinobu, ITOU Katsunobu, KOBAYASHI Tetsunori, ITO Akinori, UTSURO Takehito, SHIMIZU Toru, TAMOTO Masafumi, ARAI Kazuhiro, MINEMATSU Nobuaki, YAMAMOTO Mikio, TAKEZAWA Toshiyuki, TAKEDA Kazuya, MATSUOKA Tatsuo, SHIKANO Kiyohiro

IPSJ SIG Notes　1997　(101)　1-6　1997/10/24
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

For Japanese large vocabulary continuous speech recognition (LVCSR) research, we are developing a standard baseline software repository that includes language models, acoustic models, and recognition engines. This report describes the specifications and algorithms of the speech recognizer currently being designed.
Common Platform of Japanese Large Vocabulary Continuous Speech Recognition Research -Development of text corpus-

ITOU Katunobu, ITO Akinori, UTSURO Takehito, KAWAHARA Tatsuya, KOBAYASHI Tetsunori, SHIMIZU Toru, TAMOTO Masafumi, ARAI Kazuhiro, MINEMATSU Nobuaki, YAMAMOTO Mikio, TAKEZAWA Toshiyuki, TAKEDA Kazuya, MATSUOKA Tatsuo, SHIKANO Kiyohiro

IPSJ SIG Notes　1997　(101)　7-12　1997/10/24
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

For Japanese large vocabulary continuous speech recognition (LVCSR) research, we are developing a standard baseline software repository that includes language models, acoustic models, and recognition engines. This report describes the design and specification of the text corpus.
Common Platform of Japanese Large Vocabulary Continuous Speech Recognition Research -Construction of Acoustic Model-

TAKEDA Kazuya, MINEMATSU Nobuaki, ITO Akinori, ITOU Katsunobu, UTSURO Takehito, KAWAHARA Tatsuya, KOBAYASHI Tetsunori, SHIMIZU Toru, TAMOTO Masafumi, ARAI Kazuhiro, YAMAMOTO Mikio, TAKEZAWA Toshiyuki, MATSUOKA Tatsuo, SHIKANO Kiyohiro

IPSJ SIG Notes　1997　(101)　13-18　1997/10/24
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

For Japanese large vocabulary continuous speech recognition (LVCSR) research, we are developing a standard baseline software repository that includes language models, acoustic models, and recognition engines. This report discusses the construction of the acoustic models.
A Study of Language Modeling using Stochastic Context Free Grammar with "Dependency Grammar"

YAGINUMA Masanobu, KATOH Masaharu, ITO Akinori, KOHDA Masaki

IEICE technical report. Natural language understanding and models of communication　97　(330)　33-40　1997/10/17
Publisher: The Institute of Electronics, Information and Communication Engineers

In this paper, we propose a language model for speech recognition that uses a stochastic context-free grammar (SCFG). An SCFG is trained with the inside-outside (I/O) algorithm, which we modified to handle dependency grammar. To express dependency grammar, we introduce two categories of words: function words (particles, auxiliaries, suffixes, etc.) and content words (nouns, verbs, adjectives, etc.). With dependency grammar, the training time is reduced from the cube of the number of nonterminal symbols to its square. We carried out an experiment comparing the proposed method with two conventional methods: the trigram model and the original SCFG model. Training time was significantly reduced compared with the original SCFG, and the perplexity of the proposed model was lower than that of the other two models. Furthermore, we investigated initial values that reduce training time and improve performance.
On the effect of vocabulary size on N-gram task adaptation

ITO Akinori, KOHDA Masaki

1997　(2)　61-62　1997/09/01

ISSN： 1340-3168
Study of initial values for language modeling using stochastic context free grammar

YAGINUMA Masanobu, KATOH Masaharu, ITO Akinori, KOHDA Masaki

1997　(2)　51-52　1997/09/01

ISSN： 1340-3168
A Study on Word Spotting using Phoneme HMMs based Likelihood Normalization

KATOH Masaharu, HORI Takaaki, ITO Akinori, KOHDA Masaki

1997　(2)　79-80　1997/09/01

ISSN： 1340-3168
On the vocabulary size for N-gram task adaptation

ITO Akinori, KOHDA Masaki

IEICE technical report. Speech　97　(115)　51-58　1997/06/20
Publisher: The Institute of Electronics, Information and Communication Engineers

While an N-gram language model requires a large corpus for good probability estimation, it is often difficult to gather a large number of samples for a specific task domain. This paper describes a task adaptation technique that builds an N-gram model for a specific domain from a large task-independent corpus (TI text) and a small task-specific corpus (AD text). A simple weighted mixture is used to combine the two corpora. The paper first points out the relationship between the weighted mixture method and MAP/Bayes estimation. Next, the effect of vocabulary restriction is investigated. Because the TI text contains many words that do not appear in the target task, the perplexity of the model decreases when these words are replaced with an "unknown" symbol. The paper shows that vocabulary restriction reduces the perplexity of the model and that the vocabulary sizes of the TI and AD texts must be determined individually.
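The weighted mixture and vocabulary restriction described in this abstract can be illustrated with a minimal Python sketch. It assumes the mixture is applied to raw bigram counts, with the AD counts scaled by a single weight; the abstract does not specify the exact formulation, and the names `restrict`, `bigram_counts`, `mixed_bigram_prob`, and the `<unk>` symbol are hypothetical. No smoothing is applied.

```python
from collections import Counter

UNK = "<unk>"  # hypothetical symbol for words outside the restricted vocabulary


def restrict(tokens, vocab):
    """Replace every token outside `vocab` with the unknown symbol."""
    return [t if t in vocab else UNK for t in tokens]


def bigram_counts(tokens):
    """Count adjacent word pairs in a token sequence."""
    return Counter(zip(tokens, tokens[1:]))


def mixed_bigram_prob(ti_tokens, ad_tokens, weight, prev, word):
    """Estimate P(word | prev) from TI counts plus `weight` times AD counts.

    Assumption: the mixture is taken over counts; this is one common reading
    of "weighted mixture", not necessarily the authors' formulation.
    """
    mixed = Counter()
    for pair, c in bigram_counts(ti_tokens).items():
        mixed[pair] += c
    for pair, c in bigram_counts(ad_tokens).items():
        mixed[pair] += weight * c
    total = sum(c for (p, _), c in mixed.items() if p == prev)
    return mixed[(prev, word)] / total if total else 0.0
```

In this sketch, raising `weight` pulls the estimate toward the small task-specific text, and calling `restrict` on each corpus with its own vocabulary mirrors the abstract's point that the two vocabulary sizes are chosen separately.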
Reading Japanese Corpus using N-gram.

ITO A, MANZAKI H, KATOH M, KOHDA M

1997　(1)　9-10　1997/03/01

ISSN： 1340-3168
A Study on Improvement of HM-Nets using Decision Tree-based Successive State Splitting

HORI Takaaki, KATOH Masaharu, ITO Akinori, KOHDA Masaki

IEICE technical report. Natural language understanding and models of communication　96　(420)　17-24　1996/12/13
Publisher: The Institute of Electronics, Information and Communication Engineers

The important aspects of context-dependent acoustic modeling using a limited training data set for robust speech recognition are how to tie the model parameters and how to handle unknown contexts. From this point of view, we previously proposed the Decision Tree-based Successive State Splitting algorithm (DT-SSS) and showed that HM-Nets generated with this algorithm were highly accurate and able to represent any context. However, this algorithm did not take temporal splits into consideration and therefore did not make the best use of the strengths of SSS. In this paper, we incorporate temporal splits into DT-SSS and generate HM-Nets from various initial models. Continuous phoneme recognition experiments show the effects of these improvements.
A study on word preselection using HMM state sequence

KATOH Masaharu, HORI Takaaki, ITO Akinori, KOHDA Masaki

1996　(2)　87-88　1996/09/01

ISSN： 1340-3168
Study on the adaptation of a stochastic language model using small corpus.

ITO Akinori, KOHDA Masaki

1996　(2)　37-38　1996/09/01

ISSN： 1340-3168
A Study on HM-Net using Successive State Splitting based on Phonetic Decision Tree

HORI Takaaki, KATOH Masaharu, ITO Akinori, KOHDA Masaki

IEICE technical report. Speech　96　(93)　15-22　1996/06/14
Publisher: The Institute of Electronics, Information and Communication Engineers

The important aspects of context-dependent acoustic modeling using a limited training data set for robust speech recognition are how to tie the model parameters and how to handle unknown contexts. The Successive State Splitting algorithm (SSS) is a good method that designs the topology of tied-state HMMs automatically, but it does not cover unknown contexts adequately and also has some problems with contextual splits. In this paper, we propose a new SSS algorithm that performs contextual splits based on a phonetic decision tree. This method can generate highly accurate HM-Nets that can represent any context. Continuous phoneme recognition experiments show that the proposed method is effective.
A Study on Utilizing to Word Preselection Using Optimal Phonemes Sequence

KATOH Masaharu, ITO Akinori, KOHDA Masaki

IEICE technical report. Speech　96　(92)　9-14　1996/06/13
Publisher: The Institute of Electronics, Information and Communication Engineers

In this paper, a fast word preselection method is proposed for HMM-based word recognition. In this method, candidate words are selected using phoneme-based matching. First, phoneme recognition is carried out on the input speech to obtain an optimal phoneme sequence. Then phoneme DP matching is executed to choose word candidates. Finally, the word candidates are verified frame by frame using subword HMMs. We evaluated the proposed method in a 15,000-word recognition experiment. When 150 word candidates (1% of the total vocabulary) were selected, the omission rate was less than 1%. Compared with a full search algorithm, the proposed method took only 4.6% of the CPU time and 8.6% of the comparison operations. We also carried out an experiment that investigated the performance of the method with simplified HMMs.
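The phoneme DP matching step can be sketched in Python as a plain edit-distance ranking. This is an illustrative assumption: the abstract does not state the cost function, so unit insertion, deletion, and substitution costs are used here, and the names `phoneme_distance` and `preselect` are hypothetical. The final frame-by-frame HMM verification is not shown.

```python
def phoneme_distance(a, b):
    """Edit distance between two phoneme sequences via dynamic programming."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of `a`
    for i, pa in enumerate(a, start=1):
        cur = [i]
        for j, pb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete pa
                           cur[j - 1] + 1,             # insert pb
                           prev[j - 1] + (pa != pb)))  # substitute or match
        prev = cur
    return prev[-1]


def preselect(recognized, lexicon, n_candidates):
    """Return the n_candidates words whose phoneme strings are closest.

    `lexicon` maps each word to its phoneme sequence (an assumed layout).
    """
    ranked = sorted(lexicon, key=lambda w: phoneme_distance(recognized, lexicon[w]))
    return ranked[:n_candidates]
```

Only the words returned by `preselect` would then go on to the more expensive HMM verification, which is where the reported savings in CPU time would come from.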
N-gram estimation from Japanese large corpus and task adaptation of N-gram

ITO Akinori, DAISHIMA Naoto, MARUYAMA Atsushi, KATOH Masaharu, KOHDA Masaki

IPSJ SIG Notes　1996　(55)　25-30　1996/05/27
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

N-gram language models were constructed from the EDR corpus, a 5-million-word Japanese corpus. The models were investigated under various conditions of training text size, vocabulary, and cut-off. The experimental results clarified the optimum condition for a given training text size. We also carried out an experiment on task adaptation. An N-gram model built from a dialog was mixed with the N-gram from the EDR corpus, which reduced perplexity by about 60%.
N-gram Language Model by String Pattern and Pattern Class

Ito Akinori, Kohda Masaki

Proceedings of the IEICE General Conference　1996　(1)　345-346　1996/03/11
Publisher: The Institute of Electronics, Information and Communication Engineers
A study on utilizing to preprocess using optimal phonemes sequence.

KATOH Masaharu, ITO Akinori, KOHDA Masaki

1996　(1)　79-80　1996/03/01

ISSN： 1340-3168
Language Modelling by String Pattern and Pattern Class N-gram.

ITO Akinori, KOHDA Masaki

1996　(1)　193-194　1996/03/01

ISSN： 1340-3168
対話音声認識のための事前タスクの適応の検討

伊藤彰則

信学技報,SP96-81　1996
The performance prediction on sentence recognition using a finite state word automaton

T Otsuki, A Ito, S Makino, T Ohtomo

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E79D　(1)　47-53　1996/01

ISSN： 0916-8532
Language modeling by string pattern N-gram for Japanese speech recognition

A Ito, M Kohda

ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4　1　490-493　1996
Language Modelling by String Pattern N-gram

ITO Akinori, KOHDA Masaki

IEICE technical report. Natural language understanding and models of communication　95　(429)　19-24　1995/12/15
Publisher: The Institute of Electronics, Information and Communication Engineers

Language models based on Markov models (N-grams) are popular in sentence and dialog speech recognition. When applying these models to Japanese speech recognition, one has to decide what the unit of the N-gram should be. Because a Japanese sentence is not divided into words, morphemic analysis is required before word-by-word processing. However, it is difficult to obtain a precise analysis automatically for transcriptions of spontaneous speech. In this paper, we propose several language models that can be constructed fully automatically. We examined three types of models: N-gram by string pattern, N-gram by automatic morphemic analysis, and string pattern class N-gram. These models were compared by perplexity. In the experimental results, the string pattern class N-gram performed better than the morpheme N-gram.
Language Modelling by String Pattern N-gram

ITO Akinori, KOHDA Masaki

IPSJ SIG Notes　1995　(120)　105-112　1995/12/14
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

Language models based on Markov models (N-grams) are popular in sentence and dialog speech recognition. When applying these models to Japanese speech recognition, one has to decide what the unit of the N-gram should be. Because a Japanese sentence is not divided into words, morphemic analysis is required before word-by-word processing. However, it is difficult to obtain a precise analysis automatically for transcriptions of spontaneous speech. In this paper, we propose several language models that can be constructed fully automatically. We examined three types of models: N-gram by string pattern, N-gram by automatic morphemic analysis, and string pattern class N-gram. These models were compared by perplexity. In the experimental results, the string pattern class N-gram performed better than the morpheme N-gram.
Automatic generation of Japanese Bunsetsu structure represented

ITO Akinori, KOHDA Masaki

1995　(2)　19-20　1995/09/01

ISSN： 1340-3168
SuperTAINS: Tohoku University Network realizes multimedia applications through sub-giga network

Yukiyoshi Kameyama, Akinori Ito, Hiroaki Kobayashi

Computer and Network LAN　13　(6)　114-120　1995/06
Publisher: Ohmsha
A NEW HMNET CONSTRUCTION ALGORITHM REQUIRING NO CONTEXTUAL FACTORS

M SUZUKI, S MAKINO, A ITO, H ASO, H SHIMODAIRA

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS　E78D　(6)　662-668　1995/06

ISSN： 0916-8532
On a Bunsetsu structure model with several constraints for speech recognition

ITO Akinori, MAKINO Shozo

IPSJ SIG Notes　1995　(51)　43-50　1995/05/25
Publisher: Information Processing Society of Japan (IPSJ)
ISSN： 0919-6072

This paper describes a bunsetsu (phrase) model for Japanese spontaneous speech recognition. The model is represented as a finite automaton that covers almost all expressions in the dialog transcriptions of the ASJ continuous speech corpus. It contains 3386 conceptual words and 615 functional words. Next, stochastic language models are combined with the bunsetsu model. Two types of stochastic models are investigated: a stochastic regular grammar and an N-gram model. When combined with the bunsetsu model, a bigram model gives a smaller perplexity. Finally, several attributes are introduced into the bunsetsu model to express constraints between distant words in a phrase. The finite automaton model with attributes is automatically converted to a finite automaton without attributes, which can easily be used in conventional speech recognition schemes.
On introducing several grammatical constraints into a bunsetsu structure model for spoken dialog recognition

ITO Akinori, MAKINO Shozo

1995　(1)　183-184　1995/03/01

ISSN： 1340-3168
対話音声認識のための事前タスク適応の検討

伊藤彰則

信学技報NLC96-50,SP96-81　1995
Performance prediction of word recognition using the probability of word occurrence

Takashi Otsuki, Teruhiko Otomo, Akinori Ito, Shozo Makino

Electronics and Communications in Japan (Part III: Fundamental Electronic Science)　78　(3)　10-19　1995

DOI： 10.1002/ecjc.4430780302 　

ISSN： 1520-6440 1042-0967
Performance prediction of word recognition using the transition information between phonemes or between characters

Takashi Otsuki, Shozo Makino, Akinori Ito, Toshio Sone

Systems and Computers in Japan　25　(7)　72-81　1994

DOI： 10.1002/scj.4690250707 　

ISSN： 1520-684X 0882-1666
The performance evaluation on Sentence recognition system using a finite state automaton-the relationship between word recognition score and sentence recognition score-

Otsuki Takashi, Ito Akinori, Makino Shozo, Otomo Teruhiko

IEICE technical report. Speech　93　(183)　41-48　1993/08/19
Publisher: The Institute of Electronics, Information and Communication Engineers

This report presents a performance evaluation method for sentence recognition systems that use a finite state automaton. The relationship between the word recognition score and the sentence recognition score can be predicted from the number of sentences at a short distance. However, it is not clear how to obtain this number when a finite state automaton is used as the linguistic information. Therefore, we propose an algorithm that calculates this number in order to predict the relationship between the word recognition score and the sentence recognition score. We carry out the prediction using the proposed method and run a simulation to evaluate the accuracy of the prediction.
The performance evaluation method on sentence recognition system which uses the transition information between word categories.

46　197-198　1993/03/01
Detection of Unknown Words in the Morphemic Analysis for Construction of a Word Dictionary

46　55-56　1993/03/01
Speech to Text Conversion System Based on Phoneme Recognition Peer-reviewed

Shozo Makino, Akinori Ito, Mitsuru Endo, Ken'ichi Kido

The Annals of Applied Information Sciences　18　(1-2)　51-66　1993/03
A NEW WORD PRESELECTION METHOD BASED ON AN EXTENDED REDUNDANT HASH ADDRESSING FOR CONTINUOUS SPEECH RECOGNITION

A ITO, S MAKINO

ICASSP-93 : 1993 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5　2　B299-B302　1993

ISSN： 0736-7791
Detection of Unknown Words using a Bunsetsu Automaton

45　167-168　1992/09/28
Detection of Unknown Words in the Morphemic Analysis for Corpus

44　177-178　1992/02/24
Redundant Hash Addressing法と機能語予測CYK法を組み合わせた連続音声認識の統語処理

伊藤彰則, 牧野正三

全国大会講演論文集　44　165-166　1992/02/24

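Syntactic processing methods proposed for continuous speech recognition include those based on the extended LR method, chart parsers, and the Earley method. However, because these algorithms center on top-down processing, namely word prediction from the grammar, every grammatically predicted word must be matched against the input sequence. This is effective for raising recognition accuracy, but the amount of computation becomes a problem when building a large-vocabulary continuous speech recognition system. In this paper, we extend Kohonen's Redundant Hash Addressing method to continuous speech recognition and show how to use it as a preselection step for the function-word-predicting CYK method, a continuous speech recognition algorithm proposed by the authors. With this method, the content words contained in the input phoneme sequence can be preselected quickly, which reduces the amount of computation needed for word matching.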
言語情報を利用した文字認識における文字認識率と単語認識率の関係

大槻恭士, 伊藤彰則, 牧野正三, 曽根敏夫

全国大会講演論文集　44　141-142　1992/02/24

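Linguistic information such as word dictionaries and character transition information is used as post-processing for character recognition. In particular, character transition information has been reported to achieve an effect equivalent to that of a word dictionary with simple and fast processing. In this paper, we present a method for obtaining the relationship between the character recognition rate and the word recognition rate in character recognition that uses such linguistic information, without actually performing recognition.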
A JAPANESE TEXT DICTATION SYSTEM BASED ON PHONEME RECOGNITION AND A DEPENDENCY GRAMMAR

S MAKINO, A ITO, M ENDO, K KIDO

ICASSP 91, VOLS 1-5　1　273-276　1991

ISSN： 0736-7791
Computerized System for Recording and Analysis of the Circadian Biological Activity Peer-reviewed

Kunio Isono, Yoshiharu Oda, Akinori Ito, Satoshi Hongo, Masao Miyauchi, Atsushi Harada, Shouichi Musashi, Yasuo Tsukahara

The Annals of Applied Information Sciences　15　(1)　155-166　1990/03
文章朗読音声を対象とした連続音声認識のための言語処理

伊藤彰則

東北大応用情報研究センターシンポジウム予稿集　143-150　1990
Linguistic Processing in Japanese Dictation System Peer-reviewed

Shozo Makino, Akinori Ito, Mitsuru Endo, Ken'iti Kido

Preprints of The Third Symposium on Advanced Man-Machine Interface Through Spoken Language　25-1-25-10　1989/12
Bunsetsu-spotting Based Linguistic Processing for a Japanese Dictation System Peer-reviewed

Shozo Makino, Akinori Ito, Yoichi Ogawa, Michio Okada, Ken'iti Kido

Preprints of The Second Symposium on Advanced Man-Machine Interface Through Spoken Language　29-1-29-10　1988/11

Books and Other Publications 7

Issues in Japanese Psycholinguistics from Comparative Perspectives

Masatoshi Koizumi

De Gruyter Mouton　2023/07

ISBN: 9783110778946
社会言語科学の源流を追う

横山, 詔一, 杉戸, 清樹, 佐藤, 和之, 米田, 正人, 前田, 忠彦, 阿部, 貴人

ひつじ書房　2018/09

ISBN: 9784894769311
音響情報ハイディング技術

鵜木, 祐史, 西村, 竜一, 伊藤, 彰則, 西村, 明, 近藤, 和弘, 薗田, 光太郎

コロナ社　2018/03

ISBN: 9784339011357
音響学入門

鈴木陽一, 赤木正人, 伊藤彰則, 佐藤洋, 苣木禎史, 中村健太郎

2010/02
Spoken Language Systems

Seiichi Nakagawa, Michio Okada, Tatsuya Kawahara

Ohmsha/IOS Press　2005/09/15
IT Text Speech Recognition System

Kiyohiro Shikano, Katsunobu Itoh, Tatsuya Kawahara, Kazuya Takeda, Mikio Yamamoto

Ohmsha　2001/05/15
Recent Research towards Advanced Man-Machine Interface through Spoken Language

Shozo Makino, Akinori Ito, Mitsuru Endo, Ken'iti Kido

Elsevier　1996/01

Presentations 36

DNN-based talking movie generation with face direction consideration

Toru Ishikawa, Takashi Nose, Akinori Ito

Smart Innovation, Systems and Technologies　2019/01/01

© Springer Nature Switzerland AG 2019. In this paper, we propose a method to generate a talking head animation that takes the direction of the face into account. The proposed method parametrizes a facial image using the active appearance model (AAM) and models the parameters of the AAM using a feedforward deep neural network. Since the AAM is a two-dimensional face model, conventional methods that use the AAM assume only a frontal face. Thus, when the generated face was combined with other parts such as a head and a body, the directions of the face and the head were often inconsistent. The proposed method models the shape parameters of the AAM using principal component analysis (PCA) so that the direction and the movement of individual facial parts are modeled separately; we can then substitute the face direction of the generated animation with that of the head part so that the directions of the face and the head coincide. We conducted an experiment to demonstrate that the proposed method can generate face animation with the proper face direction.
Two-stage sequence-to-sequence neural voice conversion with low-to-high definition spectrogram mapping

Sou Miyamoto, Takashi Nose, Kazuyuki Hiroshiba, Yuri Odagiri, Akinori Ito

Smart Innovation, Systems and Technologies　2019/01/01

© Springer Nature Switzerland AG 2019. In this study, we propose a two-stage voice conversion technique realized with two models, U-Net and pix2pix. Using U-Net, we tried to reproduce the intonation of a target speaker by performing low-dimensional feature conversion that considers the time direction. We introduced pix2pix for the task of spectrogram enhancement. The pix2pix model is trained to map a low-definition spectrogram to a high-definition spectrogram (low-to-high spectrogram mapping). The low-definition spectrogram is reconstructed from the low-dimensional mel-cepstrum converted by U-Net, and the high-definition spectrogram is extracted from natural speech. In objective evaluations, we showed that the proposed method was effective in improving mel-cepstral distance (MCD) and log F0 RMSE. Subjective evaluations revealed that the proposed method had a certain effect in improving speech individuality while maintaining the same level of naturalness as the conventional method.
Evaluation of English speech recognition for Japanese learners using DNN-based acoustic models

Jiang Fu, Yuya Chiba, Takashi Nose, Akinori Ito

Smart Innovation, Systems and Technologies　2019/01/01

© Springer Nature Switzerland AG 2019. For computer-assisted language learning (CALL) systems to make foreign language learning easier, they must recognize the utterances of the learner with high accuracy. The quality of a CALL system depends mainly on the accuracy of automatic speech recognition (ASR). However, since the pronunciation of non-native speakers differs greatly from that of native speakers, existing ASR systems cannot recognize their speech accurately. To solve this problem, this research proposes an acoustic model based on deep neural networks (DNN), which is trained using the ERJ (English Read by Japanese) database collected from 202 Japanese learners. Compared with traditional ASR systems, the new system significantly improves speech recognition accuracy.
Comparison of speech recognition performance between Kaldi and Google Cloud Speech API

Takashi Kimura, Takashi Nose, Shinji Hirooka, Yuya Chiba, Akinori Ito

Smart Innovation, Systems and Technologies　2019/01/01

© Springer Nature Switzerland AG 2019. In recent years, the number of systems with a speech interface has grown. Such interfaces include a spoken dialogue function, and high-performance spoken dialogue systems are required. A spoken dialogue system includes a speech recognition module. In this study, we focus on that module and aim to improve the spoken dialogue system by enhancing the performance of speech recognition. Among speech recognition systems, Kaldi is widely used in many kinds of research. On the other hand, several speech recognition services are provided as Web APIs, such as IBM Watson Speech to Text, Microsoft Bing Speech API, and Google Cloud Speech API, the last of which is known for its high performance. This paper compares the speech recognition performance of Kaldi and Google Cloud Speech API in terms of WER and RTF and confirms the recognition performance of each system.
Segmental pitch control using speech input based on differential contexts and features for customizable neural speech synthesis

Shinya Hanabusa, Takashi Nose, Akinori Ito

Smart Innovation, Systems and Technologies　2019/01/01

© Springer Nature Switzerland AG 2019. This paper proposes a technique for controlling the pitch of synthetic speech at the segmental level using user input speech, within a framework of speech synthesis based on deep neural networks (DNNs). In a previous study, we proposed tailor-made speech synthesis, a speech synthesis technique that enables users to control the synthetic speech naturally and intuitively. We introduced differential fundamental frequency (F0) contexts into the speaker model training of DNN-based speech synthesis. The differential F0 context represents the relative log F0 at the segmental level of the training data. In this study, we use the user's speech to determine the F0 contexts for the synthetic speech. This approach allows users to modify and control the segmental pitch more flexibly, which will enhance the performance of tailor-made speech synthesis.
A study on a spoken dialogue system with cooperative emotional speech synthesis using acoustic and linguistic information

Mai Yamanaka, Yuya Chiba, Takashi Nose, Akinori Ito

Smart Innovation, Systems and Technologies　2019/01/01

© Springer Nature Switzerland AG 2019. This study examines an emotion labeling method for the system utterances of a non-task-oriented spoken dialogue system. A previous study proposed cooperative emotion labeling, which generates emotional speech with an emotion label estimated from the user and system utterances. However, with this method the system cannot decide the emotion label when the emotion cannot be estimated from the linguistic information. Therefore, we propose a method that uses both acoustic and linguistic information for emotion recognition. In this paper, we first show the performance of emotion recognition using the acoustic features. Then, a scenario-based dialogue experiment is conducted to verify the effectiveness of the proposed emotion labeling method.
Muting machine speech using audio watermarking

Akinori Ito

Smart Innovation, Systems and Technologies　2019/01/01

© Springer Nature Switzerland AG 2019. Spoken dialog systems such as smart speakers have become popular and are used in home environments. A problem occurs when two or more smart speakers are in the same environment: a dialog system can misdetect another dialog system's voice as a user's voice. In this paper, a method to mute synthesized speech is proposed to prevent a speech recognizer from recognizing speech uttered by a machine. An audio watermarking technique is used to indicate that the speech is uttered by a machine, and the speech recognizer attenuates the observed speech if it contains the watermark. The watermark is embedded in the high-frequency band so that humans cannot perceive it and so that it can be extracted robustly. From the experimental results, we found that the proposed method robustly determines the existence of the watermark when the SNR is no less than 0 dB.
Melody completion based on convolutional neural networks and generative adversarial learning

Kosuke Nakamura, Takashi Nose, Yuya Chiba, Akinori Ito

Smart Innovation, Systems and Technologies　2019/01/01

© Springer Nature Switzerland AG 2019. In this paper, we deal with melody completion, a technique which smoothly completes melodies that are partially masked. Melody completion can be used to help people compose or arrange pieces of music in several ways, such as editing existing melodies or connecting two other melodies. In recent years, various methods have been proposed for realizing high-quality completion via neural networks. Therefore, in this research, we examine a method of melody completion based on an image completion network. We represent melodies of a certain length as images and train a completion network to complete those images. The completion network consists of convolution layers and is trained in the framework of generative adversarial networks. We also consider chord progression from musical pieces as conditions.
Leveraging a small corpus by different frame shifts for training of a speech recognizer

Akinori Ito

Smart Innovation, Systems and Technologies　2019/01/01

© Springer Nature Switzerland AG 2019. During the feature extraction process for speech recognition, a window function is first applied to the input waveform to extract a temporally limited spectrum. By shifting the window function by a short time period, we can analyze the temporal change of the speech spectrum. This time period is called "the frame shift," which is usually 5 to 10 ms. In this paper, the frame shift is reconsidered from two aspects. The first is the appropriateness of 10 ms as the frame shift. Frame-based processing assumes that the temporal change of the speech spectrum is slow enough compared with the frame shift, which does not hold for some kinds of consonants such as plosives. This paper shows experimentally that the feature values fluctuate considerably depending on the starting position of the first frame. A training method is then proposed that uses temporally shifted samples as independent samples to compensate for the fluctuation of the features caused by differences in the beginning position of a frame. The second aspect is that the frame shift could be longer if the fluctuation can be compensated for. To test this, an experiment was conducted in which the frame shift was changed from 10 to 60 ms. The result with a 40 ms frame shift outperformed the result with a 10 ms frame shift, and a 50 ms frame shift gave recognition performance comparable to that of a 10 ms frame shift.
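The idea of treating temporally shifted framings of one utterance as independent training samples can be sketched in Python as follows. This is a minimal illustration under assumptions: the signal is a plain sequence of samples, the start offsets are spaced evenly within one frame shift, and the names `frames` and `shifted_variants` are hypothetical. Windowing and feature extraction are not shown.

```python
def frames(signal, frame_len, frame_shift, start=0):
    """Cut `signal` into fixed-length frames, beginning at sample `start`."""
    last = len(signal) - frame_len  # last index where a full frame still fits
    return [signal[i:i + frame_len] for i in range(start, last + 1, frame_shift)]


def shifted_variants(signal, frame_len, frame_shift, n_variants):
    """Frame the same signal n_variants times with different start offsets.

    Assumption: offsets are multiples of frame_shift // n_variants.
    """
    step = frame_shift // n_variants
    return [frames(signal, frame_len, frame_shift, start=k * step)
            for k in range(n_variants)]
```

Each list returned by `shifted_variants` would be passed through the usual feature extraction and added to the training set as a separate sample, so that one recording yields several frame sequences with different frame boundaries.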
A study on ship type identification by use of deep neural network

西村竜一, 天間克宏, 服部聖彦, 金子健司, 伊藤彰則, 藤井豊展, 木島明博

電子情報通信学会技術研究報告 = IEICE technical report : 信学技報　2018/10
Collection and analysis of data for automatic generation of activity logs by everyday sound identification

古谷崇拓, 千葉祐弥, 能勢隆, 伊藤彰則

情報処理学会研究報告　2017/06/17
A study on speech recognition on low-resource computers

長野雄, 伊藤彰則, 大河雄一

日本音響学会2014年春季研究発表会講演論文集　2014/03
The Available Telecommunications Services at Serious Disaster Invited

Sadao Shoji, Takafumi Aoki, Akinori Ito, Shinichiro Omachi, Koichi Ito

IEICE Technical Report　2012/09

NS2012-64,IN2012-62,CS2012-53
Analysis and synthesis of mixed acoustic signals using a sinusoidal model

五十嵐佑樹, 伊藤仁, 伊藤彰則

電気関係学会東北支部連合大会講演論文集　2012
Data hiding of lip image information in speech signals

阿部洋平, 伊藤彰則

電気関係学会東北支部連合大会講演論文集　2012
A study on noise removal based on fragmentary environmental measurements

町田晃平, 伊藤彰則

電気関係学会東北支部連合大会講演論文集　2012
A study on call-out control for robots that coexist with humans

戸塚典子, 伊藤彰則

電気関係学会東北支部連合大会講演論文集　2012
A lyric-writing support system using statistical language models

阿部ちひろ, 伊藤彰則

電気関係学会東北支部連合大会講演論文集　2011
Robust word detection in noisy environments

藤田一暁, 咸聖俊, 伊藤彰則

電気関係学会東北支部連合大会講演論文集　2011
A study on methods for creating corpora for speech synthesis

加藤圭造, 伊藤彰則

電気関係学会東北支部連合大会講演論文集　2011
A study on a virtual dialogue agent using augmented reality

三宅真司, 伊藤彰則

電気関係学会東北支部連合大会講演論文集　2011
Continuous speech recognition with word models using mixture weight retraining

大越真裕美, 鈴木基之, 大河雄一, 伊藤彰則, 牧野正三

日本音響学会 2009年春季研究発表会講演論文集，1-P-23　2009/03
Isolated word recognition based on mixture weight retraining of phoneme triphones

大越真裕美, 鈴木基之, 大河雄一, 伊藤彰則, 牧野正三

日本音響学会 2008年秋季研究発表会講演論文集　2008/09
Adaptive Multiple Description Coding of Flash Video based on Bitstream Pattern Reconstruction

KURAISHI Takuya, ITO Masashi, ITO Akinori, MAKINO Shozo

ITE Technical Report　2008

Multiple Description (MD) Coding is an effective method for concealing burst packet loss. This method divides the source information into multiple streams and introduces correlation among them using redundant information. Utilizing the redundant information, the source can be recovered reasonably well even if packet losses occur during transmission. In this paper, we propose a method of MD Coding for Flash Video (FLV) based on bitstream pattern reconstruction. The effectiveness of the proposed method is examined on actual video data with packet loss simulations. The proposed method achieved quality almost equal to that of a related method while requiring only a small amount of redundancy. This result indicates that the proposed method is effective for concealing burst packet loss.
A study on smoothing methods for inter-path connection probabilities for concatenating phoneme models with multiple paths

本間大輔, 大河雄一, 鈴木基之, 伊藤彰則, 牧野正三

日本音響学会2007年秋季研究発表会講演論文集　2007/09
A study on phoneme recognition using path connection probabilities of HMnet

本間大輔, 大河雄一, 鈴木基之, 伊藤彰則, 牧野正三

日本音響学会2007年春季研究発表会講演論文集　2007/03
A study on speech recognition using a duration model that accounts for the effects of speaking rate and linguistic features

大河雄一, 伊藤彰則, 鈴木基之, 牧野正三

東北大学電気通信研究所音響工学研究会 344-1　2006/08
Spontaneous speech recognition by rescoring with a phoneme duration prediction model

大河雄一, 伊藤彰則, 鈴木基之, 牧野正三

日本音響学会2006年春季研究発表会講演論文集　2006/03
Optimization of multi-path acoustic models by iterating retraining and model selection

大河雄一, 伊藤彰則, 鈴木基之, 牧野正三

日本音響学会2004年秋季研究発表会講演論文集 I　2004/09
A study on acoustic model training for spontaneous speech using the all-star model selection method

大河雄一, 伊藤彰則, 鈴木基之, 牧野正三

日本音響学会2004年春季研究発表会講演論文集 I　2004/03
Speaker adaptation of bilingual mixed acoustic models using SAT

小笠原洋一, 伊藤彰則, 鈴木基之, 牧野正三, 大河雄一

日本音響学会2004年春季研究発表会講演論文集 I　2004/03
A study on high-accuracy acoustic model training methods for spontaneous speech recognition

大河雄一, 鈴木基之, 伊藤彰則, 牧野正三

東北大学電気通信研究所音響工学研究会327-1　2003/11
A study on speaker adaptation of acoustic models for multiple languages trained on different speakers

小笠原洋一, 鈴木基之, 伊藤彰則, 牧野正三, 大河雄一

日本音響学会 2003年秋季研究発表会講演論文集 I　2003/09
A study on recognition of spontaneous speech using multi-path acoustic models

大河雄一, 吉田明弘, 鈴木基之, 伊藤彰則, 牧野正三

東北大学電気通信研究所音響工学研究会 325-1　2003/07
Comparison of speaker adaptation methods in adaptive training

大河雄一, 鈴木基之, 伊藤彰則, 牧野正三

日本音響学会 2002年秋季研究発表会講演論文集 I　2002/09
Evaluation of Japanese Dictation ToolKit-1997 version-

Tatsuya Kawahara, Akinobu Lee, Tetsunori Kobayashi, Kazuya Takeda, Nobuaki Minematsu, Katsunobu Ito, Akinori Ito, Mikio Yamamoto, Atsushi Yamada, Takehito Utsuro, Kiyohiro Shikano

IPSJ SIG Notes　1998/05

Industrial Property Rights 5

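Scoring model generation device, training data generation device, search system, scoring model generation method, training data generation method, search method, and program therefor

Japanese Patent No. 5700566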

Property Type: Patent
Speech evaluation device, speech evaluation method, and program

Japanese Patent No. 5805474

Property Type: Patent
Model parameter arrangement device, method therefor, and program

大庭隆伸, 堀貴明, 中村篤, 伊藤彰則

Japanese Patent No. 5610304

Property Type: Patent
Model reduction device, method therefor, and program

大庭隆伸, 堀貴明, 中村篤, 伊藤彰則

Japanese Patent No. 5780516

Property Type: Patent
Data communication method, data communication system, and data communication program

鈴木陽一, 伊藤彰則, 阿部俊一郎, 須藤裕史, 吉木伸二, 染谷大

Japanese Patent No. 4911385

Property Type: Patent

Research Projects 25

Music Information Processing Competitive

2004/04 - Present
Development of a CALL system using speech recognition technology Competitive

System: Grant-in-Aid for Scientific Research

2004/04 - Present
Development of Speech Recognition System Competitive

System: Ordinary Research

2002/04 - Present
Development of spoken dialog systems Competitive

2002/04 - Present
Pseudo-Dynamic Preservation and Elucidation of Neural Processing of Endangered Languages Based on Natural Discourse Corpora with Physiological Indices

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research

Category: Grant-in-Aid for Scientific Research (A)

Institution: Tohoku University

2024/04/01 - 2028/03/31
Development of a virtual classmate for assistance of online course

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research

Category: Grant-in-Aid for Scientific Research (B)

Institution: Tohoku University

2021/04/01 - 2026/03/31
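Real-environment listening learning support using speaker, region, and style morphing speech synthesis

能勢隆, 伊藤彰則

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research

Category: Grant-in-Aid for Scientific Research (B)

Institution: Tohoku University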

2022/04/01 - 2025/03/31

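This research project addresses the academic question, "From the viewpoints of acoustic engineering and speech perception, what methodology efficiently improves listening ability?" To answer it, we integrate and extend the individual elemental technologies that we have cultivated through research on statistical speech synthesis, machine learning, and interactive English conversation learning systems. The aim is to realize an entirely new form of real-environment listening learning support in which characteristics of English speech such as speaker, region, style, and accent can be simulated step by step with deep-learning-based morphing technology. The project examines the following four items: (a) design and construction of a speech corpus covering diverse speakers, regions, and styles; (b) establishment of deep-learning-based morphing speech synthesis technology; (c) development of a listening learning support system using morphing speech synthesis; and (d) demonstration experiments on the improvement of listening ability in real environments with the proposed system. In fiscal year 2023, we examined (b) and (c) from the viewpoint of speaking-rate style. For (b), we showed that embedding speaking-rate information in a Glow-TTS-based model makes it possible to control the speaking rate and the style related to it (speaking-rate style). We also proposed a method that improves the reproducibility of speech and style by modifying the text encoder, and demonstrated its effectiveness with objective measures. For (c), we built a web-based listening learning and evaluation system based on stepwise speaking-rate control. For (d), we had crowdsourced participants use the system from (c) and showed experimentally that listening ability improved compared with a conventional system without speaking-rate control.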
Field-based Cognitive Neuroscientific Study of Word Order in Language and Order of Thinking from the OS Language Perspective

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research

Category: Grant-in-Aid for Scientific Research (S)

Institution: Tohoku University

2019/06/26 - 2024/03/31
Measurement of entrepreneurship using natural language processing and application to the improvement of education program

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research

Category: Grant-in-Aid for Challenging Research (Exploratory)

Institution: Tohoku University

2020/07/30 - 2023/03/31
Research and development of multi-modal interactive English learning system based on deep learning

ITO Akinori

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research

Category: Grant-in-Aid for Scientific Research (A)

Institution: Tohoku University

2017/04/01 - 2021/03/31

We developed technologies for an English conversation learning system based on deep learning and created a CALL system for practicing English conversation: (1) We established technology for recognizing English speech spoken by Japanese with high accuracy to improve the accuracy of interfaces for speech, facial expressions, and gestures based on deep learning. (2) To establish English pronunciation evaluation and English conversation simulation technology based on deep learning, we investigated the effects of facial expressions and gestures on English proficiency evaluation. In addition, we established a method to evaluate pronunciation with high accuracy for interactive speech. (3) We integrated the technologies to create a spoken dialogue English conversation learning system.
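Field-based Psycholinguistic Study of Word Order in Language and Order of Thinking from the OS Language Perspective

小泉政利, 安永大地, 木山幸子, 大塚祐子, 遊佐典昭, 酒井弘, 大滝宏一, 杉崎鉱司, Jeong Hyeonjeong, 新国佳祐, 玉岡賀津雄, 伊藤彰則, 金情浩, 那須川訓也, 里麻奈美, 矢野雅貴, 小野創

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research

Category: Grant-in-Aid for Scientific Research (A)

Institution: Tohoku University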

2019/04/01 - 2020/03/31

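We prepared to conduct the following surveys and experiments in the Kingdom of Tonga in August: (1) a series of experiments and questionnaire surveys covering issues such as lexical processing, sentence processing, judgment of canonical word order, and case-particle drop; (2) an experiment comparing the comprehension processes of subject relative clauses and object relative clauses; and (3) a behavioral experiment on the acquisition of syntactic ergativity. In addition, to gather information on related research trends, we attended the 158th Meeting of the Linguistic Society of Japan (Hitotsubashi University).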
Basic research for YASASHII NIHONGO database construction

MAEDA Rikako, SATOH Kazuyuki, ITO Akinori, SUGITO Seiju, SUN Weiting, BABA Yasumasa, MIZUNO Yoshimichi, MISONOU Yasuko, YONEDA Masato

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research

Category: Grant-in-Aid for Scientific Research (C)

Institution: Daito Bunka University

2015/04/01 - 2018/03/31

To support GENSAI (disaster mitigation) practice for beginning learners of Japanese, who are at a disadvantage in obtaining emergency information, I collected and analyzed YASASHII NIHONGO (Easy Japanese) resources for GENSAI. I also built learning resources for people who wish to become users of "YASASHII NIHONGO for GENSAI". In building these resources, I focused on emergency information prepared for transmission during the first 72 hours after an earthquake occurs.
Research of Human-Kind Dialogue System with Recognition and Synthesis of Various Speech Based on State Estimation

Nose Takashi, MORI Hiroki

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research

Category: Grant-in-Aid for Scientific Research (B)

Institution: Tohoku University

2015/04/01 - 2018/03/31

In this research project, we improved and advanced techniques for recognizing and synthesizing various kinds of speech, and studied a technique for estimating the state of system users and its applications, in order to realize a dialogue system that is kind to users. Specifically, (1) we studied the validity of using emotions and a technique for emotion estimation. (2) We proposed and evaluated a sentence selection technique based on extended entropy, in which phonetic and prosodic contexts are taken into account. (3) We recorded and analyzed dialogue data for willingness estimation. (4) We constructed a large-scale emotional speech corpus that can be used for emotional speech synthesis and recognition and for emotion estimation. (5) We proposed and evaluated variance compensation and tailor-made speech synthesis as techniques for synthesizing diverse, high-quality speech.
Development of Easy Japanese composition support system using sentence difficulty estimation and speech synthesis

Ito Akinori, CHIBA Yuya, NAGANO Takeshi

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research

Category: Grant-in-Aid for Scientific Research (B)

Institution: Tohoku University

2014/04/01 - 2017/03/31

We developed YANSIS, an Easy Japanese composition support system, and conducted related investigations. We developed a method for automatically estimating the difficulty of a sentence, and investigated how the intelligibility of Japanese speech heard by non-native speakers relates to speech rate, pauses, and speech degradation caused by reverberation. This investigation revealed the most appropriate speech rate for Easy Japanese speech. In addition, we implemented automatic sentence difficulty estimation and a speech synthesizer in YANSIS.
Development of an English conversation learning system based on spoken dialog with an agent

ITO Akinori, HIROI Yutaka

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research

Category: Grant-in-Aid for Challenging Exploratory Research

Institution: Tohoku University

2012/04/01 - 2015/03/31

In this project, we developed a system for training Japanese learners in English communication skills, in which the learner practices conversation with a robot or a virtual character. First, we developed a robot that can move around a room by following a person, identify a position by recognizing a pointing gesture, and converse with the learner in English. Then we developed a speech recognition method that correctly recognizes the learner’s speech even when it contains grammatical mistakes. Finally, we developed a method for conversation exercises with a virtual character through which the learner can acquire proper timing for answering the interlocutor’s utterance.
Automatic prosody evaluation and grammatical mistake detection for English learning by Japanese native speakers

ITO Akinori, SUZUKI Motoyuki, MAKINO Shozo, OHKAWA Yuichi

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research

Category: Grant-in-Aid for Scientific Research (B)

Institution: Tohoku University

2008 - 2010

I have developed two methods that enable speaking exercises in a computer-assisted language learning (CALL) system: a method for evaluating the prosody of an English utterance made by a learner, and a method for detecting grammatical mistakes in the learner's utterance. For prosody evaluation, I developed a method that estimates word importance factors using a decision tree, and obtained a high correlation with human assessment scores, comparable to the correlation between scores given by different human evaluators. For grammatical mistake detection, I proposed a method for training an n-gram language model from artificially generated sentences containing mistakes, and obtained 89.2% word accuracy.
Development of a computer-assisted language learning system utilizing a speaker adaptation and a grammatical error modeling

ITO Akinori, SUZUKI Motoyuki, MAKINO Shozo

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research

Category: Grant-in-Aid for Scientific Research (B)

Institution: Tohoku University

2004 - 2007

1. Improvement of Pronunciation Evaluation: We developed a method to detect pronunciation mistakes made by a foreign language learner by applying speech recognition technology. We focused on two targets: learning of English by Japanese native speakers and learning of Japanese by Korean native speakers. To improve the accuracy of mispronunciation detection, we developed a bilingual speaker adaptation method that adapts both English and Japanese HMMs to the learner. To solve the problem that the strictness of mispronunciation detection depends on the linguistic context of the pronunciation, we developed a method of detecting mispronunciations using a decision tree, which gave a detection accuracy of more than 90% on English utterances made by Japanese native speakers. 2. Evaluation of Intonation and Rhythm: In addition to the detection of mispronunciations, we developed methods to evaluate the intonation and rhythm of English utterances. We found that the log F0, log power, and their derivatives were good features for the evaluation of intonation. To adjust the strictness of intonation evaluation from word to word, we introduced a method to estimate word importance factors using a decision tree. We also found that the word duration ratio was a good feature for rhythm evaluation. 3. Development of an Interactive CALL System: We developed a method of detecting grammatical errors in a learner's utterances for application to an interactive CALL system that enables a learner to learn a foreign language through dialogues with a computer. For the learning of Japanese, we developed a method to recognize the learner's speech using a finite state automaton to which grammatical error rules were applied. For the learning of English, we developed a method that uses an n-gram language model trained on a corpus automatically generated using the grammatical error rules.
Large Vocabulary Continuous Speech Recognition System on Japanese Newspaper Reading Task

KOHDA Masaki, KATOH Masaharu, ITO Akinori

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research

Category: Grant-in-Aid for Scientific Research (C)

Institution: Yamagata University

1998 - 2000

We investigated a large vocabulary continuous speech recognition (LVCSR) system for a Japanese newspaper reading task and obtained the following results. (1) Acoustic models: A Hidden Markov Network (HM-Net) is a highly accurate and robust acoustic model that represents a tied-state structure of context-dependent Hidden Markov Models as a network. We propose a state-clustering-based rapid topology design method to generate high-accuracy HM-Nets for LVCSR. Furthermore, MLLR (Maximum Likelihood Linear Regression)-based speaker adaptation of acoustic models is investigated, and a regression class selection algorithm based on the BIC principle is proposed. (2) Language models: An n-gram task adaptation method is investigated that uses a large corpus of the general task (TI text) and a small corpus of the specific task (AD text), and employs a simple weighting to mix the TI and AD texts. Furthermore, we propose a new SCFG (Stochastic Context Free Grammar) model that uses a phrase-based dependency grammar instead of a general CFG. The word error rate obtained with the mixture model based on the proposed SCFG model and a trigram is lower than that obtained with the trigram alone. (3) Decoder: We investigate fast search strategies for LVCSR and propose a new method, phoneme-graph-based hypothesis restriction, which effectively prunes the search space. In the proposed method, a phoneme graph is generated at the pre-processing stage, and the best word sequence is then searched for at the main recognition stage while the expansion of hypotheses is restricted using the information in the phoneme graph. In a multiple-pass LVCSR system that uses a word graph as an intermediate data structure, decoder parameters should be optimized in order to generate a good word graph. A new method to optimize these parameters is proposed. This method rescores the word graph using a bigram LM instead of generating many word graphs for each parameter setting.
(4) Software tool: We describe a statistical language model toolkit for word and class-based n-grams. This toolkit has command-level compatibility with the CMU-Cambridge SLM Toolkit, and supports class n-grams and n-gram count mixture as well as combined language models using linear interpolation.
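A study on statistical language models for Japanese speech recognition and their task adaptation

伊藤彰則

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research

Category: Grant-in-Aid for Encouragement of Young Scientists (A)

Institution: Yamagata University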

1997 - 1998

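This year we studied "statistical language models for Japanese continuous speech recognition that do not rely on morphological analysis." The study consists of two sub-themes: one is building a language model whose units are statistically selected character strings, and the other is assigning readings to mixed kanji-kana text by a statistical method. For the character-string-based language model, we evaluated the proposed method by comparing various methods of segmenting text into strings and by running experiments that varied the task and size of the training and evaluation texts. The largest perplexity reduction (up to 9.3%) was observed with the combination of frequency-based string extraction and analysis by the leftmost longest-match method. To examine performance differences across corpora, we also ran comparative experiments using three dialogue corpora and the EDR corpus, which is written language. The perplexity reduction rate was largest for the ATR conversation corpus, which covers a single task. This is because both the statistics and the segmentation units are determined from the training text alone, and it can be said to indicate the limits of applicability of this method. For reading assignment by a statistical method, we used the EDR corpus to build and evaluate a reading assignment system that applies an N-gram model. The results showed that performance was highest when the model was built using one character before and one character after the target character. The best performance of the system was a reading assignment accuracy of 96.27%.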
Continuous speech recognition with adaptability to the speaking rate of an input speech

MAKINO Shozo, SUZUKI Motoyuki, SONE Hideaki

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research

Category: Grant-in-Aid for Scientific Research (B)

Institution: Tohoku University

1995 - 1997

This research developed a spoken word recognition system that uses phoneme duration information estimated from the speaking rate of the input speech. In this research, the speaking rate is assumed to be reflected in the average vowel length. The acoustic processor transforms the input speech into a similarity matrix using the modified LVQ2. The average vowel length is computed from the preliminary recognition result. The duration of each phoneme in each word template is estimated from the average length of vowels in the input speech. Taking the estimated phoneme duration into account, spoken word recognition experiments were carried out using DTW. The word recognition score was 97.3% for the 212-word vocabulary uttered by 5 male speakers (test set). The phoneme duration information was collected from the 212-word vocabulary uttered by another 5 male and 10 female speakers (training set). The hybrid combination of the preceding-phoneme-dependent estimation and the following-phoneme-dependent estimation gave the best performance. The above-mentioned method was extended to phoneme recognition. The phoneme accuracy increased from 71.8% to 86.3% for phonemes in the 212-word vocabulary uttered by 5 male speakers (test set).
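Recognition of freely spoken sentences using an associative approach

伊藤彰則

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research

Category: Grant-in-Aid for Encouragement of Young Scientists (A)

Institution: Tohoku University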

1994 - 1994

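This research aimed to develop a new framework for recognizing freely spoken sentences, with "associative relations" at its core. It consists of the following three stages. 1. Analyze a large-scale language database and investigate the associative relations and other linguistic information that appear in it. 2. Develop an algorithm that uses associative information to perform efficient recognition. 3. Using these results, build a prototype experimental system that actually runs. Of these stages, this year's work covered (1) analysis of the language database and investigation of various linguistic phenomena, and (2) development of the "extended RHA method," an algorithm that performs recognition using associative relations. An outline follows. 1. Analysis of the language database: The database used for the analysis is the transcribed text included in the simulated dialogue text database of the Acoustical Society of Japan's continuous speech database for research. Of this, 44 dialogues (3633 utterances, 19019 bunsetsu phrases) were analyzed. First, morphological analysis was performed on this text, and 3386 content words and 615 function words were extracted. Next, a bunsetsu model for dialogue speech was constructed from the analysis results. This model extends the bunsetsu model that we previously used for recognizing read-aloud speech. Using this bunsetsu model, we obtained the transition probabilities between words in the database, the perplexity, and other statistics. Development of the recognition algorithm "extended RHA method": We developed the "extended RHA method," an algorithm that recognizes words in continuous speech using associative relations. This method associates and recognizes words using various kinds of information, and differs from conventional pattern-matching approaches. In this work, we used only the recognized phonemes as the source of associative information, applied the method as a word preselection step for conventional continuous speech recognition, and verified its effectiveness. Within exactly the same framework, speech recognition that makes effective use of information such as word succession relations is also possible.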
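A study on continuous speech recognition based on statistics and association

伊藤彰則

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research

Category: Grant-in-Aid for Encouragement of Young Scientists (A)

Institution: Tohoku University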

1993 - 1993

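This year's research covered three topics: (1) construction of grammatical information, (2) construction of an association-based word detection method, and (3) use of statistical information in word association. The "construction of associative information from words to words, or from words to scenes" listed in the original research plan was not carried out. Grammatical information is an important information source that forms the basis of this research. Aiming at the recognition of natural utterances, we constructed a finite automaton that represents the bunsetsu structure in conversational speech. As conversational material, we used the transcribed text of the conversational speech in the Acoustical Society of Japan's continuous speech database. We removed so-called unnecessary words such as interjections from this text and represented an intra-bunsetsu grammar that accepts the remaining expressions as a finite automaton. This grammar was constructed by modifying the intra-bunsetsu grammar for read speech that the author had built previously. As research on association-based word detection, we proposed the "extended RHA method." The extended RHA method extends the RHA (Redundant Hash Addressing) method, which is used for fast word recognition, to continuous speech recognition. In applying RHA to continuous speech recognition, two points were important: (1) changing the word-oriented method into one for continuous speech, and (2) improving the accuracy of the original RHA method. For (1), we introduced the concept of an "activation point" into the RHA method and applied RHA to word detection. For (2), we introduced "extended fragments" that anticipate phoneme recognition errors, improving detection accuracy. Word detection experiments confirmed that, compared with the "continuous DP method" conventionally used for this purpose, detection performance was comparable and detection was several times faster. As one means of introducing a statistical element into word detection by the extended RHA method, we proposed a word detection method using extended fragments. In the extended RHA method, the unit for associating words was a phoneme tuple of a length fixed in advance, whereas in the method using extended fragments the unit is determined statistically. Given the set of words to be detected, this method determines the association units statistically so that the number of words associated from a single unit does not exceed a certain number. Concretely, words are associated using phoneme tuples of variable length. This suppresses wasteful associations and keeps false word detections low.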
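A study on a continuous speech recognition system using the function-word-predicting CYK method

伊藤彰則

Offer Organization: Japan Society for the Promotion of Science

System: Grants-in-Aid for Scientific Research

Category: Grant-in-Aid for Encouragement of Young Scientists (A)

Institution: Tohoku University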

1992 - 1992

Works 2

palmkit: a toolkit for statistical language modeling

http://palmkit.sourceforge.net/ 2001/11/05 -

Type: Software
w3m: a web browser

http://w3m.sourceforge.net/ 1999/01/10 -

Type: Software

Social Activities 4

Science Café

2013/06/28 -

Gave a public lecture on speech recognition, synthesis, and dialogue technologies, titled "How can we talk with smartphones and robots?"
Visiting lecture

2008/12/04 -

Gave a visiting lecture for high school students at Sendai Daini High School, Miyagi Prefecture, titled "Dialogue with robots."
Visiting lecture

2008/10/18 -

Gave a visiting lecture for high school students at Gunma Prefectural Ota High School, titled "Dialogue with robots."
Smooth transmission during network failures

2007/03/23 -

Other 1

Development of basic software for Japanese dictation

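To promote research, development, and practical application of Japanese large-vocabulary continuous speech recognition, we develop a high-accuracy speech recognition system that anyone can use. To this end, we develop high-accuracy acoustic models that can be used for speaker-independent recognition, language models trained on large amounts of language data, and a fast, high-accuracy speech recognition engine.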