[Azure, C#] Using Text to Speech in a Web App

What is Azure Text to Speech?

A service that converts text into speech.

Accessing the Text to Speech service

There are two ways to use Text to Speech, one of the Azure services: connect to the service directly from the front end, or access it from the back end and return the result to the front end.

For SDK details on using Text to Speech, see this page on the Microsoft site. Sample code is available on this Microsoft page and in this GitHub repository.

Accessing Text to Speech from the browser

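Accessing the service directly from the front end can be done with a single HTML file. However, the key required to use Text to Speech then appears in plain text in the HTML, so there is a real risk that someone else will use it. This approach is therefore not suitable for a public app. (If only there were a way to hide the key...)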
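For reference, a common way to keep the key out of the page is to let the back end exchange it for a short-lived authorization token (a POST to `https://<region>.api.cognitive.microsoft.com/sts/v1.0/issueToken` with the `Ocp-Apim-Subscription-Key` header) and have the browser build its config with `SpeechSDK.SpeechConfig.fromAuthorizationToken`. The sketch below covers only the browser side and assumes a hypothetical back-end endpoint (`/Speech/Token`) that returns the token as plain text; the function name is also hypothetical, and `fetchImpl` is injectable so it can be tried without a network.

```javascript
// Sketch: build a SpeechConfig from a short-lived token instead of the raw key.
// `sdk` is the global SpeechSDK object from the bundle; `tokenUrl` points to a
// HYPOTHETICAL back-end endpoint that returns an authorization token as plain text.
async function createSpeechConfigFromToken(sdk, tokenUrl, region, fetchImpl = fetch) {
  const response = await fetchImpl(tokenUrl, { method: "GET" });
  if (!response.ok) {
    throw new Error(`token request failed with status ${response.status}`);
  }
  const token = await response.text();
  // Only the token and the region reach the browser; the key stays on the server.
  return sdk.SpeechConfig.fromAuthorizationToken(token, region);
}

// Usage in the page, instead of fromSubscription("key", "region"):
// const speechConfig = await createSpeechConfigFromToken(SpeechSDK, "/Speech/Token", "region");
```

Tokens expire after a short time (about 10 minutes), so a page that stays open longer would need to fetch a new one.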

The following is almost exactly the published sample, and it accesses Text to Speech with HTML alone.
*Replace Key and Region with the values obtained from Azure.

<!DOCTYPE html>
<html lang="en">
<head>
    <title>Microsoft Cognitive Services Speech SDK JavaScript Quickstart</title>
    <meta charset="utf-8" />
</head>
<body>

    <button id="startSpeakTextAsyncButton" onclick="synthesizeSpeech()">speak</button>

    <!-- Speech SDK reference sdk. -->
    <!--<script src="microsoft.cognitiveservices.speech.sdk.bundle.js"></script>-->
    <script src="https://cdn.jsdelivr.net/npm/microsoft-cognitiveservices-speech-sdk@latest/distrib/browser/microsoft.cognitiveservices.speech.sdk.bundle-min.js"></script>
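    <script>
        function synthesizeSpeech() {
            var button = document.getElementById("startSpeakTextAsyncButton");

            // Replace "key" and "region" with the values from Azure.
            // Anyone can read this key from the page source.
            var speechConfig = SpeechSDK.SpeechConfig.fromSubscription("key", "region");

            // Without an AudioConfig, the audio is played through the default speaker.
            var synthesizer = new SpeechSDK.SpeechSynthesizer(speechConfig);
            var inputText = "this is a pen";

            // Disable the button while synthesis is running.
            button.disabled = true;

            synthesizer.speakTextAsync(
                inputText,
                function (result) {
                    // Success: log the result and release the synthesizer.
                    button.disabled = false;
                    window.console.log(result);
                    synthesizer.close();
                    synthesizer = undefined;
                },
                function (err) {
                    // Failure: log the error and release the synthesizer.
                    button.disabled = false;
                    window.console.error(err);
                    synthesizer.close();
                    synthesizer = undefined;
                });
        }
    </script>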
</body>
</html>
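The sample only logs the result, so a synthesis that did not complete (for example because of a wrong key or region) is easy to miss. A minimal sketch that wraps `speakTextAsync` in a Promise and treats anything other than `ResultReason.SynthesizingAudioCompleted` as an error; the helper name `speakTextPromise` is hypothetical, and `ResultReason` is passed in (normally `SpeechSDK.ResultReason`) so the function does not rely on the global.

```javascript
// Promise wrapper around the callback-based SpeechSynthesizer.speakTextAsync.
// `ResultReason` is normally SpeechSDK.ResultReason.
function speakTextPromise(synthesizer, text, ResultReason) {
  return new Promise((resolve, reject) => {
    synthesizer.speakTextAsync(
      text,
      (result) => {
        if (result.reason === ResultReason.SynthesizingAudioCompleted) {
          resolve(result); // result.audioData holds the synthesized audio
        } else {
          // e.g. ResultReason.Canceled when the key or region is wrong
          reject(new Error(`synthesis not completed: ${result.errorDetails}`));
        }
      },
      (error) => reject(new Error(`synthesis error: ${error}`))
    );
  });
}

// Usage:
// async function synthesizeSpeech() {
//   const synthesizer = new SpeechSDK.SpeechSynthesizer(speechConfig);
//   try {
//     await speakTextPromise(synthesizer, "this is a pen", SpeechSDK.ResultReason);
//   } catch (e) {
//     console.error(e);
//   } finally {
//     synthesizer.close();
//   }
// }
```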

Accessing Text to Speech from the back end

The back end accesses Text to Speech and returns a byte[] to the front end, which then plays the audio.

Here is the sample website I built.

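There is plenty of room for improvement, but the flow is as follows. The audio element points to ~/Speech/Read, so every time the page is displayed or refreshed, the browser fetches the audio byte[] from ~/Speech/Read and plays it.
When the Speech button is pressed, the string typed into the text field is stored in the session. (The button submits the form, so the page is refreshed at this point.)
When the page is refreshed, the Read action runs: it sends the string stored in the session to the Text to Speech service and returns the resulting audio byte[].
Note that HttpContext.Session only works if session state is enabled at startup (typically AddDistributedMemoryCache() and AddSession() on the service collection, and UseSession() in the request pipeline).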

Controller file

using System;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;
using Microsoft.CognitiveServices.Speech;

using CoreProject.Models;
using System.IO;
using Microsoft.AspNetCore.Http;

// For more information on enabling MVC for empty projects, visit https://go.microsoft.com/fwlink/?LinkID=397860

namespace CoreProject.Controllers
{
    public class SpeechController : Controller
    {
        [HttpGet]
        public IActionResult Index()
        {
            return View();
        }

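        [HttpPost]
        [ActionName("Index")]
        public IActionResult Post(ReadModel readModel)
        {
            // An empty text box binds as null, and Session.SetString throws on null.
            HttpContext.Session.SetString("READTEXT", readModel.readText ?? string.Empty);

            return View(readModel);
        }

        // Requested by the <audio> element (src="~/Speech/Read") every time the page loads.
        public async Task<IActionResult> Read()
        {
            string readText = HttpContext.Session.GetString("READTEXT") ?? string.Empty;

            // Nothing to read yet (e.g. first visit): skip the call to the service.
            if (string.IsNullOrWhiteSpace(readText))
            {
                return NoContent();
            }

            var result = await Speech(readText);

            return File(result, "audio/wav");
        }

        // Private so that MVC does not expose it as an action (~/Speech/Speech).
        private async Task<byte[]> Speech(string speechString)
        {
            // Replace with the key and region of your Speech resource
            // (the region here matches the centralus endpoint used originally).
            var config = SpeechConfig.FromSubscription("<key obtained from Azure>", "centralus");

            // A null AudioConfig keeps the audio in memory instead of playing it on the server.
            using var synthesizer = new SpeechSynthesizer(config, null);
            using var result = await synthesizer.SpeakTextAsync(speechString);

            if (result.Reason == ResultReason.Canceled)
            {
                // Typically a wrong key/region or a network problem.
                var details = SpeechSynthesisCancellationDetails.FromResult(result);
                throw new InvalidOperationException($"Speech synthesis canceled: {details.ErrorDetails}");
            }

            // AudioData already contains the whole synthesized audio as a byte array.
            return result.AudioData;
        }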
    }
}
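To check what ~/Speech/Read actually returns (for example when the player stays silent), the bytes can be inspected in the browser. The helper below is a hypothetical sketch that reads the canonical 44-byte RIFF/WAVE header, assuming the `fmt ` chunk starts right after the RIFF header at offset 12 as in a canonical PCM WAV file; it returns `null` for anything else, such as an empty body or an HTML error page.

```javascript
// Reads basic format info from a canonical PCM WAV header.
// Assumes the "fmt " chunk directly follows the 12-byte RIFF header.
function readWavHeader(input) {
  const bytes = input instanceof Uint8Array ? input : new Uint8Array(input);
  if (bytes.length < 44) return null;
  const tag = (offset) => String.fromCharCode(...bytes.subarray(offset, offset + 4));
  if (tag(0) !== "RIFF" || tag(8) !== "WAVE" || tag(12) !== "fmt ") return null;
  // WAV header fields are little-endian.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return {
    audioFormat: view.getUint16(20, true), // 1 = PCM
    channels: view.getUint16(22, true),
    sampleRate: view.getUint32(24, true),
    bitsPerSample: view.getUint16(34, true),
  };
}

// Usage in the browser console (note: this triggers another synthesis request):
// fetch("/Speech/Read").then((r) => r.arrayBuffer()).then((b) => console.log(readWavHeader(b)));
```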

cshtml file

@model CoreProject.Models.ReadModel

@{
    ViewData["Title"] = "Text to Speech";
}

<form method="post">
    <br />
    <label>Text to Speech</label>
    @Html.TextBoxFor(m => m.readText, new { onclick = "this.select()" })
    <button type="submit">Speech</button>
    <br />
    <audio controls autoplay>
        <source src="@Url.Content("~/Speech/Read")" type="audio/wav" />
        Your browser does not support the audio element.
    </audio>
</form>
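The autoplay attribute is only a request: browsers may block playback with sound until the user has interacted with the page, in which case the user has to press play in the controls. HTMLMediaElement.play() returns a Promise that rejects with a NotAllowedError when playback is blocked, so a small script can detect this. A sketch, with the helper name `tryPlay` being hypothetical:

```javascript
// Tries to start playback and reports whether the browser allowed it.
// Resolves true if playback started, false if blocked by the autoplay policy.
async function tryPlay(audio) {
  try {
    await audio.play();
    return true;
  } catch (e) {
    if (e && e.name === "NotAllowedError") {
      return false; // blocked: the user must press play in the controls
    }
    throw e; // other errors (e.g. an unsupported source) are not swallowed
  }
}

// Usage in the view, e.g. at the end of the page:
// tryPlay(document.querySelector("audio"))
//   .then((ok) => { if (!ok) console.log("Autoplay was blocked; press play to listen."); })
//   .catch((e) => console.log(e));
```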
