Understanding Android TTS Speech Synthesis

2019-6-12 17:28
斜杠Allen

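As my years in Android development have added up, my work has gradually shifted from application-layer development to the Android Framework layer. I knew from the start that the Android body of knowledge was vast, but only when you move from the Application layer toward the Framework layer do you realize how little you actually understood. I used to deal mostly with Activity and Fragment and rarely touched Service and Broadcast; the focus was on layouts, animations, network requests and the like. Application developers do eventually turn to architecture, performance optimization, Hybrid and so on, but once I started working with Framework-layer modules I found the material there far more intricate. Today's topic, Android TTS, is one example.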

Without further ado, here is the outline of this post:

[Figure: outline of the post]

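I was inspired by an article on how to explain a technical topic well. It suggested splitting the explanation into two parts: the external application dimension and the internal design dimension. Covering both angles is usually enough to explain a topic thoroughly, so I apply the same approach in my writing.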

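External application dimension

What is TTS

In Android, TTS stands for Text to Speech. The name says what it does: it converts text into speech, so you pass in a piece of text and the Android system reads it aloud. Today this is most common in voice assistant apps, and many handset vendors ship a built-in text-to-speech service that can read the text on the current screen. Reading apps offer it too: WeChat Read (微信讀書), for example, can play the current page as audio, which suits situations where holding the phone and reading the screen is inconvenient.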

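TTS technical specification

The work is done mainly through the TextToSpeech class. The steps are:

1. Create a TextToSpeech object, passing an OnInitListener that reports whether initialization succeeded.
2. Set the language and country the TextToSpeech object should use, and check the return value to see whether TTS supports that language and country.
3. Call speak() or synthesizeToFile().
4. Shut down TTS to release its resources.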

XML file

<?xml version="1.0" encoding="utf-8"?>
<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent">

    <ScrollView
        android:layout_width="match_parent"
        android:layout_height="match_parent">

        <LinearLayout
        android:layout_width="match_parent"
        android:layout_height="match_parent"
        android:orientation="vertical">

<EditText
    android:id="@+id/edit_text1"
    android:layout_width="match_parent"
    android:layout_height="wrap_content"
    android:text="杭州自秦朝設縣治以來已有2200多年的歷史,曾是吳越國和南宋的都城。因風景秀麗,素有“人間天堂”的美譽。杭州得益于京杭運河和通商口岸的便利,以及自身發達的絲綢和糧食產業,歷史上曾是重要的商業集散中心。" />

<Button
    android:id="@+id/btn_tts1"
    android:layout_width="150dp"
    android:layout_height="60dp"
    android:layout_marginTop="10dp"
    android:text="TTS1" />

<EditText
    android:id="@+id/edit_text2"
    android:layout_width="match_parent"
    android:layout_height="wrap_content"
    android:text="伊利公開舉報原創始人鄭俊懷:多名高官充當保護傘  北京青年報  2018-10-24 12:01:46    10月24日上午,伊利公司在企業官方網站發出舉報信,公開舉報鄭俊懷等人,聲稱鄭俊懷索要巨額犯罪所得不成,動用最高檢某原副檢察長等人施壓,長期造謠迫害伊利,多位省部級、廳局級領導均充當鄭俊懷保護傘,人為抹掉2.4億犯罪事實,運作假減刑,14年來無人敢處理。" />

<Button
    android:id="@+id/btn_tts2"
    android:layout_width="150dp"
    android:layout_height="60dp"
    android:layout_marginTop="10dp"
    android:text="TTS2" />

<Button
    android:id="@+id/btn_cycle"
    android:layout_width="150dp"
    android:layout_height="60dp"
    android:layout_marginTop="10dp"
    android:text="Cycle TTS" />

<Button
    android:id="@+id/btn_second"
    android:layout_width="150dp"
    android:layout_height="60dp"
    android:layout_marginTop="10dp"
    android:text="Second TTS" />

        </LinearLayout>
    </ScrollView>
</RelativeLayout>

Activity file

public class TtsMainActivity extends AppCompatActivity implements View.OnClickListener,TextToSpeech.OnInitListener {
    private static final String TAG = TtsMainActivity.class.getSimpleName();
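    private static final int THREADNUM = 100; // number of threads used in the stress test

    private EditText mTestEt1;
    private EditText mTestEt2;
    private TextToSpeech mTTS;  // the TTS object
    private XKAudioPolicyManager mXKAudioPolicyManager; // the author's own audio-source helper, not an Android SDK class
    private HashMap<String, String> mParams = null;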

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        mTestEt1 = (EditText) findViewById(R.id.edit_text1);
        mTestEt2 = (EditText) findViewById(R.id.edit_text2);

        findViewById(R.id.btn_tts1).setOnClickListener(this);
        findViewById(R.id.btn_tts2).setOnClickListener(this);
        findViewById(R.id.btn_cycle).setOnClickListener(this);
        findViewById(R.id.btn_second).setOnClickListener(this);
        init();
    }

    private void init() {
        mTTS = new TextToSpeech(this.getApplicationContext(), this);
        mXKAudioPolicyManager = XKAudioPolicyManager.getInstance(this.getApplication());
        mParams = new HashMap<>();
        // Audio stream type used for playback; "3" is the value of AudioManager.STREAM_MUSIC
        mParams.put(TextToSpeech.Engine.KEY_PARAM_STREAM, "3");
    }

    @Override
    public void onInit(int status) {
        if (status == TextToSpeech.SUCCESS) {
            int result = mTTS.setLanguage(Locale.ENGLISH);
            if (result == TextToSpeech.LANG_MISSING_DATA || result == TextToSpeech.LANG_NOT_SUPPORTED) {
                Toast.makeText(this, "Language data missing or language not supported", Toast.LENGTH_SHORT).show();
            }
        }
    }

    @Override
    public void onClick(View v) {
        int id = v.getId();
        switch (id){
        case R.id.btn_tts1:
            TtsPlay1();
            break;
        case R.id.btn_tts2:
            TtsPlay2();
            break;
        case R.id.btn_second:
            TtsSecond();
            break;
        case R.id.btn_cycle:
            TtsCycle();
            break;
        default:
            break;
        }
    }

    private void TtsPlay1() {
        if (mTTS != null && !mTTS.isSpeaking() && mXKAudioPolicyManager.requestAudioSource()) {
            //mTTS.setOnUtteranceProgressListener(new ttsPlayOne());
            String text1 = mTestEt1.getText().toString();
            Log.d(TAG, "TtsPlay1-----------text to speak: " + text1);
            // Speak. Note: the three-argument speak() was added in API level 4, the four-argument one in API level 21
            mTTS.speak(text1, TextToSpeech.QUEUE_FLUSH, mParams);
        }
    }

    private void TtsPlay2() {
        if (mTTS != null && !mTTS.isSpeaking() && mXKAudioPolicyManager.requestAudioSource()) {
            //mTTS.setOnUtteranceProgressListener(new ttsPlaySecond());
            String text2 = mTestEt2.getText().toString();
            Log.d(TAG, "TtsPlay2-----------text to speak: " + text2);
            // Set the pitch: higher values sound sharper (more like a female voice),
            // lower values sound deeper (more like a male voice); 1.0 is normal
            mTTS.setPitch(0.8f);
            // Set the speech rate; 1.0 is the normal rate
            mTTS.setSpeechRate(1f);
            // Speak. Note: the three-argument speak() was added in API level 4, the four-argument one in API level 21
            mTTS.speak(text2, TextToSpeech.QUEUE_FLUSH, mParams);
        }
    }

    private void TtsSecond(){
        Intent intent = new Intent(TtsMainActivity.this,TtsSecondAcitivity.class);
        startActivity(intent);
    }

    private void TtsCycle() {
        long millis1 = System.currentTimeMillis();

        for (int i = 0; i < THREADNUM; i++) {
            Thread tempThread = new Thread(new MyRunnable(i, THREADNUM));
            tempThread.setName("thread" + i);
            tempThread.start();
        }

        long millis2 = System.currentTimeMillis();
        Log.d(TAG, "Loop speech test elapsed time (ms): " + (millis2 - millis1));
    }

    @Override
    protected void onStart() {
        super.onStart();
    }

    @Override
    protected void onStop() {
        super.onStop();
    }

    @Override
    protected void onDestroy() {
        super.onDestroy();
        shutDown();
    }

    private void shutDown(){
        if(mTTS != null){
        mTTS.stop();
        mTTS.shutdown();
        }
        if(mXKAudioPolicyManager != null){
        mXKAudioPolicyManager.releaseAudioSource();
        }
    }

    /**
     * Custom Runnable executed by each test thread.
     */
    class MyRunnable implements Runnable {
        private int i; // index of this thread
        private int threadNum; // total number of threads created

        public MyRunnable(int i, int threadNum) {
            this.i = i;
            this.threadNum = threadNum;
        }

        @Override
        public void run() {
            runOnUiThread(new Runnable() {
                @Override
                public void run() {
                    Log.d(TAG, "Running on the main thread, index: " + i + ", total threads: " + threadNum);
                    if (i % 2 == 0) {
                        Log.d(TAG, "TtsPlay1 index:" + i);
                        TtsPlay1();
                    } else {
                        Log.d(TAG, "TtsPlay2 index:" + i);
                        TtsPlay2();
                    }
                    try {
                        // Note: this sleep runs on the main thread and blocks the UI
                        // for 10 seconds per task; tolerable only in a throwaway stress test
                        Thread.sleep(10000);
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    }
                }
            });
        }
    }

    public class ttsPlayOne extends UtteranceProgressListener{

        @Override
        public void onStart(String utteranceId) {
        Log.d(TAG, "ttsPlayOne-----------onStart");
        }

        @Override
        public void onDone(String utteranceId) {
        Log.d(TAG, "ttsPlayOne-----------onDone");
        }

        @Override
        public void onError(String utteranceId) {
        Log.d(TAG, "ttsPlayOne-----------onError");
        }
    }

    public class ttsPlaySecond extends  UtteranceProgressListener{

        @Override
        public void onStart(String utteranceId) {
        Log.d(TAG, "ttsPlaySecond-----------onStart");
        }

        @Override
        public void onDone(String utteranceId) {
        Log.d(TAG, "ttsPlaySecond-----------onDone");
        }

        @Override
        public void onError(String utteranceId) {
        Log.d(TAG, "ttsPlaySecond-----------onError");
        }
    }
}

Add the permissions

<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE"></uses-permission>
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE"></uses-permission>

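TTS best practices

The product I work on at my company is a voice assistant, so I run into TTS playback problems and pitfalls almost daily. The common ones are:

  • The TTS engine that ships with the system does not support Chinese. To speak Chinese you need a third-party engine such as iFlytek (科大訊飛) or Baidu.

  • After switching to a Chinese-capable engine, text with embedded English can be handled poorly: some third-party engines spell out each English word letter by letter, and some cannot pronounce English at all. Test the engine with mixed-language text.

  • When setting TTS parameters, watch the upper limits for speech rate, pitch and tone. Some engines take values in the range 0-100 and others in the range 0-10, so set each parameter according to the value range of the specific engine.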
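
The third pitfall, differing parameter ranges, can be contained by keeping one normalized value in the application and converting it for each engine. The following is a minimal sketch under that assumption. TtsParamScaler is a hypothetical helper (it is not part of Android or of any vendor SDK), and the 0-100 and 0-10 ranges are only the examples mentioned above; the real limits must come from each engine's documentation.

```java
/**
 * Hypothetical helper: converts a normalized setting into an
 * engine-specific integer range.
 */
final class TtsParamScaler {
    private TtsParamScaler() {
    }

    /**
     * Maps a normalized value in [0.0, 1.0] onto [engineMin, engineMax].
     * Values outside [0.0, 1.0] are clamped before mapping.
     */
    static int toEngineRange(double normalized, int engineMin, int engineMax) {
        if (engineMin > engineMax) {
            throw new IllegalArgumentException("engineMin must not exceed engineMax");
        }
        double clamped = Math.max(0.0, Math.min(1.0, normalized));
        return engineMin + (int) Math.round(clamped * (engineMax - engineMin));
    }
}
```

With this in place, the same application-level setting of 0.5 becomes 50 for an engine that expects 0-100 and 5 for one that expects 0-10, and out-of-range input can never exceed the engine's upper limit.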

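Usage trends

With the arrival of the Internet of Things and the growing number of IoT devices, voice-assistant-style applications will grow too, because voice is a good entry point. Devices are gradually moving from having a display to doing without one: many smart devices need no screen at all, only the ability to recognize speech and play sound. As such applications multiply, TTS APIs will be called more and more often, and I expect Google to keep improving this area.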

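Internal design dimension

The external view is mostly about knowing the API and the problems met in real projects, and then distilling better practices from them. With that covered, we need to understand how things are designed internally. For a developer, knowing how something is actually implemented is a basic skill.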

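Goal

The goal of Android TTS is to turn text into spoken audio. How does it do that? We start the analysis from the constructor of the TextToSpeech class.

The analysis is based mainly on the Android 6.0 source. The relevant classes and interface files are located at: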

frameworks/base/core/java/android/speech/tts/TextToSpeech.java
frameworks/base/core/java/android/speech/tts/TextToSpeechService.java
external/svox/pico/src/com/svox/pico/PicoService.java
external/svox/pico/compat/src/com/android/tts/compat/CompatTtsService.java
external/svox/pico/compat/src/com/android/tts/compat/SynthProxy.java
external/svox/pico/compat/jni/com_android_tts_compat_SynthProxy.cpp
external/svox/pico/tts/com_svox_picottsengine.cpp

Implementation

Initialization first. A TextToSpeech object has to be initialized before use. The class has three constructors, and the one that is ultimately called is this:

 /**
     * Used by the framework to instantiate TextToSpeech objects with a supplied
     * package name, instead of using {@link android.content.Context#getPackageName()}
     *
     * @hide
     */
public TextToSpeech(Context context, OnInitListener listener, String engine,
String packageName, boolean useFallback) {
        mContext = context;
        mInitListener = listener;
        mRequestedEngine = engine;
        mUseFallback = useFallback;

        mEarcons = new HashMap<String, Uri>();
        mUtterances = new HashMap<CharSequence, Uri>();
        mUtteranceProgressListener = null;

        mEnginesHelper = new TtsEngines(mContext);
        initTts();
    }

The constructor calls initTts(). Here is what that method does:

private int initTts() {
    // Step 1: Try connecting to the engine that was requested.
    if (mRequestedEngine != null) {
        if (mEnginesHelper.isEngineInstalled(mRequestedEngine)) {
            if (connectToEngine(mRequestedEngine)) {
                mCurrentEngine = mRequestedEngine;
                return SUCCESS;
            } else if (!mUseFallback) {
                mCurrentEngine = null;
                dispatchOnInit(ERROR);
                return ERROR;
            }
        } else if (!mUseFallback) {
            Log.i(TAG, "Requested engine not installed: " + mRequestedEngine);
            mCurrentEngine = null;
            dispatchOnInit(ERROR);
            return ERROR;
        }
    }

    // Step 2: Try connecting to the user's default engine.
    final String defaultEngine = getDefaultEngine();
    if (defaultEngine != null && !defaultEngine.equals(mRequestedEngine)) {
        if (connectToEngine(defaultEngine)) {
            mCurrentEngine = defaultEngine;
            return SUCCESS;
        }
    }

    // Step 3: Try connecting to the highest ranked engine in the
    // system.
    final String highestRanked = mEnginesHelper.getHighestRankedEngineName();
    if (highestRanked != null && !highestRanked.equals(mRequestedEngine) &&
            !highestRanked.equals(defaultEngine)) {
        if (connectToEngine(highestRanked)) {
            mCurrentEngine = highestRanked;
            return SUCCESS;
        }
    }

    // NOTE: The API currently does not allow the caller to query whether
    // they are actually connected to any engine. This might fail for various
    // reasons like if the user disables all her TTS engines.

    mCurrentEngine = null;
    dispatchOnInit(ERROR);
    return ERROR;
}
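
The three-step fallback in initTts can be condensed into a small, framework-free sketch that makes the order of attempts easy to see and to test. EngineSelector and its select method are hypothetical names introduced only for illustration. The canConnect predicate stands in for both isEngineInstalled and connectToEngine, and the engine package names in the usage note other than com.svox.pico are made up.

```java
import java.util.function.Predicate;

/** Hypothetical, simplified model of the engine fallback order in initTts(). */
final class EngineSelector {
    private EngineSelector() {
    }

    /**
     * Returns the package name of the first engine that connects, or null
     * when none does (the case where initTts() reports ERROR).
     */
    static String select(String requested, boolean useFallback, String defaultEngine,
            String highestRanked, Predicate<String> canConnect) {
        // Step 1: the engine the caller asked for
        if (requested != null) {
            if (canConnect.test(requested)) {
                return requested;
            }
            if (!useFallback) {
                return null;
            }
        }
        // Step 2: the user's default engine, unless it was already tried
        if (defaultEngine != null && !defaultEngine.equals(requested)
                && canConnect.test(defaultEngine)) {
            return defaultEngine;
        }
        // Step 3: the highest ranked engine, unless it was already tried
        if (highestRanked != null && !highestRanked.equals(requested)
                && !highestRanked.equals(defaultEngine)
                && canConnect.test(highestRanked)) {
            return highestRanked;
        }
        return null;
    }
}
```

For example, if the caller requests com.example.engine with fallback enabled and only com.svox.pico can be bound, select returns com.svox.pico. With fallback disabled the same call returns null.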

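This is the interesting part. Step one tries to connect to the TTS engine the caller requested, which is what lets a custom engine replace the system default. If no engine was requested, or the requested one cannot be connected and fallback is allowed, step two tries the user's default engine, and step three tries the highest ranked engine on the system. So the code gives the requested engine the highest priority and the default engine the next, with the highest ranked engine as the last resort. The connectToEngine method is: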

private boolean connectToEngine(String engine) {
    Connection connection = new Connection();
    Intent intent = new Intent(Engine.INTENT_ACTION_TTS_SERVICE);
    intent.setPackage(engine);
    boolean bound = mContext.bindService(intent, connection, Context.BIND_AUTO_CREATE);
    if (!bound) {
        Log.e(TAG, "Failed to bind to " + engine);
        return false;
    } else {
        Log.i(TAG, "Sucessfully bound to " + engine);
        mConnectingServiceConnection = connection;
        return true;
    }
}

Engine.INTENT_ACTION_TTS_SERVICE has the value "android.intent.action.TTS_SERVICE", so the bind targets a service whose intent filter declares that action. The AndroidManifest.xml under external/svox/pico declares one:

<service android:name=".PicoService"
         android:label="@string/app_name">
         <intent-filter>
              <action android:name="android.intent.action.TTS_SERVICE" />
              <category android:name="android.intent.category.DEFAULT" />
         </intent-filter>
         <meta-data android:name="android.speech.tts" android:resource="@xml/tts_engine" />
</service>

The default service bundled with the system is PicoService, which extends CompatTtsService. Its code is:

public class PicoService extends CompatTtsService {

    private static final String TAG = "PicoService";
    
    @Override
    protected String getSoFilename() {
        return "libttspico.so";
    }

}

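Next, CompatTtsService. It is an abstract class whose parent is TextToSpeechService, and it holds a SynthProxy member that is responsible for calling the C++ layer of the TTS code, as the figure shows:

[Figure: class relationship diagram]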

Here is the onCreate() method of CompatTtsService, whose main job is to initialize SynthProxy:

@Override
public void onCreate() {
        if (DBG) Log.d(TAG, "onCreate()");

        String soFilename = getSoFilename();

        if (mNativeSynth != null) {
            mNativeSynth.stopSync();
            mNativeSynth.shutdown();
            mNativeSynth = null;
        }

        // Load the engineConfig from the plugin if it has any special configuration
        // to be loaded. By convention, if an engine wants the TTS framework to pass
        // in any configuration, it must put it into its content provider which has the URI:
        // content://<packageName>.providers.SettingsProvider
        // That content provider must provide a Cursor which returns the String that
        // is to be passed back to the native .so file for the plugin when getString(0) is
        // called on it.
        // Note that the TTS framework does not care what this String data is: it is something
        // that comes from the engine plugin and is consumed only by the engine plugin itself.
        String engineConfig = "";
        Cursor c = getContentResolver().query(Uri.parse("content://" + getPackageName()
    + ".providers.SettingsProvider"), null, null, null, null);
        if (c != null){
        c.moveToFirst();
        engineConfig = c.getString(0);
        c.close();
        }
        mNativeSynth = new SynthProxy(soFilename, engineConfig);

        // mNativeSynth is used by TextToSpeechService#onCreate so it must be set prior
        // to that call.
        // getContentResolver() is also moved prior to super.onCreate(), and it works
        // because the super method don't sets a field or value that affects getContentResolver();
        // (including the content resolver itself).
        super.onCreate();
    }

Next, the SynthProxy constructor. The class has a static initializer that loads the ttscompat shared library, so SynthProxy is only a proxy: the real work is done by native C++ methods.

/**
  * Constructor; pass the location of the native TTS .so to use.
  */
public SynthProxy(String nativeSoLib, String engineConfig) {
        boolean applyFilter = shouldApplyAudioFilter(nativeSoLib);
        Log.v(TAG, "About to load "+ nativeSoLib + ", applyFilter=" + applyFilter);
        mJniData = native_setup(nativeSoLib, engineConfig);
        if (mJniData == 0) {
        throw new RuntimeException("Failed to load " + nativeSoLib);
        }
        native_setLowShelf(applyFilter, PICO_FILTER_GAIN, PICO_FILTER_LOWSHELF_ATTENUATION,
    PICO_FILTER_TRANSITION_FREQ, PICO_FILTER_SHELF_SLOPE);
    }

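As we can see, the constructor calls native_setup to initialize the engine. That method is implemented in the C++ layer (com_android_tts_compat_SynthProxy.cpp).

[Figure: source of native_setup in com_android_tts_compat_SynthProxy.cpp]

The key line there is engine->funcs->init(engine, __ttsSynthDoneCB, engConfigString);. The init function is implemented in com_svox_picottsengine.cpp: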

/* Google Engine API function implementations */

/** init
 *  Allocates Pico memory block and initializes the Pico system.
 *  synthDoneCBPtr - Pointer to callback function which will receive generated samples
 *  config - the engine configuration parameters, here only contains the non-system path
 *      for the lingware location
 *  return tts_result
*/
tts_result TtsEngine::init( synthDoneCB_t synthDoneCBPtr, const char *config )
{
    if (synthDoneCBPtr == NULL) {
        ALOGE("Callback pointer is NULL");
        return TTS_FAILURE;
    }

    picoMemArea = malloc( PICO_MEM_SIZE );
    if (!picoMemArea) {
        ALOGE("Failed to allocate memory for Pico system");
        return TTS_FAILURE;
    }

    pico_Status ret = pico_initialize( picoMemArea, PICO_MEM_SIZE, &picoSystem );
    if (PICO_OK != ret) {
        ALOGE("Failed to initialize Pico system");
        free( picoMemArea );
        picoMemArea = NULL;
        return TTS_FAILURE;
    }

    picoSynthDoneCBPtr = synthDoneCBPtr;

    picoCurrentLangIndex = -1;

    // was the initialization given an alternative path for the lingware location?
    if ((config != NULL) && (strlen(config) > 0)) {
        pico_alt_lingware_path = (char*)malloc(strlen(config));
        strcpy((char*)pico_alt_lingware_path, config);
        ALOGV("Alternative lingware path %s", pico_alt_lingware_path);
    } else {
        pico_alt_lingware_path = (char*)malloc(strlen(PICO_LINGWARE_PATH) + 1);
        strcpy((char*)pico_alt_lingware_path, PICO_LINGWARE_PATH);
        ALOGV("Using predefined lingware path %s", pico_alt_lingware_path);
    }

    return TTS_SUCCESS;
}

That completes the initialization of the TTS engine.

Now the calling side. TTS is normally invoked through the speak() method of TextToSpeech. Let's follow its execution:

public int speak(final CharSequence text,
        final int queueMode,
        final Bundle params,
        final String utteranceId) {
    return runAction(new Action<Integer>() {
        @Override
        public Integer run(ITextToSpeechService service) throws RemoteException {
            Uri utteranceUri = mUtterances.get(text);
            if (utteranceUri != null) {
                return service.playAudio(getCallerIdentity(), utteranceUri, queueMode,
                        getParams(params), utteranceId);
            } else {
                return service.speak(getCallerIdentity(), text, queueMode, getParams(params),
                        utteranceId);
            }
        }
    }, ERROR, "speak");
}

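The key is runAction(). speak() uses a three-argument overload, which forwards to the five-argument version shown here: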

private <R> R runAction(Action<R> action, R errorResult, String method,
        boolean reconnect, boolean onlyEstablishedConnection) {
    synchronized (mStartLock) {
        if (mServiceConnection == null) {
            Log.w(TAG, method + " failed: not bound to TTS engine");
            return errorResult;
        }
        return mServiceConnection.runAction(action, errorResult, method, reconnect,
                onlyEstablishedConnection);
    }
}

That in turn calls runAction on mServiceConnection:

public <R> R runAction(Action<R> action, R errorResult, String method,
        boolean reconnect, boolean onlyEstablishedConnection) {
    synchronized (mStartLock) {
        try {
            if (mService == null) {
                Log.w(TAG, method + " failed: not connected to TTS engine");
                return errorResult;
            }
            if (onlyEstablishedConnection && !isEstablished()) {
                Log.w(TAG, method + " failed: TTS engine connection not fully set up");
                return errorResult;
            }
            return action.run(mService);
        } catch (RemoteException ex) {
            Log.e(TAG, method + " failed", ex);
            if (reconnect) {
                disconnect();
                initTts();
            }
            return errorResult;
        }
    }
}
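
The pattern in both runAction methods is the same: every call to the remote service is wrapped in an Action, and a caller-supplied error value is returned whenever the service is missing or the call fails. Below is a minimal, framework-free sketch of that pattern. ActionRunner and its connect method are hypothetical, a generic Exception stands in for RemoteException, and the reconnect logic is left out.

```java
/** Hypothetical, simplified model of the runAction() pattern. */
final class ActionRunner<S> {
    /** One unit of work against the service; may fail like a remote call. */
    interface Action<S, R> {
        R run(S service) throws Exception;
    }

    private final Object lock = new Object();
    private S service; // stays null until connect() is called

    void connect(S connectedService) {
        synchronized (lock) {
            service = connectedService;
        }
    }

    /** Runs the action, or returns errorResult if not connected or on failure. */
    <R> R runAction(Action<S, R> action, R errorResult) {
        synchronized (lock) {
            if (service == null) {
                return errorResult;
            }
            try {
                return action.run(service);
            } catch (Exception e) {
                return errorResult;
            }
        }
    }
}
```

The benefit of this shape is that callers such as speak() never need their own null checks or try/catch blocks. They only state what to do with the service and which value means failure.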

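In the end action.run(mService) is called back, which executes service.speak(), or service.playAudio() when the text has been mapped to an audio resource. The service here is PicoService, which extends the abstract class CompatTtsService, which in turn extends the abstract class TextToSpeechService.

So the call lands in TextToSpeechService, inside its mBinder member. The playAudio() variant is shown here; the speak() variant enqueues its item through mSynthHandler.enqueueSpeechItem() in the same way: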

@Override
public int playAudio(IBinder caller, Uri audioUri, int queueMode, Bundle params,
        String utteranceId) {
    if (!checkNonNull(caller, audioUri, params)) {
        return TextToSpeech.ERROR;
    }

    SpeechItem item = new AudioSpeechItemV1(caller,
            Binder.getCallingUid(), Binder.getCallingPid(), params, utteranceId, audioUri);
    return mSynthHandler.enqueueSpeechItem(queueMode, item);
}

Next comes mSynthHandler.enqueueSpeechItem(queueMode, item):

/**
 * Adds a speech item to the queue.
 *
 * Called on a service binder thread.
 */
public int enqueueSpeechItem(int queueMode, final SpeechItem speechItem) {
    UtteranceProgressDispatcher utterenceProgress = null;
    if (speechItem instanceof UtteranceProgressDispatcher) {
        utterenceProgress = (UtteranceProgressDispatcher) speechItem;
    }

    if (!speechItem.isValid()) {
        if (utterenceProgress != null) {
            utterenceProgress.dispatchOnError(
                    TextToSpeech.ERROR_INVALID_REQUEST);
        }
        return TextToSpeech.ERROR;
    }

    if (queueMode == TextToSpeech.QUEUE_FLUSH) {
        stopForApp(speechItem.getCallerIdentity());
    } else if (queueMode == TextToSpeech.QUEUE_DESTROY) {
        stopAll();
    }
    Runnable runnable = new Runnable() {
        @Override
        public void run() {
            if (isFlushed(speechItem)) {
                speechItem.stop();
            } else {
                setCurrentSpeechItem(speechItem);
                speechItem.play();
                setCurrentSpeechItem(null);
            }
        }
    };
    Message msg = Message.obtain(this, runnable);

    // The obj is used to remove all callbacks from the given app in
    // stopForApp(String).
    //
    // Note that this string is interned, so the == comparison works.
    msg.obj = speechItem.getCallerIdentity();

    if (sendMessage(msg)) {
        return TextToSpeech.SUCCESS;
    } else {
        Log.w(TAG, "SynthThread has quit");
        if (utterenceProgress != null) {
            utterenceProgress.dispatchOnError(TextToSpeech.ERROR_SERVICE);
        }
        return TextToSpeech.ERROR;
    }
}
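
The queue-mode branch in enqueueSpeechItem decides what happens to items that are already waiting. A rough, framework-free sketch of those semantics follows. SpeechQueueSketch is a hypothetical class that models only the pending queue: it ignores the item that is currently playing, the handler thread, and progress callbacks. The three constants mirror the values of the TextToSpeech queue modes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical, simplified model of the TTS queue modes. */
final class SpeechQueueSketch {
    static final int QUEUE_FLUSH = 0;   // drop pending items of the same caller
    static final int QUEUE_ADD = 1;     // append behind everything already queued
    static final int QUEUE_DESTROY = 2; // drop pending items of every caller

    private static final class Item {
        final String caller;
        final String text;

        Item(String caller, String text) {
            this.caller = caller;
            this.text = text;
        }
    }

    private final Deque<Item> pending = new ArrayDeque<>();

    void enqueue(int queueMode, String caller, String text) {
        if (queueMode == QUEUE_FLUSH) {
            // Corresponds to stopForApp(callerIdentity)
            pending.removeIf(item -> item.caller.equals(caller));
        } else if (queueMode == QUEUE_DESTROY) {
            // Corresponds to stopAll()
            pending.clear();
        }
        pending.addLast(new Item(caller, text));
    }

    /** Returns the texts still waiting, in playback order. */
    List<String> pendingTexts() {
        List<String> texts = new ArrayList<>();
        for (Item item : pending) {
            texts.add(item.text);
        }
        return texts;
    }
}
```

The point to notice is that QUEUE_FLUSH is scoped to one caller: an app that flushes its own queue does not discard utterances queued by other apps that are bound to the same engine.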

The important call is speechItem.play():

/**
 * Plays the speech item. Blocks until playback is finished.
 * Must not be called more than once.
 *
 * Only called on the synthesis thread.
 */
public void play() {
    synchronized (this) {
        if (mStarted) {
            throw new IllegalStateException("play() called twice");
        }
        mStarted = true;
    }
    playImpl();
}

protected abstract void playImpl();
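
play() is a small template method: the base class enforces the "only once" rule and subclasses supply the actual playback in playImpl(). A minimal, framework-free sketch of that structure is below. SpeechItemSketch and RecordingItem are hypothetical names used only for illustration.

```java
/** Hypothetical, simplified model of SpeechItem's play-once guard. */
abstract class SpeechItemSketch {
    private boolean started;

    /** Runs playImpl() exactly once; a second call is a programming error. */
    final void play() {
        synchronized (this) {
            if (started) {
                throw new IllegalStateException("play() called twice");
            }
            started = true;
        }
        playImpl();
    }

    /** Supplied by each concrete item type. */
    protected abstract void playImpl();
}

/** Test double that records that playback happened. */
final class RecordingItem extends SpeechItemSketch {
    final StringBuilder log = new StringBuilder();

    @Override
    protected void playImpl() {
        log.append("played");
    }
}
```

Only the flag update is inside the synchronized block, so the potentially long-running playImpl() does not hold the lock while it runs.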

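The real playback work is done in playImpl(). Which implementation runs depends on the item type: playAudio() in TextToSpeechService creates an AudioSpeechItemV1, while the speak() path for plain text creates a SynthesisSpeechItemV1.

For synthesized text, the playImpl() invoked from play() is therefore the one in SynthesisSpeechItemV1: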

@Override
protected void playImpl() {
    AbstractSynthesisCallback synthesisCallback;
    mEventLogger.onRequestProcessingStart();
    synchronized (this) {
        // stop() might have been called before we enter this
        // synchronized block.
        if (isStopped()) {
            return;
        }
        mSynthesisCallback = createSynthesisCallback();
        synthesisCallback = mSynthesisCallback;
    }

    TextToSpeechService.this.onSynthesizeText(mSynthesisRequest, synthesisCallback);

    // Fix for case where client called .start() & .error(), but did not called .done()
    if (synthesisCallback.hasStarted() && !synthesisCallback.hasFinished()) {
        synthesisCallback.done();
    }
}

playImpl calls onSynthesizeText, which is an abstract method. Note that it is passed a synthesisCallback. Where is the method implemented? In CompatTtsService, the subclass of TextToSpeechService. Here is its implementation:

@Override
protected void onSynthesizeText(SynthesisRequest request, SynthesisCallback callback) {
        if (mNativeSynth == null) {
        callback.error();
        return;
        }

        // Set language
        String lang = request.getLanguage();
        String country = request.getCountry();
        String variant = request.getVariant();
        if (mNativeSynth.setLanguage(lang, country, variant) != TextToSpeech.SUCCESS) {
        Log.e(TAG, "setLanguage(" + lang + "," + country + "," + variant + ") failed");
        callback.error();
        return;
        }

        // Set speech rate
        int speechRate = request.getSpeechRate();
        if (mNativeSynth.setSpeechRate(speechRate) != TextToSpeech.SUCCESS) {
        Log.e(TAG, "setSpeechRate(" + speechRate + ") failed");
        callback.error();
        return;
        }

        // Set speech
        int pitch = request.getPitch();
        if (mNativeSynth.setPitch(pitch) != TextToSpeech.SUCCESS) {
        Log.e(TAG, "setPitch(" + pitch + ") failed");
        callback.error();
        return;
        }

        // Synthesize
        if (mNativeSynth.speak(request, callback) != TextToSpeech.SUCCESS) {
        callback.error();
        return;
        }
}

Finally we are back in the Pico engine provided by the system. The speak function in com_android_tts_compat_SynthProxy.cpp looks like this:

static jint
com_android_tts_compat_SynthProxy_speak(JNIEnv *env, jobject thiz, jlong jniData,
        jstring textJavaString, jobject request)
{
    SynthProxyJniStorage* pSynthData = getSynthData(jniData);
    if (pSynthData == NULL) {
        return ANDROID_TTS_FAILURE;
    }

    initializeFilter();

    Mutex::Autolock l(engineMutex);

    android_tts_engine_t *engine = pSynthData->mEngine;
    if (!engine) {
        return ANDROID_TTS_FAILURE;
    }

    SynthRequestData *pRequestData = new SynthRequestData;
    pRequestData->jniStorage = pSynthData;
    pRequestData->env = env;
    pRequestData->request = env->NewGlobalRef(request);
    pRequestData->startCalled = false;

    const char *textNativeString = env->GetStringUTFChars(textJavaString, 0);
    memset(pSynthData->mBuffer, 0, pSynthData->mBufferSize);

    int result = engine->funcs->synthesizeText(engine, textNativeString,
pSynthData->mBuffer, pSynthData->mBufferSize, static_cast<void *>(pRequestData));
    env->ReleaseStringUTFChars(textJavaString, textNativeString);

    return (jint) result;
}

That is the end of the TTS call path.

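TTS strengths and weaknesses

From the implementation we can see that Android ships with a native TTS engine, and that we can also build a custom one: extend TextToSpeechService, the service side of the ITextToSpeechService binder interface, and implement its abstract methods. This matters because the system's default engine does not support Chinese, which is why the better TTS products on the market usually integrate a third-party vendor such as iFlytek or Nuance.

This also shows the strengths and weaknesses of TTS.

Strengths: the interface is well defined, with a complete set of API methods, and it is extensible. You can build a TTS engine for your own business needs and keep it compatible with the native interface.

Weaknesses: the native system TTS engine supports a limited set of languages, and it currently does not support multiple instances or multiple channels.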

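Evolution trend

As voice becomes the entry point for more IoT devices, TTS synthesis technology will keep maturing, and the native Android interfaces in particular should become more capable. I expect TTS to remain on a steady upward path.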

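Summary

To learn a topic, start with the documentation, move on to hands-on practice, then refine and summarize what you learn in practice and settle on the best approach. Knowing how to use something without knowing why it works is not enough, so you also need to read the implementation behind it. Ask what the strengths and weaknesses are, which scenarios it fits and which it does not, and how it is likely to evolve. After going through this whole process, you can fairly say you have the topic well in hand.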

