Embedding Alexa, Part 3: Communicating with the Alexa Server

January 06, 2018

Using the AlexaPi I set up last time, I tried out the Pikachu Alexa skill. Enabling it is simple: just activate the skill with the same (Japanese) Amazon account used for authentication last time. For some reason only English skills showed up on the web, so I enabled it from the Alexa Android app instead.

After running `python main.py`, I said 「ピカチュウよんで」 ("Call Pikachu"), but the Pikachu skill did not seem to trigger. So I ran a known-working Alexa client (alexa-avs-sample-app) on my PC and analyzed its logs. It turns out that it frequently sends events (playback started, playback finished, and so on), and that it also properly sets the playback state and other component states in the context of its Recognize events.

18:05:01.395 [AWT-EventQueue-0] INFO  com.amazon.alexa.avs.http.AVSClient - Request metadata: 
{
  "event" : {
    "header" : {
      "namespace" : "SpeechRecognizer",
      "name" : "Recognize",
      "messageId" : "09c78cfe-6a19-4169-838f-03ac7d9bb022",
      "dialogRequestId" : "256021d1-a3b0-4028-ba2e-077c6859922b"
    },
    "payload" : {
      "profile" : "NEAR_FIELD",
      "format" : "AUDIO_L16_RATE_16000_CHANNELS_1"
    }
  },
  "context" : [ {
    "header" : {
      "namespace" : "AudioPlayer",
      "name" : "PlaybackState"
    },
    "payload" : {
      "token" : "",
      "offsetInMilliseconds" : 0,
      "playerActivity" : "IDLE"
    }
  }, {
    "header" : {
      "namespace" : "SpeechSynthesizer",
      "name" : "SpeechState"
    },
    "payload" : {
      "token" : "amzn1.as-ct.v1.ThirdPartySdkSpeechlet#ACRI#ValidatedSpeakDirective_amzn1.ask.skill.c089d482-7796-4e9a-a952-43dadb5260f1_f9a79890-ea1c-4e78-89c1-d719f1837ae8",
      "offsetInMilliseconds" : 0,
      "playerActivity" : "FINISHED"
    }
  }, {
    "header" : {
      "namespace" : "Alerts",
      "name" : "AlertsState"
    },
    "payload" : {
      "allAlerts" : [ ],
      "activeAlerts" : [ ]
    }
  }, {
    "header" : {
      "namespace" : "Speaker",
      "name" : "VolumeState"
    },
    "payload" : {
      "volume" : 50,
      "muted" : false
    }
  } ]
}
18:05:01.396 [AWT-EventQueue-0] DEBUG com.amazon.alexa.avs.SpeechRequestAudioPlayerPauseController - Speech request started
18:05:01.396 [RequestThread] INFO  com.amazon.alexa.avs.http.CachingContentProvider - Create new CachingIterator
18:05:01.396 [AWT-EventQueue-0] DEBUG com.amazon.alexa.avs.AVSAudioPlayer - Interrupting all Alexa output
18:05:06.132 [DownchannelRequestThread] INFO  com.amazon.alexa.avs.http.MessageParser - Response metadata: 
{
  "directive" : {
    "header" : {
      "namespace" : "SpeechRecognizer",
      "name" : "StopCapture",
      "messageId" : "3c6756c8-8fee-4ed3-8e8a-636d9d0cc566"
    },
    "payload" : { }
  }
}
18:05:06.133 [IndependentDirectiveThread] INFO  com.amazon.alexa.avs.AVSController - Handling directive: SpeechRecognizer.StopCapture
18:05:06.134 [IndependentDirectiveThread] DEBUG com.amazon.alexa.avs.SpeechRequestAudioPlayerPauseController - Finished listening to user speech
18:05:07.118 [RequestThread] INFO  com.amazon.alexa.avs.http.AVSClient - Response code: 200
18:05:07.118 [RequestThread] INFO  com.amazon.alexa.avs.http.AVSClient - Response headers: access-control-allow-origin: *
x-amzn-requestid: 0627fffffea84a7c-00007427-0002f060-465497e4af746066-1fa3026a-151
content-type: multipart/related;boundary=------abcde123;start=metadata.1514396349929;type="application/json"


18:05:07.286 [RequestThread] INFO  com.amazon.alexa.avs.http.MessageParser - Response metadata: 
{
  "directive" : {
    "header" : {
      "namespace" : "SpeechSynthesizer",
      "name" : "Speak",
      "messageId" : "3d82ee1d-57d1-4607-bce1-87a3dc870bc0",
      "dialogRequestId" : "256021d1-a3b0-4028-ba2e-077c6859922b"
    },
    "payload" : {
      "url" : "cid:ValidatedSpeakDirective_amzn1.ask.skill.dada8578-05b4-4305-8137-fe5b0aa334be_25909d51-13cd-4f10-81ea-85a13beacf80_179193997",
      "format" : "AUDIO_MPEG",
      "token" : "amzn1.as-ct.v1.ThirdPartySdkSpeechlet#ACRI#ValidatedSpeakDirective_amzn1.ask.skill.dada8578-05b4-4305-8137-fe5b0aa334be_25909d51-13cd-4f10-81ea-85a13beacf80"
    }
  }
}
18:05:07.287 [RequestThread] INFO  com.amazon.alexa.avs.http.MessageParser - Response metadata: 
{
  "directive" : {
    "header" : {
      "namespace" : "SpeechRecognizer",
      "name" : "ExpectSpeech",
      "messageId" : "450586f1-c856-427c-9af6-1bc08831257f",
      "dialogRequestId" : "256021d1-a3b0-4028-ba2e-077c6859922b"
    },
    "payload" : {
      "timeoutInMilliseconds" : 8000
    }
  }
}
18:05:07.646 [RequestThread] DEBUG com.amazon.alexa.avs.AVSController - Speech processing finished. Dependent queue size: 2
18:05:07.647 [DependentDirectiveThread] INFO  com.amazon.alexa.avs.AVSController - Handling directive: SpeechSynthesizer.Speak
18:05:07.647 [RequestThread] DEBUG com.amazon.alexa.avs.SpeechRequestAudioPlayerPauseController - Finished processing speech request
18:05:07.647 [DependentDirectiveThread] DEBUG com.amazon.alexa.avs.SpeechRequestAudioPlayerPauseController - Dispatching directive
18:05:07.647 [DependentDirectiveThread] DEBUG com.amazon.alexa.avs.SpeechRequestAudioPlayerPauseController - Alexa speech started
18:05:07.647 [DependentDirectiveThread] INFO  com.amazon.alexa.avs.http.AVSClient - Request metadata: 
{
  "event" : {
    "header" : {
      "namespace" : "SpeechSynthesizer",
      "name" : "SpeechStarted",
      "messageId" : "3a6aca78-61c5-431e-94ba-bd2f9cb70f61"
    },
    "payload" : {
      "token" : "amzn1.as-ct.v1.ThirdPartySdkSpeechlet#ACRI#ValidatedSpeakDirective_amzn1.ask.skill.dada8578-05b4-4305-8137-fe5b0aa334be_25909d51-13cd-4f10-81ea-85a13beacf80"
    }
  }
}
18:05:07.650 [Thread-133] DEBUG com.amazon.alexa.avs.SpeechRequestAudioPlayerPauseController - Started resume audio thread
18:05:07.934 [RequestThread] INFO  com.amazon.alexa.avs.http.AVSClient - Response code: 204
18:05:07.935 [RequestThread] INFO  com.amazon.alexa.avs.http.AVSClient - Response headers: access-control-allow-origin: *
x-amzn-requestid: 0627fffffea84a7c-00007427-0002f060-465497e4af746066-1fa3026a-153


18:05:07.935 [RequestThread] INFO  com.amazon.alexa.avs.http.AVSClient - This response successfully had no content.
18:05:22.475 [Thread-134] INFO  com.amazon.alexa.avs.http.AVSClient - Request metadata: 
{
  "event" : {
    "header" : {
      "namespace" : "SpeechSynthesizer",
      "name" : "SpeechFinished",
      "messageId" : "cb06db6c-e157-4800-b2d4-dcec3f5f260e"
    },
    "payload" : {
      "token" : "amzn1.as-ct.v1.ThirdPartySdkSpeechlet#ACRI#ValidatedSpeakDirective_amzn1.ask.skill.dada8578-05b4-4305-8137-fe5b0aa334be_25909d51-13cd-4f10-81ea-85a13beacf80"
    }
  }
}
18:05:22.475 [Thread-134] DEBUG com.amazon.alexa.avs.SpeechRequestAudioPlayerPauseController - Alexa speech finished
18:05:22.475 [Thread-133] DEBUG com.amazon.alexa.avs.SpeechRequestAudioPlayerPauseController - Resuming all Alexa output
18:05:22.475 [Thread-133] DEBUG com.amazon.alexa.avs.AVSAudioPlayer - Resuming all Alexa output
18:05:22.476 [DependentDirectiveThread] INFO  com.amazon.alexa.avs.AVSController - Handling directive: SpeechRecognizer.ExpectSpeech
18:05:22.478 [DependentDirectiveThread] DEBUG com.amazon.alexa.avs.SpeechRequestAudioPlayerPauseController - Dispatching directive
18:05:22.502 [Thread-136] INFO  com.amazon.alexa.avs.http.AVSClient - Request metadata: 
{
  "event" : {
    "header" : {
      "namespace" : "SpeechRecognizer",
      "name" : "Recognize",
      "messageId" : "b6a24468-ab6f-4c66-93c1-ed36eb72af92",
      "dialogRequestId" : "c147bb7e-ca3f-4b99-8247-b94ffa0ff22c"
    },
    "payload" : {
      "profile" : "NEAR_FIELD",
      "format" : "AUDIO_L16_RATE_16000_CHANNELS_1"
    }
  },
  "context" : [ {
    "header" : {
      "namespace" : "AudioPlayer",
      "name" : "PlaybackState"
    },
    "payload" : {
      "token" : "",
      "offsetInMilliseconds" : 0,
      "playerActivity" : "IDLE"
    }
  }, {
    "header" : {
      "namespace" : "SpeechSynthesizer",
      "name" : "SpeechState"
    },
    "payload" : {
      "token" : "amzn1.as-ct.v1.ThirdPartySdkSpeechlet#ACRI#ValidatedSpeakDirective_amzn1.ask.skill.dada8578-05b4-4305-8137-fe5b0aa334be_25909d51-13cd-4f10-81ea-85a13beacf80",
      "offsetInMilliseconds" : 0,
      "playerActivity" : "FINISHED"
    }
  }, {
    "header" : {
      "namespace" : "Alerts",
      "name" : "AlertsState"
    },
    "payload" : {
      "allAlerts" : [ ],
      "activeAlerts" : [ ]
    }
  }, {
    "header" : {
      "namespace" : "Speaker",
      "name" : "VolumeState"
    },
    "payload" : {
      "volume" : 50,
      "muted" : false
    }
  } ]
}
18:05:22.504 [Thread-136] DEBUG com.amazon.alexa.avs.SpeechRequestAudioPlayerPauseController - Speech request started
18:05:22.504 [Thread-136] DEBUG com.amazon.alexa.avs.AVSAudioPlayer - Interrupting all Alexa output
18:05:22.882 [RequestThread] INFO  com.amazon.alexa.avs.http.AVSClient - Response code: 204
18:05:22.883 [RequestThread] INFO  com.amazon.alexa.avs.http.AVSClient - Response headers: access-control-allow-origin: *
x-amzn-requestid: 0627fffffea84a7c-00007427-0002f060-465497e4af746066-1fa3026a-155


18:05:22.883 [RequestThread] INFO  com.amazon.alexa.avs.http.AVSClient - This response successfully had no content.
18:05:22.884 [RequestThread] INFO  com.amazon.alexa.avs.http.CachingContentProvider - Create new CachingIterator
18:05:28.459 [DownchannelRequestThread] INFO  com.amazon.alexa.avs.http.MessageParser - Response metadata: 
{
  "directive" : {
    "header" : {
      "namespace" : "SpeechRecognizer",
      "name" : "StopCapture",
      "messageId" : "2099875a-b6b5-4ad7-9e8b-15163729befa"
    },
    "payload" : { }
  }
}
18:05:28.460 [IndependentDirectiveThread] INFO  com.amazon.alexa.avs.AVSController - Handling directive: SpeechRecognizer.StopCapture
18:05:28.461 [IndependentDirectiveThread] DEBUG com.amazon.alexa.avs.SpeechRequestAudioPlayerPauseController - Finished listening to user speech
18:05:28.676 [RequestThread] INFO  com.amazon.alexa.avs.http.AVSClient - Response code: 200
18:05:28.676 [RequestThread] INFO  com.amazon.alexa.avs.http.AVSClient - Response headers: access-control-allow-origin: *
x-amzn-requestid: 0627fffffea84a7c-00007427-0002f060-465497e4af746066-1fa3026a-157
content-type: multipart/related;boundary=------abcde123;start=metadata.1514396750192;type="application/json"


18:05:28.679 [RequestThread] INFO  com.amazon.alexa.avs.http.MessageParser - Response metadata: 
{
  "directive" : {
    "header" : {
      "namespace" : "SpeechSynthesizer",
      "name" : "Speak",
      "messageId" : "6519ffb1-8b76-434d-b712-03b67f88a919",
      "dialogRequestId" : "c147bb7e-ca3f-4b99-8247-b94ffa0ff22c"
    },
    "payload" : {
      "url" : "cid:ValidatedSpeakDirective_amzn1.ask.skill.dada8578-05b4-4305-8137-fe5b0aa334be_92818407-7feb-4a18-bb6e-b6f695613406_2120356629",
      "format" : "AUDIO_MPEG",
      "token" : "amzn1.as-ct.v1.ThirdPartySdkSpeechlet#ACRI#ValidatedSpeakDirective_amzn1.ask.skill.dada8578-05b4-4305-8137-fe5b0aa334be_92818407-7feb-4a18-bb6e-b6f695613406"
    }
  }
}
18:05:28.680 [RequestThread] INFO  com.amazon.alexa.avs.http.MessageParser - Response metadata: 
{
  "directive" : {
    "header" : {
      "namespace" : "SpeechRecognizer",
      "name" : "ExpectSpeech",
      "messageId" : "e5e92151-473f-43a8-b436-1a5740f35bb8",
      "dialogRequestId" : "c147bb7e-ca3f-4b99-8247-b94ffa0ff22c"
    },
    "payload" : {
      "timeoutInMilliseconds" : 8000
    }
  }
}
18:05:28.681 [RequestThread] DEBUG com.amazon.alexa.avs.AVSController - Speech processing finished. Dependent queue size: 1
18:05:28.681 [DependentDirectiveThread] INFO  com.amazon.alexa.avs.AVSController - Handling directive: SpeechSynthesizer.Speak
18:05:28.681 [RequestThread] DEBUG com.amazon.alexa.avs.SpeechRequestAudioPlayerPauseController - Finished processing speech request
18:05:28.681 [DependentDirectiveThread] DEBUG com.amazon.alexa.avs.SpeechRequestAudioPlayerPauseController - Dispatching directive
18:05:28.681 [DependentDirectiveThread] DEBUG com.amazon.alexa.avs.SpeechRequestAudioPlayerPauseController - Alexa speech started
18:05:28.682 [DependentDirectiveThread] INFO  com.amazon.alexa.avs.http.AVSClient - Request metadata: 
{
  "event" : {
    "header" : {
      "namespace" : "SpeechSynthesizer",
      "name" : "SpeechStarted",
      "messageId" : "744fc8fb-ece5-4be3-9b65-968e9f3084c7"
    },
    "payload" : {
      "token" : "amzn1.as-ct.v1.ThirdPartySdkSpeechlet#ACRI#ValidatedSpeakDirective_amzn1.ask.skill.dada8578-05b4-4305-8137-fe5b0aa334be_92818407-7feb-4a18-bb6e-b6f695613406"
    }
  }
}
18:05:28.685 [Thread-140] DEBUG com.amazon.alexa.avs.SpeechRequestAudioPlayerPauseController - Started resume audio thread
18:05:28.937 [RequestThread] INFO  com.amazon.alexa.avs.http.AVSClient - Response code: 204
18:05:28.937 [RequestThread] INFO  com.amazon.alexa.avs.http.AVSClient - Response headers: access-control-allow-origin: *
x-amzn-requestid: 0627fffffea84a7c-00007427-0002f060-465497e4af746066-1fa3026a-159


18:05:28.937 [RequestThread] INFO  com.amazon.alexa.avs.http.AVSClient - This response successfully had no content.
18:05:29.776 [Thread-141] INFO  com.amazon.alexa.avs.http.AVSClient - Request metadata: 
{
  "event" : {
    "header" : {
      "namespace" : "SpeechSynthesizer",
      "name" : "SpeechFinished",
      "messageId" : "e81d10ce-884d-40ce-bcae-4ceb0ef6cf04"
    },
    "payload" : {
      "token" : "amzn1.as-ct.v1.ThirdPartySdkSpeechlet#ACRI#ValidatedSpeakDirective_amzn1.ask.skill.dada8578-05b4-4305-8137-fe5b0aa334be_92818407-7feb-4a18-bb6e-b6f695613406"
    }
  }
}
18:05:29.776 [Thread-141] DEBUG com.amazon.alexa.avs.SpeechRequestAudioPlayerPauseController - Alexa speech finished

In the JSON above, a directive is a response from the Alexa server (essentially an instruction to the client), and an event is a request the client sends to the Alexa server (captured audio, settings, state, and so on). The AVS Documentation describes all of these events and directives. At a glance, the first request is a Recognize event on the SpeechRecognizer interface, with the state of each component (Alerts, AudioPlayer, SpeechSynthesizer, Speaker, etc.) set in its context. Playback is then requested via the Speak directive of the SpeechSynthesizer interface; the client plays each response in turn and reports completion with the SpeechSynthesizer SpeechFinished event. Incidentally, a single response can apparently contain multiple Speak directives, but the previous AlexaPi seems to have played only the first one. The ExpectSpeech directive of the SpeechRecognizer interface is presumably returned as part of the response while a dialog is still in progress.
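The directive handling described above can be sketched as a small dispatcher. This is a minimal illustration of the flow, not real AVS client code: `play_audio`, `start_capture`, and `send_event` are hypothetical callbacks standing in for actual playback, microphone capture, and HTTP plumbing.

```python
import uuid

def make_event(namespace, name, payload, dialog_request_id=None):
    # Wrap a payload in the AVS "event" envelope seen in the log above.
    header = {"namespace": namespace, "name": name,
              "messageId": str(uuid.uuid4())}
    if dialog_request_id is not None:
        header["dialogRequestId"] = dialog_request_id
    return {"event": {"header": header, "payload": payload}}

def dispatch(directive, play_audio, start_capture, send_event):
    # Route one server directive to a client action, reporting the
    # SpeechStarted/SpeechFinished lifecycle events back as in the log.
    header = directive["directive"]["header"]
    payload = directive["directive"]["payload"]
    key = (header["namespace"], header["name"])
    if key == ("SpeechSynthesizer", "Speak"):
        send_event(make_event("SpeechSynthesizer", "SpeechStarted",
                              {"token": payload["token"]}))
        play_audio(payload["url"])  # blocks until playback finishes
        send_event(make_event("SpeechSynthesizer", "SpeechFinished",
                              {"token": payload["token"]}))
    elif key == ("SpeechRecognizer", "ExpectSpeech"):
        # Dialog continues: reopen the microphone within the given timeout.
        start_capture(payload["timeoutInMilliseconds"])
```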

With this in mind, I modified the source from last time to set the context and to send the playback start and finish events. Also, last time the audio was sent to the Alexa server only after recording had finished, which was slow, so this time it is sent while still being recorded (streaming via chunked transfer encoding). The source is below.

#! /usr/bin/env python
# -*- coding: utf-8 -*-
import os
import random
import time
import RPi.GPIO as GPIO
import alsaaudio
import wave
from creds import *
import requests
from hyper.contrib import HTTP20Adapter
import json
import re
import threading
from memcache import Client
import Queue
import uuid
import ssl

# Settings
# button = 18 #GPIO Pin with button connected
# lights = [24, 25] # GPIO pins with LEDs connected
device = "plughw:1,0"  # Name of your microphone/soundcard in arecord -L

# Setup
recorded = False
servers = ["127.0.0.1:11211"]
mc = Client(servers, debug=1)
path = os.path.realpath(__file__).rstrip(os.path.basename(__file__))
BASEURL = 'https://avs-alexa-fe.amazon.com'
RETRY_COUNT = 3
states = {
    'SpeechSynthesizer':{'token':'','state':'IDLE','offset':0}
}

# audio input
inp = None
aqueue = Queue.Queue()
recording_thread = None
timeout_thread = None

def internet_on():
    print "Checking Internet Connection"
    try:
        r = requests.get('https://api.amazon.com/auth/o2/token')
        print "Connection OK"
        return True
    except:
        print "Connection Failed"
        return False


def gettoken():
    token = mc.get("access_token")
    refresh = refresh_token
    if token:
        return token
    elif refresh:
        payload = {"client_id": Client_ID, "client_secret": Client_Secret,
                   "refresh_token": refresh, "grant_type": "refresh_token", }
        url = "https://api.amazon.com/auth/o2/token"
        r = requests.post(url, data=payload)
        resp = json.loads(r.text)
        mc.set("access_token", resp['access_token'], 3570)
        return resp['access_token']
    else:
        return False


def alexa_speech_recognizer_generate_top(boundary):
    chunk = '--%s\r\n' % boundary
    chunk += (
        'Content-Disposition: form-data; name="metadata"\r\n'
        'Content-Type: application/json; charset=UTF-8\r\n\r\n'
    )
    # make request
    data = {
        "context": get_context(),
        "event": {
            "header": {
                "namespace": "SpeechRecognizer",
                "name": "Recognize",
                "messageId": str(uuid.uuid4()),
                "dialogRequestId": str(uuid.uuid4())
            },
            "payload": {
                "profile": "CLOSE_TALK",
                "format": "AUDIO_L16_RATE_16000_CHANNELS_1"
            }
        }
    }
    chunk += json.dumps(data) + '\r\n'
    chunk += '--%s\r\n' % boundary
    chunk += (
        'Content-Disposition: form-data; name="audio"\r\n'
        'Content-Type: application/octet-stream\r\n\r\n'
    )
    return chunk

def get_context():
    cxt = []
    audioplayer_state = {
        "header": {
            "name": "PlaybackState",
            "namespace": "AudioPlayer",
        },
        "payload": {
            "token": "",
            "offsetInMilliseconds": "0",
            "playerActivity": "IDLE"
        }
    }
    speechsynthesizer_state = {
        "header" : {
            "namespace" : "SpeechSynthesizer",
            "name" : "SpeechState"
        },
        "payload" : {
            "token" : states['SpeechSynthesizer']['token'],
            "offsetInMilliseconds" : states['SpeechSynthesizer']['offset'],
            "playerActivity" : states['SpeechSynthesizer']['state']
        }
    }
    cxt.append(audioplayer_state)
    cxt.append(speechsynthesizer_state)
    return cxt

class AudioData():
    _remain = ""
    _boundary = ""
    _recording = False
    def __init__(self, boundary):
        self._boundary = boundary
        self._remain = alexa_speech_recognizer_generate_top(boundary)
        start_audio_stream()
        self._recording = True

    def read(self, size):
        while len(self._remain) < size and self._recording:
            audio = get_audio_data()
            if audio == "":
                stop_audio_stream()
                self._recording = False
                chunk = '--%s--\r\n' % self._boundary
                self._remain += chunk
                break
            self._remain += audio
        if len(self._remain) < size:
            ret = self._remain
            self._remain = ""
        else:
            ret = self._remain[:size]
            self._remain = self._remain[size:]
        return ret


def async_recording():
    global recorded
    global aqueue
    global inp
    # time.sleep(5)
    while recorded == True:
        l, data = inp.read()
        if l:
            aqueue.put(data)


def async_timeout(s):
    global recorded
    time.sleep(s)
    recorded = False


def start_audio_stream():
    global audio
    global recorded
    global inp
    global aqueue
    global recording_thread
    global timeout_thread
    global device
    if inp == None:
        inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE,
                            alsaaudio.PCM_NORMAL, device)
        inp.setchannels(1)
        inp.setrate(16000)
        inp.setformat(alsaaudio.PCM_FORMAT_S16_LE)
        inp.setperiodsize(160)
        aqueue.queue.clear()
        recorded = True

    # start recording (asynchronous)
    recording_thread = threading.Thread(target=async_recording)
    recording_thread.start()
    timeout_thread = threading.Thread(target=async_timeout, args=(5,))
    timeout_thread.start()

def get_audio_data():
    global aqueue
    global recording_thread
    global timeout_thread

    try:
        data = aqueue.get(block=True, timeout=2)
        if data:
            return data
    except Queue.Empty:
        pass
    return ""

def stop_audio_stream():
    global recorded
    global recording_thread
    global timeout_thread
    global inp
    recording_thread = None
    timeout_thread = None
    inp = None
    recorded = False

def speak(token, audio):
    with open(path + "response.mp3", 'wb') as f:
        f.write(audio)
    send_event("SpeechSynthesizer", "SpeechStarted", {"token":token})
    # GPIO.output(25, GPIO.LOW)
    os.system('mpg123 -q {}1sec.mp3 {}response.mp3'.format(path, path))
    # GPIO.output(24, GPIO.LOW)
    send_event("SpeechSynthesizer", "SpeechFinished", {"token":token})
    states['SpeechSynthesizer'] = {'token':token, 'state':'FINISHED','offset':0}

def alexa():
    # make session
    ses = requests.Session()
    ses.mount(BASEURL, HTTP20Adapter())
    # GPIO.output(24, GPIO.HIGH)
    boundary = 'this-is-a-boundary'
    headers = {
        'Authorization': 'Bearer %s' % gettoken(),
        'Content-Type': 'multipart/form-data; boundary=%s' % boundary,
    }

    audio = AudioData(boundary)
    try:
        r = ses.post(BASEURL + '/v20160207/events', headers=headers, data=audio)
    except ssl.SSLError as e:
        print e
        print 'Failed to send voice'
        return

    if r.status_code == 200:
        for v in r.headers['content-type'].split(";"):
            if re.match('.*boundary.*', v):
                boundary = v.split("=")[1]
        data = r.content.split(boundary)
        data.pop(0)
        data.pop()
        directives = []
        audiofiles = {}
        for d in data:
            # if (len(d) >= 1024):
            #    audio = d.split('\r\n\r\n')[1].rstrip('--')
            segments = d.split('\r\n\r\n')
            sheader = segments[0]
            sheaders = sheader.split('\r\n')
            sheaders.pop(0)
            cid = ct = None
            for kv in sheaders:
                kk = kv.split(":", 1)[0]
                vv = kv.split(":", 1)[1]
                if re.match('content-type', kk.lower()):
                    ct = vv.strip()
                if re.match('content-id', kk.lower()):
                    m = re.search(r"^<(.*)>$", vv.strip())
                    cid = m.group(1)
            sbody = segments[1].rstrip('--')
            if re.match('.*application/json.*', ct):
                directives.append(json.loads(sbody))
                #print sbody
            elif re.match('.*application/octet-stream.*', ct):
                audiofiles[cid] = sbody
        for directive in directives:
            header = directive['directive']['header']
            if header['namespace'] == 'SpeechSynthesizer' and header['name'] == 'Speak':
                payload = directive['directive']['payload']
                m = re.search(r"^cid:(.*)$", payload['url'])
                cid = m.group(1)
                speak(payload['token'], audiofiles[cid])
    else:
        # GPIO.output(lights, GPIO.LOW)
        for x in range(0, 3):
            time.sleep(.2)
            # GPIO.output(25, GPIO.HIGH)
            time.sleep(.2)
            # GPIO.output(lights, GPIO.LOW)


def init_alexa():
    payload = {
        "settings": [
            {
                "key": "locale",
                "value": "ja-JP"
            }
        ]
    }
    send_event("Settings", "SettingsUpdated", payload)

def send_event(namespace, name, payload, ext_header = None):
    # make session
    ses = requests.Session()
    ses.mount(BASEURL, HTTP20Adapter())
    # make request
    headers = {'Authorization': 'Bearer %s' % gettoken()}
    dheader = {
        "namespace": namespace,
        "name": name,
        "messageId": str(uuid.uuid4())
    }
    if ext_header != None:
        dheader = {k: v for dic in [dheader, ext_header] for k, v in dic.items()}
    d = {
        "event": {
            "header": dheader,
            "payload": payload
        }
    }
    files = [
        ('file', ('metadata', json.dumps(d), 'application/json; charset=UTF-8'))
    ]
    for i in range(1, RETRY_COUNT + 1):
        try:
            r = ses.post(BASEURL + '/v20160207/events', headers=headers, files=files)
            if r.status_code not in [200, 204]:
                raise RuntimeError(r.text)
            return r
        except ssl.SSLError as e:
            print e
            print "Retrying to send event(%d)" % i
    raise RuntimeError('send event error')

def start():
    while True:
        # os.system('mpg123 -q {}record_now.mp3'.format(path))
        # os.system('arecord -d 3 -D {} {}recording.wav'.format(device,path))
        os.system('mpg123 -q {}request_now.mp3'.format(path))

        # call alexa
        alexa()
        # time.sleep(2)

if __name__ == "__main__":
    while internet_on() == False:
        print "."
    token = gettoken()
    init_alexa()
    os.system('mpg123 -q {}1sec.mp3 {}hello.mp3'.format(path, path))
    start()
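The streaming upload done by `AudioData` above boils down to one pattern: `requests` sends any body object that exposes a `read(size)` method using chunked Transfer-Encoding, so the multipart framing can be generated on the fly while audio is still being captured. A standalone sketch of that pattern (class and parameter names are illustrative, not part of the original code):

```python
import json

class MultipartStream(object):
    """File-like request body: a JSON metadata part, then audio produced
    chunk by chunk, then the closing boundary. Passing an instance as
    data= to requests.post() makes requests stream it chunked."""

    def __init__(self, boundary, metadata, audio_chunks):
        self._buf = (
            '--%s\r\n'
            'Content-Disposition: form-data; name="metadata"\r\n'
            'Content-Type: application/json; charset=UTF-8\r\n\r\n'
            '%s\r\n'
            '--%s\r\n'
            'Content-Disposition: form-data; name="audio"\r\n'
            'Content-Type: application/octet-stream\r\n\r\n'
            % (boundary, json.dumps(metadata), boundary)
        ).encode('utf-8')
        self._tail = ('\r\n--%s--\r\n' % boundary).encode('utf-8')
        self._chunks = iter(audio_chunks)  # e.g. frames from the microphone

    def read(self, size=-1):
        # Pull audio chunks until we can satisfy the read (or run out),
        # then append the closing boundary exactly once.
        while self._chunks is not None and (size < 0 or len(self._buf) < size):
            try:
                self._buf += next(self._chunks)
            except StopIteration:
                self._buf += self._tail
                self._chunks = None
        if size < 0:
            out, self._buf = self._buf, b''
        else:
            out, self._buf = self._buf[:size], self._buf[size:]
        return out  # b'' signals end of body
```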

With these changes, the Pikachu skill finally triggers. Saying 「ピカチュウよんで」 ("Call Pikachu") gets a spoken introduction (though from the second time on you only hear Pikachu's voice), and then saying things like 「こんにちは」 ("Hello") makes Pikachu talk back. Saying 「10万ボルト」 ("Thunderbolt") returned something like an attack cry. For now, the interaction with the skill seems to be working.

One important issue remains, though: AudioPlayer is not supported. Without it, you cannot listen to radio or play certain audio content, but since I don't need that much for now, I'll wrap up the AlexaPi improvements here. Next I plan to develop a custom skill. Before that, I may try out the camera that has been sitting unused since I bought it.