Real Time Processing

The Deeptone SDK can process a real-time audio stream.

Sample Data

For the examples below, you can download this sample audio file of a woman speaking, taken from the LibriSpeech ASR corpus. For a code sample, see Example Usage.

Supported Formats

The data fed in via the input_generator should be 16-bit PCM at a sample rate of 16 kHz. If a different sample rate is provided, the audio will be re-sampled, but the results may be less accurate.
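To catch a format mismatch before streaming, a WAV file can be inspected up front. The following is a minimal sketch using only Python's standard wave module; the helper name check_wav_format is hypothetical and not part of the Deeptone SDK.

```python
import wave

EXPECTED_RATE_HZ = 16000   # 16 kHz, per the Supported Formats note
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit PCM


def check_wav_format(filepath):
    """Return a list of format problems found in a WAV file (empty if none).

    Hypothetical helper for illustration; not part of the Deeptone SDK.
    """
    problems = []
    with wave.open(filepath, "rb") as wav:
        width = wav.getsampwidth()
        rate = wav.getframerate()
        if width != EXPECTED_SAMPLE_WIDTH:
            problems.append(f"sample width is {8 * width}-bit, expected 16-bit")
        if rate != EXPECTED_RATE_HZ:
            problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE_HZ} Hz")
    return problems
```

An empty list means the file already matches the recommended format; otherwise each entry names one mismatch.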

Configuration options and outputs

The available configuration options and output types depend on the SDK language.

For a code sample, see Example Usage. For the detailed output specification, see Output specification.

Available configuration options

The process_stream function accepts the following arguments:

  • input_generator - a generator that yields byte arrays of audio data in the format described in Supported Formats
  • models - the list of model names to use for the audio analysis
  • output_period - how often, in milliseconds, the output of the models should be returned; must be a multiple of 64
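Because output_period has to be a multiple of 64, it can help to normalise a requested value before passing it on. This is a small sketch under that single assumption; nearest_valid_output_period is a hypothetical helper, and how the SDK itself reacts to an invalid value is not stated here.

```python
STEP_MS = 64  # output_period must be a multiple of 64 ms


def nearest_valid_output_period(requested_ms):
    """Round requested_ms to the nearest positive multiple of 64.

    Hypothetical helper for illustration; not part of the Deeptone SDK.
    """
    steps = max(1, round(requested_ms / STEP_MS))  # never go below one step
    return steps * STEP_MS
```

For example, a request for one output per second (1000 ms) maps to 1024, the value used in the examples on this page.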


Available Outputs

The function returns a generator that yields one output for every output_period milliseconds of the provided input. Each output contains timestamped results from the requested models.

{"timestamp": 0, "results": {"gender": {"result": "female", "confidence": 0.6418}}}
{"timestamp": 1024, "results": {"gender": {"result": "male", "confidence": 0.9012}}}
{"timestamp": 2048, "results": {"gender": {"result": "male", "confidence": 0.7698}}}
{"timestamp": 3072, "results": {"gender": {"result": "male", "confidence": 0.6606}}}
{"timestamp": 4096, "results": {"gender": {"result": "female", "confidence": 0.9780}}}
{"timestamp": 5120, "results": {"gender": {"result": "female", "confidence": 0.8991}}}
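Since every output carries a confidence value, a consumer may want to act only on confident predictions. The sketch below assumes each yielded item is a dict with "timestamp" and "results" keys, as the Example Usage code accesses them; the helper confident_results and the 0.8 threshold are illustrative choices, not part of the SDK.

```python
# The sample outputs listed above, written as Python dicts.
sample_output = [
    {"timestamp": 0, "results": {"gender": {"result": "female", "confidence": 0.6418}}},
    {"timestamp": 1024, "results": {"gender": {"result": "male", "confidence": 0.9012}}},
    {"timestamp": 2048, "results": {"gender": {"result": "male", "confidence": 0.7698}}},
    {"timestamp": 3072, "results": {"gender": {"result": "male", "confidence": 0.6606}}},
    {"timestamp": 4096, "results": {"gender": {"result": "female", "confidence": 0.9780}}},
    {"timestamp": 5120, "results": {"gender": {"result": "female", "confidence": 0.8991}}},
]


def confident_results(outputs, model="gender", threshold=0.8):
    """Yield (timestamp, result) pairs whose confidence is at least threshold.

    Hypothetical helper for illustration; not part of the Deeptone SDK.
    """
    for item in outputs:
        res = item["results"][model]
        if res["confidence"] >= threshold:
            yield item["timestamp"], res["result"]


for ts, label in confident_results(sample_output):
    print(f"{ts}ms: {label}")
```

Because the helper is itself a generator, it can wrap the generator returned by process_stream without buffering the whole stream.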

Example Usage

You can use the process_stream method to process a stream of audio. You will need to provide a valid generator that yields audio bytes. Below you will find two different examples, where we:

  • open an audio file and stream bytes from that file, or
  • stream bytes using a microphone as the input source

1. Streaming bytes from an audio file

from deeptone import Deeptone
from scipy.io import wavfile


def input_generator(filepath, chunk_size=1024):
    """Yield the audio in the file in chunks of chunk_size samples."""
    print(f"Opening file {filepath}")
    rate, data = wavfile.read(filepath)
    print(f"Detected sample rate: {rate}")
    index = 0
    while index < len(data):
        yield data[index: min(len(data), index + chunk_size)]
        index += chunk_size


# Initialise Deeptone
engine = Deeptone(model_path="path/to/model", license_key="...")

audio_generator = input_generator("PATH_TO_AUDIO_FILE")

output = engine.process_stream(
    input_generator=audio_generator,
    models=[engine.models.Gender],
    output_period=1024
)
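The example above yields slices of the array returned by scipy. If raw bytes are preferred, in line with the description of input_generator as yielding byte arrays, a generator along the following lines could be used. It is a sketch built on Python's standard wave module; the name wav_bytes_generator and the frames_per_chunk parameter are assumptions for illustration, and it has not been checked against the SDK.

```python
import wave


def wav_bytes_generator(filepath, frames_per_chunk=1024):
    """Yield raw PCM bytes from a WAV file, frames_per_chunk frames at a time.

    Hypothetical helper for illustration; not part of the Deeptone SDK.
    """
    with wave.open(filepath, "rb") as wav:
        while True:
            chunk = wav.readframes(frames_per_chunk)
            if not chunk:  # readframes returns b"" at end of file
                break
            yield chunk
```

For 16-bit mono audio each full chunk is 2 * frames_per_chunk bytes long; the final chunk may be shorter.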

2. Streaming bytes from a microphone

You can find even more detailed recipes on using a microphone in the Gender model recipes section.

from deeptone import Deeptone
import pyaudio

# Initialise an audio stream
pa = pyaudio.PyAudio()
stream = pa.open(
    format=pyaudio.paInt16,
    channels=1,
    rate=16000,
    input=True
)


def input_generator():
    data = stream.read(1024)
    while stream.is_active():
        yield data
        data = stream.read(1024)


# Initialise Deeptone
engine = Deeptone(model_path="path/to/model", license_key="...")

audio_generator = input_generator()

output = engine.process_stream(
    input_generator=audio_generator,
    models=[engine.models.Gender],
    output_period=1024
)

In both cases, the returned object is a generator that yields one result for every output_period milliseconds of input:

# Inspect the result
for ts_result in output:
    ts = ts_result["timestamp"]
    res = ts_result["results"]["gender"]
    print(f'Timestamp: {ts}ms\tresult: {res["result"]}'
          f' with confidence {res["confidence"]}')

The output of the script would be something like:

Timestamp: 0ms    result: female with confidence 0.6418
Timestamp: 1024ms    result: female with confidence 0.8682
Timestamp: 2048ms    result: female with confidence 0.6546
Timestamp: 3072ms    result: female with confidence 0.6606

Raw output:

{"timestamp": 0, "results": {"gender": {"result": "female", "confidence": 0.6418}}}
{"timestamp": 1024, "results": {"gender": {"result": "female", "confidence": 0.8682}}}
{"timestamp": 2048, "results": {"gender": {"result": "female", "confidence": 0.6546}}}
{"timestamp": 3072, "results": {"gender": {"result": "female", "confidence": 0.6606}}}
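When a single answer is wanted for a whole recording rather than one per output_period, the per-window results can be aggregated. The following sketch takes a simple majority vote and reports the mean confidence; summarise is a hypothetical helper, and majority voting is only one possible aggregation, not something the SDK prescribes.

```python
from collections import Counter

# The raw output listed above, written as Python dicts.
raw_output = [
    {"timestamp": 0, "results": {"gender": {"result": "female", "confidence": 0.6418}}},
    {"timestamp": 1024, "results": {"gender": {"result": "female", "confidence": 0.8682}}},
    {"timestamp": 2048, "results": {"gender": {"result": "female", "confidence": 0.6546}}},
    {"timestamp": 3072, "results": {"gender": {"result": "female", "confidence": 0.6606}}},
]


def summarise(outputs, model="gender"):
    """Return the majority label, its vote count, and the mean confidence.

    Hypothetical helper for illustration; not part of the Deeptone SDK.
    """
    outputs = list(outputs)  # materialise, so a generator can be passed in
    if not outputs:
        raise ValueError("no outputs to summarise")
    labels = [item["results"][model]["result"] for item in outputs]
    confidences = [item["results"][model]["confidence"] for item in outputs]
    majority_label, votes = Counter(labels).most_common(1)[0]
    return {
        "majority": majority_label,
        "votes": votes,
        "total": len(labels),
        "mean_confidence": sum(confidences) / len(confidences),
    }


summary = summarise(raw_output)
print(f'{summary["majority"]} ({summary["votes"]}/{summary["total"]} windows)')
```

Note that this consumes the whole stream before returning, so it suits file input better than an open-ended microphone stream.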

Further examples

You can find more detailed recipes for real-time processing of microphone input in the Gender model recipes section.