Gender Model

The gender model classifies speech as "male", "female" or "unknown". The "unknown" class means that the speech cannot be reliably classified as either "male" or "female". Because the model is only meaningful for speech audio, it is combined with the speech model to make the results more reliable; audio in which no speech is detected is reported as "no_speech".

The receptive field of this model is 1851 milliseconds.

Specification

Receptive Field    Result Type
1851 ms            result ∈ ["female", "male", "unknown", "no_speech"]

Time-series

The time-series result will be an iterable with elements that contain the following information:

{
  "timestamp": 0,
  "gender": {
    "result": "female",
    "confidence": 0.92
  }
}
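For illustration, the element format above can be mapped onto a small typed structure. The following Python sketch is not part of any SDK: `GenderPrediction` and `parse_element` are hypothetical names, and the sketch assumes that every element, including "no_speech" ones, has the shape shown above and that `timestamp` is in milliseconds.

```python
import json
from dataclasses import dataclass

# The four result values listed in the specification.
GENDER_RESULTS = {"female", "male", "unknown", "no_speech"}


@dataclass(frozen=True)
class GenderPrediction:
    """One element of the gender time-series (hypothetical helper type)."""
    timestamp: int      # assumed to be in milliseconds
    result: str
    confidence: float


def parse_element(element: dict) -> GenderPrediction:
    """Convert one time-series element (as a dict) into a GenderPrediction."""
    gender = element["gender"]
    result = gender["result"]
    # Reject anything outside the documented result set.
    if result not in GENDER_RESULTS:
        raise ValueError(f"unexpected gender result: {result!r}")
    return GenderPrediction(
        timestamp=int(element["timestamp"]),
        result=result,
        confidence=float(gender["confidence"]),
    )


raw = '{"timestamp": 0, "gender": {"result": "female", "confidence": 0.92}}'
prediction = parse_element(json.loads(raw))
print(prediction)
# GenderPrediction(timestamp=0, result='female', confidence=0.92)
```

Validating `result` against the documented set makes an unexpected value fail loudly instead of silently propagating.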

Summary

If a summary is requested, the following is returned:

{
  "gender": {
    "male_fraction": 0.30,
    "female_fraction": 0.60,
    "unknown_fraction": 0.05,
    "no_speech_fraction": 0.05
  }
}

where each x_fraction is the fraction of the input's duration (between 0 and 1) during which class x was identified.
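To make the meaning of the fractions concrete, the sketch below derives a summary of the same shape from a list of time-series results. It is a hypothetical helper (`summarize` is not an API of the model), and it assumes the time-series elements are evenly spaced so that each element accounts for the same amount of audio; the actual summary may be computed differently.

```python
from collections import Counter

# Class order follows the summary example.
CLASSES = ("male", "female", "unknown", "no_speech")


def summarize(results: list[str]) -> dict:
    """Approximate the gender summary from time-series results.

    Hypothetical sketch: assumes evenly spaced elements, so the fraction of
    time for a class is its share of the elements.
    """
    counts = Counter(results)
    total = len(results)
    return {
        "gender": {
            f"{cls}_fraction": (counts[cls] / total if total else 0.0)
            for cls in CLASSES
        }
    }


results = ["male"] * 6 + ["female"] * 12 + ["unknown"] + ["no_speech"]
print(summarize(results))
# {'gender': {'male_fraction': 0.3, 'female_fraction': 0.6, 'unknown_fraction': 0.05, 'no_speech_fraction': 0.05}}
```

With 20 equally long elements, 6 "male" results give 6 / 20 = 0.3, reproducing the fractions of the summary example.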

Transitions

If transitions are requested, a time-series of transition elements like the ones below is returned:

[
  {
    "timestamp_start": 0,
    "timestamp_end": 1500,
    "result": "male",
    "confidence": 0.96
  },
  {
    "timestamp_start": 1500,
    "timestamp_end": 4000,
    "result": "female",
    "confidence": 0.89
  }
]

In this example, the first 1500 ms of the audio snippet contained speech from male speaker(s), and female speaker(s) were detected between 1500 ms and 4000 ms.
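The relationship between the time-series and the transitions can be sketched by merging consecutive elements that share the same result. This Python example is an illustration under stated assumptions, not the model's implementation: `to_transitions` and its `end_ms` parameter (the input duration in milliseconds) are hypothetical, and averaging the merged confidences is an arbitrary choice made only for this example. The input series is made-up data.

```python
def to_transitions(elements: list[dict], end_ms: int) -> list[dict]:
    """Merge consecutive time-series elements with the same result.

    Hypothetical sketch: each segment ends where the next one starts, the
    last segment ends at end_ms, and a segment's confidence is the mean of
    the confidences of the elements merged into it (an assumption).
    """
    transitions = []
    for element in elements:
        result = element["gender"]["result"]
        confidence = element["gender"]["confidence"]
        if transitions and transitions[-1]["result"] == result:
            # Same class as the running segment: extend it.
            transitions[-1]["_confidences"].append(confidence)
        else:
            if transitions:
                # Close the previous segment where the new class starts.
                transitions[-1]["timestamp_end"] = element["timestamp"]
            transitions.append({
                "timestamp_start": element["timestamp"],
                "timestamp_end": None,
                "result": result,
                "_confidences": [confidence],
            })
    if transitions:
        # The final segment runs until the end of the input.
        transitions[-1]["timestamp_end"] = end_ms
    for segment in transitions:
        confidences = segment.pop("_confidences")
        segment["confidence"] = sum(confidences) / len(confidences)
    return transitions


series = [
    {"timestamp": 0, "gender": {"result": "male", "confidence": 0.95}},
    {"timestamp": 500, "gender": {"result": "male", "confidence": 0.97}},
    {"timestamp": 1000, "gender": {"result": "male", "confidence": 0.96}},
    {"timestamp": 1500, "gender": {"result": "female", "confidence": 0.89}},
    {"timestamp": 2500, "gender": {"result": "female", "confidence": 0.89}},
]
for segment in to_transitions(series, end_ms=4000):
    print(segment)
```

For this input the sketch yields a "male" segment from 0 to 1500 ms and a "female" segment from 1500 to 4000 ms, the same boundaries as the transitions example.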