A developer is required to build a server-based command-line application that accepts a multi-channel audio file (AIFF or FLAC, up to 4 channels) and segments / diarises the entire file based on the audio signal on each channel.
Segmentation must take into account who is speaking (on one of the channels) and must cleanly extract segments, expressed as individual time markers in a JSON file.
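For illustration only, the time markers could be serialised along the lines of the sketch below. The field names (`source_file`, `channel`, `start_s`, `end_s`) and the function `segments_to_json` are assumptions made for this example, not a format fixed by this brief; the final schema is open to discussion.

```python
import json


def segments_to_json(segments, source_file):
    """Serialise (channel, start_s, end_s) tuples to a JSON string.

    All field names here are illustrative assumptions, not a schema
    defined by the brief.
    """
    # Order segments by start time, then by channel, so the output is stable.
    ordered = sorted(segments, key=lambda s: (s[1], s[0]))
    payload = {
        "source_file": source_file,
        "segments": [
            {
                "channel": channel,
                "start_s": round(start_s, 3),  # seconds, millisecond precision
                "end_s": round(end_s, 3),
            }
            for channel, start_s, end_s in ordered
        ],
    }
    return json.dumps(payload, indent=2)


if __name__ == "__main__":
    print(segments_to_json([(2, 1.5, 3.0), (1, 0.0, 1.4)], "call.flac"))
```

Each entry carries one channel and one start/end pair, so a downstream tool can cut the audio directly from the markers.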
The audio is relatively clean (limited background noise) but will contain some overtalking (two people speaking at the same time), rapid transitions (person 2 starts speaking almost immediately after person 1 stops) and other 'conversational characteristic' areas where advanced separation techniques need to be deployed.
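As a rough sketch of what flagging overtalking could look like once per-channel speech segments exist, the example below sweeps over segment start and end times and reports spans where two or more channels are active. It assumes segments are given as `(channel, start_s, end_s)` tuples and that segments on the same channel do not overlap each other; the function name `find_overlaps` is hypothetical, and this is not a substitute for the separation techniques themselves.

```python
def find_overlaps(segments):
    """Return (start_s, end_s, channels) spans where 2+ channels are active.

    `segments` is a list of (channel, start_s, end_s) tuples. Assumes that
    segments on the same channel never overlap each other.
    """
    events = []
    for channel, start_s, end_s in segments:
        events.append((start_s, 1, channel))  # channel becomes active
        events.append((end_s, 0, channel))    # channel becomes inactive
    # Sort by time; at equal times handle ends (0) before starts (1) so that
    # back-to-back turns (rapid transitions) are not reported as overlap.
    events.sort(key=lambda e: (e[0], e[1]))

    active = set()
    overlaps = []
    span_start = None
    span_channels = set()
    for time_s, is_start, channel in events:
        if is_start:
            active.add(channel)
            if len(active) == 2 and span_start is None:
                # A second channel has just joined: an overlap span opens.
                span_start = time_s
                span_channels = set(active)
            elif span_start is not None:
                # A further channel joins an overlap already in progress.
                span_channels.add(channel)
        else:
            active.discard(channel)
            if span_start is not None and len(active) < 2:
                # Fewer than two channels remain: the overlap span closes.
                if time_s > span_start:
                    overlaps.append((span_start, time_s, sorted(span_channels)))
                span_start = None
                span_channels = set()
    return overlaps


if __name__ == "__main__":
    print(find_overlaps([(1, 0.0, 2.0), (2, 1.5, 3.0), (1, 3.0, 4.0)]))
```

Processing ends before starts at identical timestamps is a deliberate choice here: a speaker who begins exactly as another finishes is treated as a clean transition, not as overtalking.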
Development needs to be server-centric and programmatically triggered. We will send an audio file and expect back a JSON file of highly accurate segmentation data (not fooled by quick transitions, non-human harmonics or non-speech noises).
Please supply a synopsis of how you would develop a solution (technology / steps / conditions / constraints / output) together with your summarised CV for consideration. Previous relevant experience is appreciated. Please also identify any advanced techniques (e.g. PCA) that you plan to adopt.
Please read through the introductory spec carefully, ask questions and include a PROPOSED SOLUTION. The PROPOSED SOLUTION does not have to be optimal or viable, but it will indicate to us that you understand the challenge we are trying to solve.