While developing an iOS application for text and audio synchronization, the main challenge was keeping the audio and its transcript in sync. The possible solutions the team found were speech recognition and the use of external caption files such as SRT or WebVTT. However, each of these approaches has its own problems.

In the case of speech recognition, the recorded voice was not clear enough, and without clear audio the speech could not be recognized precisely. The phone also could not process the audio fast enough, which led to performance issues.

While exploring captions and time tags, the team found that although the SRT and WebVTT formats were available, there were no tools or libraries that could consume these files directly in the iOS media player stack.
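For reference, an SRT file is plain text: each cue consists of a numeric index, a start and end timestamp separated by "-->", and one or more lines of caption text, with cues separated by blank lines. The cue contents below are purely illustrative:

```
1
00:00:01,000 --> 00:00:04,200
Welcome to the first chapter of the audiobook.

2
00:00:04,400 --> 00:00:08,750
In this chapter we cover the basics of the topic.
```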

The team was using the MediaPlayer (MPMediaPlayer) framework to play audio streamed from a server. Since the framework does not provide forward and reverse buttons, a custom media player was implemented. The MediaPlayer framework is also capable of playing captions along with audio and video, but the captions must be embedded in the media file. Because the audio stream had no embedded text, the team went the SRT route.

This meant parsing the SRT file and converting it into structured data that could be used to generate captions for given time intervals. Using this structured data, the team was finally able to synchronize the time tags with the media player's playback time and highlight the corresponding transcript text in a text view.
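The sketch below shows one way the SRT parsing step could look in Swift. The type and function names (SubtitleCue, parseSRT) are illustrative, not the team's actual code; the parser simply assumes the standard SRT layout shown earlier.

```swift
import Foundation

// Illustrative model for one parsed SRT cue.
struct SubtitleCue {
    let index: Int
    let start: TimeInterval   // seconds from the beginning of the audio
    let end: TimeInterval
    let text: String
}

// Converts an SRT timestamp such as "00:01:05,300" into seconds.
func seconds(from timestamp: String) -> TimeInterval? {
    let parts = timestamp
        .replacingOccurrences(of: ",", with: ".")
        .split(separator: ":")
    guard parts.count == 3,
          let h = Double(parts[0]),
          let m = Double(parts[1]),
          let s = Double(parts[2]) else { return nil }
    return h * 3600 + m * 60 + s
}

// Parses the raw contents of an .srt file into structured cues.
func parseSRT(_ contents: String) -> [SubtitleCue] {
    var cues: [SubtitleCue] = []
    // Normalize line endings, then split into blocks separated by blank lines.
    let normalized = contents.replacingOccurrences(of: "\r\n", with: "\n")
    for block in normalized.components(separatedBy: "\n\n") {
        let lines = block.split(separator: "\n").map(String.init)
        guard lines.count >= 3,
              let index = Int(lines[0].trimmingCharacters(in: .whitespaces)) else { continue }
        let times = lines[1].components(separatedBy: "-->")
        guard times.count == 2,
              let start = seconds(from: times[0].trimmingCharacters(in: .whitespaces)),
              let end = seconds(from: times[1].trimmingCharacters(in: .whitespaces)) else { continue }
        let text = lines[2...].joined(separator: "\n")
        cues.append(SubtitleCue(index: index, start: start, end: end, text: text))
    }
    return cues
}
```

Once the cues exist as structured data, the highlighting step can be driven by the player's playback position. This is a minimal sketch, assuming the streamed audio is played with MPMoviePlayerController from the MediaPlayer framework (whose currentPlaybackTime reports the position in seconds) and that the transcript shown in the UITextView is the concatenation of the cue texts; TranscriptHighlighter is a hypothetical helper, and polling with a timer is just one simple way to track playback.

```swift
import UIKit
import MediaPlayer

final class TranscriptHighlighter {
    private let cues: [SubtitleCue]
    private let textView: UITextView
    private var timer: Timer?

    init(cues: [SubtitleCue], textView: UITextView) {
        self.cues = cues
        self.textView = textView
        // Show the full transcript as the concatenated cue texts.
        textView.text = cues.map { $0.text }.joined(separator: "\n")
    }

    // Poll the player's playback position a few times per second and
    // highlight the cue whose time range contains it.
    func start(observing player: MPMoviePlayerController) {
        timer = Timer.scheduledTimer(withTimeInterval: 0.25, repeats: true) { [weak self] _ in
            guard let self = self else { return }
            let now = player.currentPlaybackTime
            if let cue = self.cues.first(where: { now >= $0.start && now < $0.end }) {
                self.highlight(cue)
            }
        }
    }

    func stop() { timer?.invalidate() }

    private func highlight(_ cue: SubtitleCue) {
        // Simplified for illustration: highlight the first occurrence of the cue text.
        guard let range = textView.text.range(of: cue.text) else { return }
        let nsRange = NSRange(range, in: textView.text)
        let attributed = NSMutableAttributedString(string: textView.text)
        attributed.addAttribute(.backgroundColor, value: UIColor.yellow, range: nsRange)
        textView.attributedText = attributed
        textView.scrollRangeToVisible(nsRange)
    }
}
```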