That is really nice.
To me it looks like they are syncing the sound using data attributes on the
<span> tags inside each <p> of the HTML, for example:
<span data-begin="143.385987" data-end="146.990219" data-index="39" class="is-read">And only 30% goes to raw materials and production.</span>
The spans also have a click action that jumps the sound back or forward to the matching section, again using the data attributes.
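Here is a rough sketch of how that kind of syncing could be wired up. This is a guess at the approach, not the site's actual code. It assumes one audio element plus spans carrying data-begin and data-end in seconds, as in the snippet above. The function names (parseCue, activeCueIndex, wireTranscript) are made up for illustration, and toggling "is-read" on only the current span is an assumption; the real page may leave it on everything already read.

```typescript
// Minimal structural types so the sketch does not depend on a DOM library;
// real HTMLAudioElement and HTMLElement objects should satisfy them.
interface AudioLike {
  currentTime: number;
  addEventListener(type: string, listener: () => void): void;
}
interface SpanLike {
  dataset: Record<string, string | undefined>;
  classList: { toggle(name: string, force?: boolean): boolean };
  addEventListener(type: string, listener: () => void): void;
}

// Start and end of one sentence, in seconds.
interface Cue {
  begin: number;
  end: number;
}

// Read data-begin / data-end from a span; null if either is missing or not a number.
function parseCue(span: SpanLike): Cue | null {
  const begin = Number.parseFloat(span.dataset.begin ?? "");
  const end = Number.parseFloat(span.dataset.end ?? "");
  return Number.isNaN(begin) || Number.isNaN(end) ? null : { begin, end };
}

// Index of the cue whose [begin, end) range contains `time`, or -1 if none does.
function activeCueIndex(cues: (Cue | null)[], time: number): number {
  return cues.findIndex((c) => c !== null && time >= c.begin && time < c.end);
}

// Hypothetical wiring: highlight the current span while playing, seek on click.
function wireTranscript(audio: AudioLike, spans: SpanLike[]): void {
  const cues = spans.map((span) => parseCue(span));

  // On each time update, mark only the span being spoken right now.
  audio.addEventListener("timeupdate", () => {
    const active = activeCueIndex(cues, audio.currentTime);
    spans.forEach((span, i) => span.classList.toggle("is-read", i === active));
  });

  // Clicking a span moves playback to that span's data-begin.
  spans.forEach((span, i) => {
    const cue = cues[i];
    if (!cue) return; // skip spans without usable timings
    span.addEventListener("click", () => {
      audio.currentTime = cue.begin;
    });
  });
}

// In a browser this might be called as (selectors are assumptions):
// wireTranscript(
//   document.querySelector("audio")!,
//   Array.from(document.querySelectorAll<HTMLElement>("span[data-begin]")),
// );
```

The timings themselves still have to come from somewhere (typed by hand or exported from a captioning tool), which is probably the bigger part of the work.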
I would say this is very possible with coding.
Without coding you could possibly get a facsimile,
but both ways will take some work and more than a few moving parts…