Mid-attribute Speaker Generation using Optimal-Transport-based Interpolation of Gaussian Mixture Models


Paper
arXiv preprint

Authors
Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Detai Xin, Hiroshi Saruwatari
(The University of Tokyo, Japan)

* All speakers in the synthetic speech samples are artificially generated.

Speech sample #1: control along the gender axis

(female, non-native)
(mid, non-native)
(male, non-native)

(female, native)
(mid, native)
(male, native)

Speech sample #2: control along the nativeness (language fluency) axis

(female, non-native)
(male, non-native)
(female, mid)
(male, mid)
(female, native)
(male, native)

Speech sample #3: control along two axes

(mid, non-native)
(female, mid)
(mid, mid)
(male, mid)
(mid, native)