Hi, I have worked on a similar text classification task at Microsoft using TensorFlow + Keras (Conv1D) as well as FastText. From your project description, your training data looks like (x = tags, y = expert labels), and you want to train a model that predicts the expert labels, right?
Although the model architecture and optimization method are still not clear to me, I think I can offer some helpful suggestions.
I hope we can solve it together.