I've been teaching guitar online for the last two weeks to about 50 students, and I've used several different systems, because it turned out that no single system works in practice for all students. I mainly use browser-based systems, though, and on average they work fine.
I've also encountered the first problem you describe, namely the deterioration of any sound other than speech. This has nothing to do with the system you use. It can happen on Skype or anywhere else. It has to do with the device on the other end of the communication link. Some devices try to detect speech and suppress everything else, which they classify as noise. Depending on the system, there may be an audio setting that lets you switch off that "optimize for speech" behavior, but whether and how that is possible depends on the system, the drivers, etc. In general, unless the student is handy with computers, you're out of luck in such a situation. Now, this only happens when the student uses the built-in mic (or a mic connected directly to the computer via the analog audio input). If you can convince the student to use a USB mic, or to connect a mic via an audio interface, the problem will be solved.
As for delay/latency, that's inherent to internet communication, and you can't get rid of it. As far as I know, there is no simple way to actually play/sing together in such a setting. Of course, the teacher can play an accompaniment and the student can play/sing along, but the teacher won't be able to hear whether the student's timing is correct. One way around that is to send a recording to the student and have the student play it back during the lesson while she/he sings/plays along. In that case it's possible to judge timing and phrasing, because both sounds are picked up together, in sync, by the student's mic.