I will respond to your embedded answers from the POV of modern math.
1) You can get the gist of model theory by looking at problems that were solved by just this sort of model:
Model theory is instantiated in modern abstract algebra. Beyond syntax and embedding in a known domain, it provides a third way of looking at patterns, by giving them a range of different semantic interpretations that can be compared and contrasted.
2) Why does anyone need it?
Why does anyone need the instance-based models of algebra? By design, they always address things that can be described perfectly well by symbol manipulation: that is why the whole field is called algebra, after all.
But this approach arose because less intricate and 'fussy' ways of looking at things did not give us the right way to classify the solutions of polynomials. We could not see the reason why fifth-degree polynomials have no general closed-form solution in radicals, the way every lower degree does.
We needed Galois theory to show that the roots are bound together by groups of permutations, and the basic facts about which permutation groups are possible were settled by looking at the internal relationships between instances of 'groupness'. Those are both model theories -- one is the model theory of fields and the other is the model theory of groups.
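To make that concrete, here is a standard textbook illustration (my example, not part of the original exchange) of the kind of fact the 'model theory of groups' settles:

```latex
% A standard unsolvable quintic: f(x) = x^5 - 4x + 2 over the rationals.
% Eisenstein's criterion at p = 2 shows f is irreducible, and a sign check
% shows f has exactly three real roots, hence one complex-conjugate pair.
% Irreducibility forces a 5-cycle in the Galois group; complex conjugation
% contributes a transposition; a 5-cycle and a transposition generate S_5.
\[
  \mathrm{Gal}\bigl(\text{splitting field of } x^5 - 4x + 2 \ / \ \mathbb{Q}\bigr)
  \;\cong\; S_5 ,
\]
% and S_5 is not a solvable group, so the roots of this particular quintic
% cannot be written in radicals at all.
```

The field side of the argument pins down the roots; the group side classifies what their permutations can look like. Neither half is a string-manipulation argument.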
Humans simply have a real gift for comparative semantic projections, one that is stronger than their ability to process symbols directly. That gift provided the leverage to solve problems that were impervious to more direct approaches.
3) Formal proofs are not the target of model theory. They are not even the target for models of set theory.
Their purpose is to play the same role in logic that groups, rings, and fields play in modern algebra -- particularly (perhaps, unfortunately) in establishing negative results. The independence of Choice and the Continuum Hypothesis from the rest of set theory (and hence from the foundations of Real Analysis) was established by 'forcing' the existence of models with conflicting properties, a technique model theory developed through its interplay with abstract algebra.
But like groups, rings, and fields, they have a beautiful internal structure of their own. Like those things, we expect that structure to have applications to unrelated problems.
Consider Woodin's work on ordinals. It is an abstract study of what kinds of orderings there can be, and of whether and how all those kinds can be reduced to something we can describe in a finite manner.
Yes, ultimately it is going to be a set of proofs. But those proofs won't matter as much as the perspective they provide on how humans think about orderings. A large part of how those orderings relate to one another is encoded in which models of set theory provide structures into which multiple other models of the overall theory can be embedded.
It is just the right way to look at the problem.
4) There are syntactic models of semantics.
No, syntax is necessarily weaker than semantics, unless you throw out parts of semantics like Real Analysis.
For instance, the Real Numbers are uncountable, yet any first-order syntactic formulation that describes them will also have countable models, because there are only countably many finite strings of symbols, and that is all syntax is made of. That is basic model theory: the Löwenheim-Skolem theorem.
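Spelled out a little more formally (my phrasing of the standard statement, not the parent's):

```latex
% Downward Lowenheim-Skolem, countable-language case:
\[
  T \text{ a first-order theory in a countable language with an infinite model}
  \;\Longrightarrow\;
  T \text{ has a countable model.}
\]
% Consequence: any first-order theory satisfied by (\mathbb{R}, +, \cdot, <)
% is also satisfied by some countable structure, so uncountability is not
% something the syntax alone can pin down.
```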
So any structure you can actually nail down syntactically is not the shared semantic model on which Real Analysts actually do their work.
You can claim that the differences are necessarily irrelevant. But how could you possibly know? Delving into second-order logic gives us some really strange results about how the countable interpretations of set theory, the so-called Henkin models, behave. They do not fit our intuition -- they are not true to our natural understanding of what we actually mean to analyze.
To the point of the big bold part:
As a software engineer, you might also realize that some problems, especially NP-complete ones, are sometimes best addressed by simulations and heuristics.
We optimize graph layouts by creating fake energy fields. We improve curve-fitting estimates by creating colonies of competing parameter sets and 'breeding' them. We apply social models of preference, or simulate little auctions, to guide which combinations of alternatives to test for optimality. We do all kinds of metaphorical 'semantic' nonsense...
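As one concrete sketch of the 'colonies of competing parameter sets' flavor, here is roughly what a toy genetic curve-fitter looks like. Every name, constant, and the data in it are invented for illustration; it is a minimal sketch of the general technique, not anyone's production code:

```python
import random

# Toy "colony of competing parameter sets" for fitting y = a*x^2 + b*x + c
# to noisy synthetic data. Population size, mutation rate, and the data
# itself are arbitrary choices for the sake of the example.
random.seed(0)
TRUE_PARAMS = (2.0, -3.0, 0.5)
DATA = [(x, TRUE_PARAMS[0] * x**2 + TRUE_PARAMS[1] * x + TRUE_PARAMS[2]
            + random.gauss(0, 0.1))
        for x in [i / 10 for i in range(-20, 21)]]

def fitness(params):
    """Negative sum of squared errors: higher is better."""
    a, b, c = params
    return -sum((a * x**2 + b * x + c - y) ** 2 for x, y in DATA)

def breed(p1, p2, mutation_rate=0.1):
    """Average the parents' parameters, then occasionally jitter each one."""
    child = [(u + v) / 2 for u, v in zip(p1, p2)]
    return tuple(g + random.gauss(0, 0.5) if random.random() < mutation_rate
                 else g
                 for g in child)

def evolve(generations=200, pop_size=50, survivors=10):
    # Start from a random colony of parameter sets.
    population = [tuple(random.uniform(-5, 5) for _ in range(3))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the fittest few, then breed them to refill the colony.
        population.sort(key=fitness, reverse=True)
        parents = population[:survivors]
        population = parents + [breed(random.choice(parents),
                                      random.choice(parents))
                                for _ in range(pop_size - survivors)]
    return max(population, key=fitness)

if __name__ == "__main__":
    print("estimated (a, b, c):", tuple(round(g, 2) for g in evolve()))
```

Nothing about mating, mutation, or survival has anything to do with quadratics; the whole 'biology' is borrowed semantics bolted onto a least-squares problem, which is exactly the point of what follows.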
Every such simulation is, strictly speaking, a completely unnecessary elaboration: details theoretically irrelevant to the problem at hand, layered on top of the complexity of the problem itself.
Simulations inject rules irrelevant to the real problem just to force better convergence, and so on. And those rules produce multiple differently biased approaches that then need to be compared and evaluated... All superfluous.
And yet, much in the spirit of point 2 above, it is often far easier to get things done this way. And even when you need an exact solution, which estimation cannot give you, these approaches get you close faster and hand you the tools you need.
Well, simulating the problem is exactly that: generating 'irrelevant' semantic models of something instead of analyzing it in a purely syntactic manner.