The relevant part of the etymology of tone is:
from Greek tonos "vocal pitch, raising of voice, accent, key in
music," originally "a stretching, tightening, taut string," related to
teinein "to stretch"
Note that many of those meanings seem to have to do with stretching and preparing an instrument or voice to produce a (specific) pitch. The tonos was originally a sort of conceptual tool for constructing the scale, for "stretching" the strings or "raising the voice" to produce tones. The specific pitches (which we now also call tones) were frequently the result of tuning through whole tones, originally the ratio of 9:8.
To see the origin, we have to go back to the ancient Greeks, specifically the Pythagoreans. Among the Pythagoreans, all consonant intervals could be formed through ratios from the first four integers: 1, 2, 3, 4, part of a doctrine known as the tetractys.
A 2:1 octave, a 3:2 perfect fifth, and a 4:3 perfect fourth were consonances, of course. So were a 3:1 perfect twelfth and a 4:1 double octave. (The only other combination of these numbers -- 4:2 -- was equivalent to a 2:1 octave. Notably, the Pythagoreans called the 4:3 fourth a consonance, but the 8:3 perfect eleventh would not be a consonance, as it required an integer greater than 4, even though the 3:1 twelfth and 4:1 double octave were consonances. The Pythagoreans were concerned mostly with the importance of rational number, rather than with practical consistency in how musicians may have used these things.)
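As a rough illustration of that restriction (a modern sketch in Python, not anything found in the Greek sources; the function name `tetractys_ratios` is made up for this example), we can enumerate every ratio that can be formed from the integers 1 through 4:

```python
from fractions import Fraction
from itertools import combinations

def tetractys_ratios(limit=4):
    """Return the distinct larger:smaller ratios formed from integers 1..limit."""
    ratios = set()
    for small, large in combinations(range(1, limit + 1), 2):
        # Fraction reduces automatically, so 4:2 collapses into 2:1.
        ratios.add(Fraction(large, small))
    return sorted(ratios)

consonances = tetractys_ratios()
print([str(r) for r in consonances])   # ['4/3', '3/2', '2', '3', '4']

# The 8:3 eleventh needs an integer above 4, so it is not in the list.
print(Fraction(8, 3) in consonances)   # False
```

The five results are the fourth, fifth, octave, twelfth, and double octave listed above.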
Anyhow, the whole tone emerges naturally from these intervals: most ancient Greek music theory treatises define it as the difference between the 3:2 fifth and the 4:3 fourth, which produces a 9:8 tone. The Greeks also were somewhat fascinated with the mathematical properties of so-called "superparticular" ratios, where the larger number is one bigger than the smaller number, which was another reason for privileging the 9:8 tone. This interval is clearly implied, perhaps for the first time, in a fragment by the early Pythagorean philosopher Philolaus (ca. late fifth century to early fourth century BCE), through a division of the octave that would create a 6:8:9:12 ratio with perfect fifths, fourths, and a central whole tone:
The magnitude of harmonia [in this case, an octave] is syllaba
[perfect fourth] and di’oxeian [perfect fifth]. The di’oxeian [fifth]
is greater than the syllaba [fourth] in epogdoic [9:8] ratio. From
hypate [E] to mese [A] is a syllaba, from mese [A] to neate [or nete,
E'] is a di’oxeian, from neate [E'] to trite [later paramese, B] is a
syllaba, and from trite [B] to hypate [E] is a di’oxeian. The interval
between trite [B] and mese [A] is epogdoic [9:8], the syllaba is
epitritic [4:3], the di’oxeian hemiolic [3:2], and the dia pason
[octave] is duple [2:1]. Thus harmonia consists of five epogdoics and
two dieses; di’oxeian is three epogdoics and a diesis, and syllaba is
two epogdoics and a diesis.
[Translation from Barker, Greek Musical Writings: Volume 2, Harmonic and Acoustic Theory]
Interpreting this terminology a bit using modern note letter names (as given in brackets), we get an octave tuned as E-A-B-E in the 12:9:8:6 ratio (the 6:8:9:12 division mentioned above, listed in reverse order).
In the final sentences here, we can see this early notion of an octave divided into five tones ("epogdoics" here, which refers specifically to a 9:8 ratio) and two "dieses" (a generic Greek term for a small interval). It's pretty clear even in this early form that the "diesis" was not exactly half of a whole tone, as the ratios wouldn't work out correctly. Greek treatises almost always make a significant point of how it's impossible to divide the whole tone into two equal parts. It's only much later that you start seeing reference to a specific sort of "semitone" interval, and that interval was almost always only thought of as approximately half a tone, though it could be given a number of specific ratios and sometimes just was a term for any interval smaller than a whole tone. Meanwhile, the whole tone could easily always be tuned as a 9:8 ratio, the difference between a perfect fifth and fourth.
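The arithmetic in the Philolaus fragment can be checked with exact fractions. This is only a sketch in modern terms: it assumes that "adding" intervals means multiplying their ratios, that "subtracting" means dividing, and that the diesis is whatever remains of a fourth after two 9:8 tones (which is how the fragment describes the syllaba).

```python
from fractions import Fraction

fourth = Fraction(4, 3)   # syllaba, epitritic
fifth = Fraction(3, 2)    # di'oxeian, hemiolic
octave = Fraction(2, 1)   # harmonia / dia pason, duple

# "Subtracting" one interval from another means dividing the ratios.
tone = fifth / fourth        # the epogdoic
diesis = fourth / tone**2    # remainder of a fourth after two tones

print(tone)     # 9/8
print(diesis)   # 256/243

# The sums stated in the fragment, checked as products of ratios.
assert fourth * fifth == octave
assert tone**3 * diesis == fifth
assert tone**5 * diesis**2 == octave

# Two dieses fall short of one tone, so a diesis is less than half a tone.
print(diesis**2 < tone)   # True
```

The last line shows the point made above: two of these dieses do not add up to a full 9:8 tone.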
The association of "tonos" with this interval dates at least to the Euclidean Sectio canonis (likely from around 300 BCE) as well as to Aristoxenus (also around this same time). I believe Aristoxenus may be the first to actually use the term "tonos" for this interval. In Elementa harmonica he makes statements like:
The tone [tonos] is that by which the fifth [diapente] is greater than the
fourth [diatessaron]; the fourth contains two and a half tones.
[Translation from Creese, The Monochord in Ancient Greek Harmonic Science. I added the Greek terms in brackets to show that Aristoxenus's nomenclature is much closer to later standard terminology than the archaic terms Philolaus used: he employs not only the term tonos but also what became the standard Greek/Latin terms for intervals, like diapente ("through five [notes]") and diatessaron ("through four"). The idea of a "half" tone mentioned here will be addressed below.]
The Sectio canonis is more explicit about associating the specific ratios:
It remains, then, to give an account of the interval of the tone
[toniaion diastema], that it is epogdoic [9:8]. For we learned that
if an epitritic [4:3] interval is taken away from a hemiolic [3:2]
interval, the remainder is epogdoic. And if the fourth is taken away
from the fifth, the remainder is the interval of the tone; the
interval of the tone is therefore epogdoic.
[Translation from Creese, The Monochord in Ancient Greek Harmonic Science.]
The Sectio canonis goes further with the concept of the tone and specifically discusses "diatonic" divisions of the scale, which involve combinations of 9:8 whole tones with other intervals. Basically, it begins with dividing an octave in the 6:8:9:12 ratio as discussed above, creating the so-called "immovable" notes of the scale. The other notes that filled in the 6:8 and 9:12 perfect fourths depended on the specific tuning system.
But when they were tuned using 9:8 whole tones, those notes were referred to as the "diatonos" versions of those notes, as they were derived using whole tones. ("Diatonic" means roughly "through [whole] tones.") Generally, Greeks tuned the scale from the high notes downward, so the E-B fourth could be filled in with a 9:8 E-D and a 9:8 D-C, leaving a 256:243 diesis (also known as a leimma/limma or "leftover part") between C and B. Similarly, tuning from A we could construct G and F with 9:8 whole tones, with a leftover 256:243 limma between F and E. (The 9:8 whole tone, for reference, is about 204 cents, while the 256:243 limma is about 90 cents.)
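Here is a minimal sketch of that downward tuning, again in modern terms. It assumes the notes are expressed as frequency ratios above the low E (the Greek sources worked with string lengths instead), and the helper names `cents` and `fill_fourth_downward` are made up for this example.

```python
from fractions import Fraction
from math import log2

TONE = Fraction(9, 8)

def cents(ratio):
    """Convert a frequency ratio to cents (1200 cents per 2:1 octave)."""
    return 1200 * log2(ratio)

def fill_fourth_downward(top, bottom):
    """Step down two 9:8 tones from `top`.

    Returns the two new pitches and the interval left over above `bottom`.
    """
    first = top / TONE
    second = first / TONE
    return first, second, second / bottom

# The "immovable" notes, as frequency ratios above the low E (an assumption).
E_low, A, B, E_high = Fraction(1), Fraction(4, 3), Fraction(3, 2), Fraction(2)

D, C, upper_limma = fill_fourth_downward(E_high, B)
G, F, lower_limma = fill_fourth_downward(A, E_low)

print(D, C, upper_limma)   # 16/9 128/81 256/243
print(G, F, lower_limma)   # 32/27 256/243 256/243
print(round(cents(TONE), 1), round(cents(upper_limma), 1))   # 203.9 90.2
```

Both fourths end up with the same 256:243 leftover, and the cent values match the approximate figures given above.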
Note that the term tonos in ancient Greek music theory could refer to other things. In particular, tonoi could be scale systems and methods of tuning that we might today call "modes" or different "species" of octave with semitones in different locations. (I'm going to ignore the various terms used to denote these things and how they may have differed -- I just wanted to point out that the word tonos also had other connotations at the time.)
The 9:8 interval had a number of different names: early sources often refer to it by mathematical terms for the ratio 9:8 itself (epogdoic was one of those), and by tonos and various other words with the root ton-. Also, while the earliest "diatonic" tunings were based around a 9:8 tone, later treatises introduced other possible "tones" with other ratios that were somewhat close in size to that interval. However, out of context, the "tone" came to be associated with a 9:8 ratio.
Greek theorists as early as Aristoxenus recognized it was also possible to view the octave as a division into 12 roughly equal parts, but such a concept was only mentioned in passing, and there were no practical tuning instructions or approximations given for such a notion. (Aristoxenus discusses the possibility of third-tones and quarter-tones as well; while some people attribute the concept of a 12-tone equal tempered scale to him, it's clear that he was just using all sorts of approximate small interval divisions to describe various intervals -- he'd just as easily describe a fourth as composed of ten quarter-tones as he'd think of it as five semitones.
However, Greek music never used more than a couple of consecutive small intervals in this manner. While it was theoretically possible to divide up octaves and other intervals into semitones, doing so really had no meaning or utility in Greek music, which is probably why Aristoxenus and other Greek theorists never really pursued this line of reasoning much further.)
Instead, the closest the Greeks had was the 9:8 whole tone as a standard measure derived from consonant intervals and with a rather simple numerical relationship that could be easily measured with a ratio of string lengths.
That's why the "tone" emerged as the standard intervallic measure in music for thousands of years. As for the rest of the question about what we should do with "steps" or whether semitones are a better measure, that's calling for more of an opinion. However, as long as we keep using the letter-name system (A-B-C-D-E-F-G) that relates fundamentally to a diatonic scale, it probably still makes sense to think of the tone as pretty fundamental to the standard Western scale.