I am just beginning to learn sed and awk. I have to submit a homework assignment tomorrow, which is a copy-paste from Wikipedia. It seems like just the opportunity to practice some sed scripting!
So I have the document in HTML format. Now I need to replace every [<number>] with nothing. How would I do this?
This is what I tried, but I think it does not even match the pattern I want:
cat content.xml | sed 's/\[\d+\]/ /g' > content2.xml
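For comparison, here is a minimal sketch of the intended substitution. It assumes the problem is that sed's default basic regular expressions do not understand `\d`, and treat a bare `+` as a literal plus sign. The sample text and the variable names `sample` and `cleaned` are made up for illustration; the real input would be content.xml.

```shell
# Hypothetical sample line standing in for the contents of content.xml.
sample='Exercise helps the immune system.[27] It also helps sleep.[3]'

# \[ matches a literal opening bracket.
# [0-9][0-9]* means "one or more digits" in basic regular expressions.
# A ] outside a bracket expression is an ordinary character, so it needs no escape.
# The replacement is empty, so each [<number>] is removed rather than turned into a space.
cleaned=$(printf '%s\n' "$sample" | sed 's/\[[0-9][0-9]*]//g')

printf '%s\n' "$cleaned"
```

On a file, the same expression would presumably be run as `sed 's/\[[0-9][0-9]*]//g' content.xml > content2.xml`, which also avoids the extra `cat`.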
As a next step, I want to replace patterns like the following, which are hyperlinks (although even the simple pattern above is not being matched yet):
<a href="https://en.wikipedia.org/wiki/Immune_system">immune system</a>
and then remove the citations:
<a name="cite_ref-Gleeson2007_27-0"/><a href="https://en.wikipedia.org/wiki/Physical_exercise#cite_note-Gleeson2007-27">[27]</a>
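A rough sketch of how both steps might be combined in one sed call is shown here. It rests on several assumptions: each link sits on a single line, `href` is the first attribute of the tag, and citation links can be recognised by `cite_ref` and `#cite_note` in their attributes. Regular expressions are fragile on HTML in general, so this is only meant for simple input like the two examples above. The filler words and the variable names `html` and `stripped` are hypothetical.

```shell
# Sample input built from the two example snippets, with a few filler words added.
html='The <a href="https://en.wikipedia.org/wiki/Immune_system">immune system</a> benefits.<a name="cite_ref-Gleeson2007_27-0"/><a href="https://en.wikipedia.org/wiki/Physical_exercise#cite_note-Gleeson2007-27">[27]</a>'

# Expression 1: drop the empty citation anchor (<a name="cite_ref..."/>).
# Expression 2: drop the whole citation link, including its [<number>] text.
# Expression 3: unwrap every remaining link, keeping only the link text (\1).
# The citation expressions run first so that [27] is not left behind as plain text.
stripped=$(printf '%s\n' "$html" | sed \
  -e 's/<a name="cite_ref[^>]*>//g' \
  -e 's/<a href="[^"]*#cite_note[^"]*">\[[0-9][0-9]*]<\/a>//g' \
  -e 's/<a href="[^"]*">\([^<]*\)<\/a>/\1/g')

printf '%s\n' "$stripped"
```

Using `[^"]*` and `[^<]*` instead of `.*` keeps each match inside a single tag, which matters when several links share one line.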