1

This is a tricky one that requires a level of Excel function mastery I don't currently have. I have been studying Excel's site for way too long and I'm still not progressing.

I am trying to take a complicated URL like:

http://dp18776.live.domiain.net/active/MO_Mi_WFM

and parse it so that I get this:

live.domain.net

The closest I have gotten is with this:

http://dp18776.live.domiain.net

My logic:

=LEFT(A17, SEARCH(".net",A17)+3)

The reason I can't stop there is because the local part can be variable across the thousands of records, whereas only the domain is (e.g. live.domain.net) constant.

I also know I need to use the MID function, I was testing what I could get away with on more basic functions, but the bigger wrench in the works is that I need to be able to key off ".net" to retrieve the domain, so I need to set the starting point in MID() as ".net" - len(domain)

Would anyone be able to help me complete this formula?

GPP
  • 113
  • 4

3 Answers3

4

Everybody here seems to focus on the .com or .net, but domains don't always end with .com or .net... however, they always end before the /.

So first find where the host name begins:

=FIND("://", A1) + LEN("://")
=FIND("://", A1) + 3

Then find the first slash afterwards – that's where the host name ends:

=FIND("/", A1, FIND("://", A1) + 3)

Or, just in case there's no path at all:

=IFERROR(FIND("/", A1, FIND("://", A1) + 3), LEN(A1) + 1)

And extract everything in between:

=MID(A1;FIND("://";A1)+3;IFERROR(FIND("/";A1;FIND("://";A1)+3);LEN(A1)+1)-(FIND("://";A1)+3))

A1: https://www.example.com/foo/bar
A2: =FIND("://";A1)+3
A3: =IFERROR(FIND("/";A1;A2);LEN(A1)+1)
A4: =MID(A1;A2;A3-A2)

If you want to throw away the leftmost domain component, you can use FIND(".",A1):

=MID(A1;FIND(".";A1)+1;IFERROR(FIND("/";A1;FIND(".";A1)+1);LEN(A1)+1)-(FIND(".";A1)+1))

A1: https://www.example.com/foo/bar
A2: =FIND(".";A1)+1
A3: =IFERROR(FIND("/";A1;A2);LEN(A1)+1)
A4: =MID(A1;A2;A3-A2)

For Salesforce, with no error checks:

=MID(A1,FIND("://",A1)+3,FIND("/",A1,FIND("://",A1)+3)-(FIND("://",A1)+3))

=MID(A1,FIND(".",A1)+1,FIND("/",A1,FIND(".",A1)+1)-(FIND(".",A1)+1))

Note: This still has a few assumptions – e.g. it won't work right if the URLs have a username@domain or username:password@domain. The FIND(".") version also assumes there are at least two domain components and won't work right with https://example/foo – it needs some extra IFERROR() checks for that.

u1686_grawity
  • 426,297
  • 64
  • 894
  • 966
  • This is the best answer. Good job, grawity. This is what I came here to add, but you already nailed it. – picobit Apr 10 '16 at 04:12
  • Ditto what @picobit said, this is impressive, thank you!! Only nuance I have now: when trying to import into Salesforce as a formula field, Salesforce does not have an equivalent to IFERROR functions, so I need to re-write for that. Any tips on how best to tackle? – GPP Apr 11 '16 at 19:06
  • 1
    If you're certain that all URLs will have a path (at least an empty path, i.e. `http://foo/` and not `http://foo`) then you can omit IFERROR() entirely. Otherwise, find out how Salesforce reports errors from FIND() or SEARCH()... – u1686_grawity Apr 11 '16 at 19:12
  • @grawity from what I have seen, the fields will have always have some kind of http://foo.domain.com/string/moreStrings/yetAnotherString/, so I am not super worried. I am getting hung up on how to de-Nest the IFERROR statement: It looks like IFERROR(FIND("/",A34,FIND(".",A34)+1),LEN(A34)+1) is the chunk, but there is a minus operator succeeding the statement: -(FIND(".";A1)+1)) – GPP Apr 11 '16 at 22:40
  • @grawity in other words, is the '-(FIND(".";A1)+1))' string part of that IFERROR function? – GPP Apr 11 '16 at 22:44
  • @GPP: No, of course it isn't. It's a regular minus sign. (See edit) – u1686_grawity Apr 12 '16 at 06:46
  • @grawity sure, but what is that statement being subtracted from? – GPP Apr 12 '16 at 18:48
  • @GPP: From the return value of IFERROR(...). – u1686_grawity Apr 12 '16 at 20:40
  • Thanks Grawity! I marked yours as the answer to the question, but there's an additional challenge: Some of the urls have numbers appended like this: http://13723.live.streamingurl.com:80/VBNCASA_SC, which means the string comes out like this: live.streamingurl.com:80 Any suggestions on how to fix? I can't use IFERROR, but another nested IF statement? – GPP Apr 17 '16 at 00:19
3

You can use regexp via VBA

and get your hostname with something like that:

https?:\/\/[a-zA-Z0-9-]+\.([a-zA-Z0-9.-]+)\/.*

group 1:

live.domain.net
Eugene Lebedev
  • 176
  • 1
  • 5
  • Intriguing, thanks for jumping in! I am trying to stick with the lesser-powered 'LEFT' and 'MID' functions since I am ultimately aiming to turn this into a Salesforce formula field (I was directed here). Is this a realistic expectation? (I am digging into the Regex links on that answer link btw) – GPP Apr 09 '16 at 06:47
1

Without knowing what the conditions are of your local part and where your hangups are, I'd search for the extension and then the first . as you don't have a www.. If you do - search for the second .. I'll provide the parts and then the actual formula.

Searching for whatever extension you need to find.

=IFERROR((SEARCH(".com",A1)),0)+IFERROR((SEARCH(".net",A1)),0)+IFERROR(SEARCH(".org",A1),0)

Searching for your first .

=SEARCH(".",A1)

Then combine them in =MID()

=MID(A1,SEARCH(".",A1)+1,IFERROR((SEARCH(".com",A1)),0)+IFERROR((SEARCH(".net",A1)),0)+IFERROR(SEARCH(".org",A1),0)-SEARCH(".",A1)+3)

If there isn't always a local part, then it may get trickier. There are also some other ways to do it if your local string is always the same length or always has a specific string, or certain values, etc.

You can get rid of the iferror portions if you always know the extension, or you can add additional extensions.

Raystafarian
  • 21,583
  • 11
  • 60
  • 89