Regex for email validation

#11 | Per Jessen | 08-27-2008, 02:04 PM
Re: [PHP] Regex for email validation

Lupus Michaelis wrote:

> Argh! How many times has this come up? I spent so much time writing a
> regex that complies with RFC822 :-/ Because all the regexes in the
> answers here were wrong. They don't allow email like
> "Mickael Doodoo"@lupusmic.com nor


That format is about as dead as the dinosaurs. I know it IS a valid
format, but ...


/Per Jessen, Zürich

#12 | tedd | 08-27-2008, 02:05 PM
Re: [PHP] Regex for email validation

At 6:30 PM +0200 8/27/08, Per Jessen wrote:
>Well, I left that for the OP to figure out. Still, your regex is
>worse - a domain name cannot contain '%'. The only valid characters
>for a domain name are letters, numbers and a hyphen. Also, maximum
>length for a domain name is 64 characters, which could/should be
>checked too.


Well, I stole that regex anyway -- I agree that % should not have been there.

>No, they can't. There are no 8-bit characters allowed in an
>email-address. Check out RFC2821.


You can throw all the facts and documentation you
want at me, but the left side of the @ has always
been open to anything you want. The right side of
the @ has had to deal with 7-bit limitation (the
DNS problem). But, considering the work that the
IDNS has done, (circa 2000) we can use Unicode
characters on both sides of the @.

However, the software (browsers and email apps)
may/may-not be able to deal with it, as shown by
my recent example of:

>> tedd@à.com
>>
>> is a legal and working email address.
>
>If that reads "tedd(at)<space>.com", it might be valid on your system,
>but not in public.


The email address is perfectly valid, and works,
but our definition of "public" is apparently
different.

I claim if it's valid on any system, then it's
public. I don't hold to the notion that if M$
doesn't recognize it then it isn't public. M$ has
always had its collective head up its vested
interest butt anyway.

For demonstration, Safari has absolutely no
problems dealing with IDNS, whereas all IEs do.
To prove my point, if you have Safari, try
entering option v into the browser URL box and
hit return. You don't have to enter anything else
(i.e., no http://, www, or dot com).

What will happen is that you will be
automagically transported to one of my sites
where the url is square-root dot com. However if
you're dealing with one of the leading "also-ran"
IE browsers, then you'll see the PUNYCODE
equivalent, which was never intended to be seen
by end users anyway. Just another example of how
M$ always has a better idea.

So, regardless of the documentation, which may be
outdated, I know that Unicode characters can be
used in IDNS and thus on both sides of the @, but
it's the software that needs to catch up to the
technology.

Cheers,

tedd

--
-------
http://sperling.com http://ancientstones.com http://earthstones.com
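
The hostname rule quoted at the top of this post (letters, digits and hyphens only) can be sketched in a few lines of PHP. This is a minimal sketch, not a complete validator: the function name isPlausibleHostname is made up for illustration, and the length limits are the ones from RFC 1035 (at most 63 octets per label, and 253 characters for the whole name in its usual dotted form) rather than the 64 mentioned in the quote.

```php
<?php
// Hypothetical helper (not from the thread): checks an ASCII hostname
// against the letters/digits/hyphen rule, label by label.
function isPlausibleHostname(string $domain): bool
{
    // Assumed limit: 253 characters for the dotted text form (RFC 1035).
    if ($domain === '' || strlen($domain) > 253) {
        return false;
    }
    foreach (explode('.', $domain) as $label) {
        // Each label must be 1 to 63 characters long.
        if ($label === '' || strlen($label) > 63) {
            return false;
        }
        // Letters, digits and hyphens only; no hyphen at either end.
        if (!preg_match('/^[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$/iD', $label)) {
            return false;
        }
    }
    return true;
}

var_dump(isPlausibleHostname('sperling.com'));  // a plain ASCII name
var_dump(isPlausibleHostname('ex%ample.com'));  // '%' is not allowed
var_dump(isPlausibleHostname('xn--0ca.com'));   // Punycode form of an IDN
```

Checking label by label keeps the length and character rules separate, which is awkward to express in a single regex.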

#13 | tedd | 08-27-2008, 02:10 PM
Re: [PHP] Regex for email validation

At 7:55 PM +0200 8/27/08, Lupus Michaelis wrote:
>mike a écrit :
>>
>>php should have a good check built-in.
>>
>>see http://www.php.net/manual/en/function.filter-var.php

>
> Argh! How many times has this come up? I spent so
>much time writing a regex that complies with
>RFC822 :-/ Because all the regexes in the answers here
>were wrong. They don't allow email like "Mickael
>Doodoo"@lupusmic.com nor
>mickael+doudou@lupusmic.org; and they are
>valid email addresses. Not to mention that
>a top level domain isn't always between two and
>three characters (think about .museum).


Or TLDs like:

http://tedd.mobi/

Things are a changing fast.

Just wait until you start designing stuff for cell phones.

Cheers,

tedd
--
-------
http://sperling.com http://ancientstones.com http://earthstones.com
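
For reference, the built-in check mike links to is filter_var() with the FILTER_VALIDATE_EMAIL filter (PHP 5.2 and later). Below is a minimal sketch of how it might be called. The sample addresses are taken from this thread or made up, and which edge cases the filter accepts can depend on the PHP version, so the script prints the verdicts rather than claiming them.

```php
<?php
// Sample addresses: the first and third come from the thread,
// the other two are made up for illustration.
$candidates = [
    'mickael+doudou@lupusmic.org',   // plus sign in the local part
    'curator@example.museum',        // TLD longer than three letters
    '"Mickael Doodoo"@lupusmic.com', // quoted local part
    'no-at-sign.example.com',        // not an address at all
];

foreach ($candidates as $candidate) {
    // filter_var() returns the address on success and false on failure.
    $result = filter_var($candidate, FILTER_VALIDATE_EMAIL);
    printf("%-32s %s\n", $candidate, $result === false ? 'rejected' : 'accepted');
}
```

Comparing against false with === matters here, because the success value is the address string itself, not true.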

#14 | Per Jessen | 08-27-2008, 02:35 PM
Re: [PHP] Regex for email validation

tedd wrote:

>
>>No, they can't. There are no 8-bit characters allowed in an
>>email-address. Check out RFC2821.

>
> You can throw all the facts and documentation you want at me, but the
> left side of the @ has always been open to anything you want.


Except anything 8-bit, yes. Seriously, read RFC2821 and maybe RFC2822.

> The right side of the @ has had to deal with 7-bit limitation (the
> DNS problem). But, considering the work that the IDNS has done, (circa
> 2000) we can use Unicode characters on both sides of the @.


No, you cannot. Certainly not on the left side, and only on the right
side if you assume visual representation = email-address.

Why don't you send me an email at this address: à@jessen.ch (that's an
a with a grave accent like in your domain further down).

> However, the software (browsers and email apps) may/may-not be able to
> deal with it, as shown by my recent example of:
>
>> > tedd@à.com
>>>
>>> is a legal and working email address.

>>
>>If that reads "tedd(at)<space>.com", it might be valid on your system,
>>but not in public.

>
> The email address is perfectly valid, and works,
> but our definition of "public" is apparently
> different.


Sorry, I didn't see the a with the grave accent. Still, try using that
address in Thunderbird, and you'll see that it doesn't work. The
correct email-address (which is what we're talking about)
for 'tedd@à.com' is tedd@xn--0ca.com, which an email-system like
sendmail/exim/postfix/etc will understand (whereas it will choke
on 'tedd@à.com').

> So, regardless of the documentation, which may be outdated, I know
> that Unicode characters can be used in IDNS and thus on both sides of
> the @,


You're wrong - IDNs only apply to the right side of the @. (check out
what the 'D' means).

Go on, send me that email to 'à@jessen.ch' ... for what it's worth, I
can't even define an account like that, so my mailserver might well
reject it.



/Per Jessen, Zürich
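
The conversion described in this post, from the displayed domain to the form a mail server understands, is available in PHP through idn_to_ascii() from the intl extension. What follows is a sketch under assumptions: the helper name toAsciiAddress is hypothetical, the intl extension may not be installed (in which case the helper only passes pure-ASCII domains through), and only the domain is converted, because IDNs apply to the right side of the @ only.

```php
<?php
// Hypothetical helper: converts the domain of an address to its ASCII
// (Punycode) form and leaves the local part untouched.
// Returns null when the address cannot be converted.
function toAsciiAddress(string $address): ?string
{
    $at = strrpos($address, '@');   // split at the last '@'
    if ($at === false || $at === 0 || $at === strlen($address) - 1) {
        return null;
    }
    $local  = substr($address, 0, $at);
    $domain = substr($address, $at + 1);

    if (function_exists('idn_to_ascii')) {
        // Provided by the intl extension; returns false on failure.
        $ascii = idn_to_ascii($domain, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
        return $ascii === false ? null : $local . '@' . $ascii;
    }
    // Without intl, only pass through domains that are already ASCII.
    return preg_match('/^[\x21-\x7E]+$/', $domain) ? $local . '@' . $domain : null;
}

// "\u{E0}" is the a with a grave accent from tedd's example (PHP 7+ syntax).
var_dump(toAsciiAddress("tedd@\u{E0}.com"));
```

With intl available, this should yield the tedd@xn--0ca.com form quoted above; without it, the call returns null for that input.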

#15 | Lupus Michaelis | 08-27-2008, 02:56 PM
Re: [PHP] Regex for email validation

Per Jessen a écrit :

> That format is about as dead as the dinosaurs.


Why ?

--
Mickaël Wolff aka Lupus Michaelis
http://lupusmic.org

#16 | Per Jessen | 08-27-2008, 03:20 PM
Re: [PHP] Regex for email validation

Lupus Michaelis wrote:

> Per Jessen a écrit :
>
>> That format is about as dead as the dinosaurs.

>
> Why ?


I don't know, but I suspect due to lack of support in popular mailers
and mail-servers. Also, the use of quotes does make it cumbersome to
work with, both as a user and as a mailserver admin.


/Per Jessen, Zürich

#17 | Lupus Michaelis | 08-27-2008, 05:22 PM
Re: [PHP] Regex for email validation

Per Jessen a écrit :

> I don't know, but I suspect due to lack of support in popular mailers
> and mail-servers. Also, the use of quotes does make it cumbersome to
> work with, both as a user and as a mailserver admin.


I had to write a piece of code that could handle "toto toto"@ndd
five years ago; it came from the end users' Lotus Mail habits.

--
Mickaël Wolff aka Lupus Michaelis
http://lupusmic.org

#18 | tedd | 08-27-2008, 05:56 PM
Re: [PHP] Regex for email validation

At 8:35 PM +0200 8/27/08, Per Jessen wrote:
>Go on, send me that email to 'à@jessen.ch' ... for what it's worth, I
>can't even define an account like that, so my mailserver might well
>reject it.


Yes, you are right.

I was thinking of something else, namely that the
LHS of the email address is case-sensitive --
this was something that was discussed about five
years ago on the IDNS list, which I took part in.

Considering that the list was created to solve
the IDNS problem, I mistakenly remembered that they
were discussing IDNS problems, but instead they
were discussing case-sensitivity.

Sorry to add to the confusion.

tedd

--
-------
http://sperling.com http://ancientstones.com http://earthstones.com
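
One practical consequence of the local part being case-sensitive: when addresses are normalized for comparison or storage, only the domain can safely be lower-cased. A minimal sketch, assuming a hypothetical helper name and an address whose domain is plain ASCII:

```php
<?php
// Hypothetical helper: lower-cases the domain only, because the part
// before the '@' may be case-sensitive on the receiving server.
function normalizeAddressCase(string $address): string
{
    $at = strrpos($address, '@');   // split at the last '@'
    if ($at === false) {
        return $address;            // not an address; leave it as is
    }
    $local  = substr($address, 0, $at);
    $domain = substr($address, $at + 1);
    return $local . '@' . strtolower($domain);
}

echo normalizeAddressCase('Tedd@Sperling.COM'), "\n"; // prints "Tedd@sperling.com"
```

Many mail servers do treat the local part case-insensitively in practice, but that is the receiving server's decision, so the sketch leaves it untouched.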

#19 | Kevin Waterson | 08-27-2008, 06:44 PM
Re: [PHP] Regex for email validation

This one time, at band camp, Yeti <yeti@myhich.com> wrote:

> <?php
> # this one worked fine for me, but it does not cover the full RFC
> # like: "name" name@notworking.cc OR name <name@notworking.cc>
> $regex = "^[a-z0-9,!#\$%&'\*\+/=\?\^_`\{\|}~-]+(\.[a-z0-9,!#\$%&'\*\+/=\?\^_`\{\|}~-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*\.([a-z]{2,})$";
> if (eregi($regex, $email)) {
> // do something
> }
> # Beware that the filter functions only work under PHP5+. If your PHP
> # supports them they should be the preferred choice
> ?>


There is no silver bullet regex to validate all RFC compliant email addresses.
Many have tried, but they all fail at some point. The best you can do is
cater to most _sane_ addresses.

And when the domain name space is opened up, well, you will be back to strpos() and @

Kevin
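
As a sketch of what "catering to most sane addresses" might look like on current PHP: eregi() was deprecated in PHP 5.3 and removed in PHP 7, so the version below uses preg_match() instead. The helper name is hypothetical, and the pattern is deliberately narrower than the RFCs (no quoted local parts, no display names, ASCII only). Unlike the regex quoted above, it does not accept a comma in the local part.

```php
<?php
// Hypothetical helper: accepts common unquoted ASCII addresses only.
function looksLikeSaneEmail(string $email): bool
{
    // One run of characters allowed in an unquoted local part.
    $atom = '[a-z0-9!#$%&\'*+\/=?^_`{|}~-]+';
    // One hostname label: letters, digits, inner hyphens.
    $label = '[a-z0-9](?:[a-z0-9-]*[a-z0-9])?';

    $pattern = '/^' . $atom . '(?:\.' . $atom . ')*'   // local part
             . '@(?:' . $label . '\.)+'                // domain labels
             . '[a-z]{2,}$/iD';                        // TLD, no upper length cap

    return preg_match($pattern, $email) === 1;
}

var_dump(looksLikeSaneEmail('mickael+doudou@lupusmic.org'));   // bool(true)
var_dump(looksLikeSaneEmail('"Mickael Doodoo"@lupusmic.com')); // bool(false)
```

The second result shows the trade-off discussed in this thread: the quoted form is valid under the RFC, but a "sane addresses" pattern like this one rejects it.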

#20 | tedd | 08-27-2008, 06:49 PM
Re: [PHP] Regex for email validation

At 8:35 PM +0200 8/27/08, Per Jessen wrote:
>> So, regardless of the documentation, which may be outdated, I know
>> that Unicode characters can be used in IDNS and thus on both sides of
>> the @,
>
>You're wrong - IDNs only apply to the right side of the @. (check out
>what the 'D' means).



The D in IDNS is Internationalized "Domain" Names -- note what the
'I' stands for.

I was wrong to say that Unicode code points can be used on the LHS of
the @ but domain names contain Unicode code points (in fact, that's
all they contain) and thus these code points can appear on the RHS of
an email address.

For example, one *can* use other than ASCII characters in a domain
name -- that's what the IDNS WG was for solving.

The WG did solve this issue and came up with a way to do that -- the
current algorithm is called PUNYCODE which allows Unicode code-points
to appear in a domain name. I know this to be true because I have
several domains that lie outside the standard ASCII AND they are real
domains that have real web sites.

For example:

http://xn--u2g.com

If you have a browser (like Safari) that is capable of showing the
URL in its native charset, then you will see the Rx.com in the url.
If not, then you'll see xn--u2g.com.

Now, email can be sent from that domain, but I have not found an
application that will send or receive it. The software has simply
not caught up with the technology.

One thing for sure, as the rest of the world logs on, more and more
people will demand that their applications implement the
capabilities of the current IDNS.

Cheers,

tedd


--
-------
http://sperling.com http://ancientstones.com http://earthstones.com