regex - How to extract expressions matching an email address in a text file using R or the command line?


I have a text file that includes email addresses and some information.

I would like to know how I can extract those email addresses using R or the terminal.

I have read that I can use a regular expression that matches an email address, such as

  "^[_a-z0-9-]+(\\.[_a-z0-9-]+)*@[a-z0-9-]+(\\.[a-z0-9-]+)*(\\.[a-z]{2,4})$"

But what command or function do I need to use to extract those emails?

There is no pattern in the text file. The command or function should just search the document and extract the email addresses.

Let's take an unstructured example file:

  This is a test of fred's fred@foo.com and who is joe@example.com - but @this is a twitter handle for twit@here.com

Then if you do this:

  myText <- readLines("testmail.txt")
  emails <- unlist(regmatches(myText, gregexpr("([_a-z0-9-]+(\\.[_a-z0-9-]+)*@[a-z0-9-]+(\\.[a-z0-9-]+)*(\\.[a-z]{2,4}))", myText)))
  > emails
  [1] "fred@foo.com"    "joe@example.com" "twit@here.com"

That returns a vector of all the emails, and it works with more than one address on a line. I don't think it will find email addresses that are broken across lines, but if you paste the lines you read together it might do that too:

  > myText = paste(readLines("testmail.txt"), collapse = "")
  > emails = regmatches(myText, gregexpr("([_a-z0-9-]+(\\.[_a-z0-9-]+)*@[a-z0-9-]+(\\.[a-z0-9-]+)*(\\.[a-z]{2,4}))", myText))
  > emails
  [[1]]
  [1] "fred@foo.com"    "joe@example.com" "twit@here.com"

In this case there is only one element in myText, because we have pasted all the lines together, so the returned list in the emails object also has only one element.
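To make the difference between the two return shapes concrete, here is a minimal, self-contained sketch. It assumes a two-line temporary file as a stand-in for testmail.txt; the names `path`, `pattern`, `perLine`, `joined` and `pasted` are only illustrative.

```r
# Write a small two-line example to a temporary file (a stand-in for
# testmail.txt) so the snippet can run anywhere.
path <- tempfile(fileext = ".txt")
writeLines(c("This is a test of fred's fred@foo.com and who is joe@example.com",
             "- but @this is a twitter handle for twit@here.com"), path)

pattern <- "([_a-z0-9-]+(\\.[_a-z0-9-]+)*@[a-z0-9-]+(\\.[a-z0-9-]+)*(\\.[a-z]{2,4}))"

# Line by line: regmatches() returns one list element per line read.
myText <- readLines(path)
perLine <- regmatches(myText, gregexpr(pattern, myText))
length(perLine)            # 2, one element per input line
emails <- unlist(perLine)  # flatten into a single character vector

# Pasted together: one string goes in, so a list of length one comes back.
joined <- paste(myText, collapse = "")
pasted <- regmatches(joined, gregexpr(pattern, joined))
length(pasted)             # 1
identical(emails, pasted[[1]])
```

Calling `unlist()` on either result gives a plain character vector, which is usually the more convenient form to work with.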

Note that this regex is not a strict definition of a valid email address. For example, it is limited to addresses that end with between 2 and 4 characters after the last dot, so it does not match fred@foo.fnord in full (because the pattern is unanchored, it picks up the truncated fred@foo.fnor instead). There are top-level domains with more than four characters, so you may need to modify the regex.

In addition, it only matches alphanumerics, underscores, hyphens and dots in the name part, so a valid address such as foo+bar@google.com is not matched in full.
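A quick way to see both limitations is to run the pattern on the two addresses directly. This is just a sketch; the vector name `tricky` is made up for the illustration.

```r
pattern <- "([_a-z0-9-]+(\\.[_a-z0-9-]+)*@[a-z0-9-]+(\\.[a-z0-9-]+)*(\\.[a-z]{2,4}))"

# Two valid addresses the pattern does not handle properly.
tricky <- c("fred@foo.fnord", "foo+bar@google.com")

found <- unlist(regmatches(tricky, gregexpr(pattern, tricky)))
found
# The pattern is unanchored, so instead of returning nothing it returns
# partial matches: "fred@foo.fnor" (top-level domain cut at 4 letters)
# and "bar@google.com" (everything up to and including the "+" is dropped).

found == tricky  # FALSE FALSE: neither address is recovered in full
```

Partial matches like these are easy to miss in a long result vector, so it is worth testing a pattern on a few awkward addresses before trusting it on a whole file.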

A regex that fixes these two problems might be:

  "([_+a-z0-9-]+(\\.[_+a-z0-9-]+)*@[a-z0-9-]+(\\.[a-z0-9-]+)*(\\.[a-z]{2,14}))"

But it probably has other problems, and you would be better off searching online for better email address regexes. I say "better" because a perfect one probably does not exist...
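For what it's worth, the relaxed pattern can be checked against the two problem addresses in the same way. The sketch below also passes `ignore.case = TRUE` to `gregexpr()`, which is my own addition rather than part of the pattern above: the character classes only list lowercase letters, so addresses containing capitals are one of the "other problems". The names `relaxed` and `tricky` are illustrative.

```r
relaxed <- "([_+a-z0-9-]+(\\.[_+a-z0-9-]+)*@[a-z0-9-]+(\\.[a-z0-9-]+)*(\\.[a-z]{2,14}))"

# The two addresses the stricter pattern mangled.
tricky <- c("fred@foo.fnord", "foo+bar@google.com")
found <- unlist(regmatches(tricky, gregexpr(relaxed, tricky)))
found  # both addresses now come back in full

# Assumption: case-insensitive matching is wanted. ignore.case = TRUE
# lets the lowercase-only character classes match capitals as well.
mixed <- "Contact Fred@Foo.com for details"
foundMixed <- unlist(regmatches(mixed, gregexpr(relaxed, mixed, ignore.case = TRUE)))
foundMixed
```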

