Crypto, data analysis and BI商业智能，数据挖掘和比特币: 强悍的oracle正则表达式

以及for, while Loop和translate函数（可以替换指定多个字符，比replace省事）,以及unipivot等可以取代case

Translate and replace are very similar in their appearance but can produce very different results. Translate replaces by position, the first character of the list to match is replaced by the first character of the replacement list. The second character with the second, and if there are characters in the list to match that do not have positional equivalents in the replacements list they are dropped.

例子 Character Replacement
And Elimination The a is replaced with an e, the h has no complement and is dropped.
SELECT TRANSLATE('So What', 'ah', 'e')
FROM dual

psoug.org techonthenet.com

如何之保留field里面的数字？

1. SQL detect begin and end of number position

2. SQL use Translate() or RegExp to replace letter

3. Use excel/scripting/VBS to generate a nested Stripchar for CLEM expression

强悍的正则表达式

Looks for the pattern xxx.xxx.xxxx and reformats pattern to (xxx) xxx-xxxx col testcol format a15
col result format a15

SELECT testcol, REGEXP_REPLACE(testcol,
'([[:digit:]]{3})\.([[:digit:]]{3})\.([[:digit:]]{4})',
'(\1) \2-\3') RESULT
FROM test
WHERE LENGTH(testcol) = 12;

Values Starting with 'a%b' SELECT testcol
FROM test
WHERE REGEXP_LIKE(testcol, '^ab*')

select regexp_replace('TAYLOR''S AUTO PRODUCTS,INCýý59', '[^A-Za-z0-9,'' ]+',' ') a from dual;
VERY close except tab and `'

UPDATE myTable
SET column = REPLACE(column, CHAR(241), 'n') --> to replace ñ with n

Or

SELECT REPLACE(REPLACE(REPLACE(column, CHAR(241), ''), CHAR(185), ''), CHAR(nnn), '')

I've nested REPLACE statements 10-15 deep

单引号和反引号不同
Single backquote `又叫做
grave accent ot backtick
prime symbol ( ′ )
'`'`~

select regexp_replace('TAYLOR''S AUTO PRODUCTS,INCýý59', '[^A-Za-z0-9,'' ]+',' ') a from dual;
Assisted Solution

07.10.2008 at 01:26PM PDT, ID: 21977057

正则表达式替换例子
Rank: Sage
sdstuber:
the above will replace multiple invalid characters with a single space, if you want a one-for-one swap then remove the "+"

select regexp_replace('TAYLOR''S AUTO PRODUCTS,INCýý59', '[^A-Za-z0-9,'' ]',' ') a from dual;
bypass any valid and then replace irregular with space

VB:
Private Function StripChars(ByVal ToStrip As String) As String
Dim RegEx As New Regex("[\x7E-\xFF]")
Return RegEx.Replace(ToStrip, "")
End Function
End Class

General Information

Anchoring Characters
Character Class Description
^ Anchor the expression to the start of a line
$ Anchor the expression to the end of a line

Equivalence Classes
Character Class Description
= = Oracle supports the equivalence classes through the POSIX '[==]' syntax. A base letter and all of its accented versions constitute an equivalence class. For example, the equivalence class '[=a=]' matches � and �. The equivalence classes are valid only inside the bracketed expression

Match Options
Character Class Description
c Case sensitive matching
i Case insensitive matching
m Treat source string as multi-line activating Anchor chars
n Allow the period (.) to match any newline character

Posix Characters
Character Class Description
[:alnum:] Alphanumeric characters
[:alpha:] Alphabetic characters
[:blank:] Blank Space Characters
[:cntrl:] Control characters (nonprinting)
[:digit:] Numeric digits
[:graph:] Any [:punct:], [:upper:], [:lower:], and [:digit:] chars
[:lower:] Lowercase alphabetic characters
[:print:] Printable characters
[:punct:] Punctuation characters
[:space:] Space characters (nonprinting), such as carriage return, newline, vertical tab, and form feed
[:upper:] Uppercase alphabetic characters
[:xdigit:] Hexidecimal characters

Quantifier Characters
Character Class Description
* Match 0 or more times
? Match 0 or 1 time
+ Match 1 or more times
{m} Match exactly m times
{m,} Match at least m times
{m, n} Match at least m times but no more than n times
\n Cause the previous expression to be repeated n times

Alternative Matching And Grouping Characters
Character Class Description
| Separates alternates, often used with grouping operator ()
( ) Groups subexpression into a unit for alternations, for quantifiers, or for backreferencing (see "Backreferences" section)
[char] Indicates a character list; most metacharacters inside a character list are understood as literals, with the exception of character classes, and the ^ and - metacharacters

Demo Table CREATE TABLE test (
testcol VARCHAR2(50));

INSERT INTO test VALUES ('abcde');
INSERT INTO test VALUES ('12345');
INSERT INTO test VALUES ('1a4A5');
INSERT INTO test VALUES ('12a45');
INSERT INTO test VALUES ('12aBC');
INSERT INTO test VALUES ('12abc');
INSERT INTO test VALUES ('12ab5');
INSERT INTO test VALUES ('12aa5');
INSERT INTO test VALUES ('12AB5');
INSERT INTO test VALUES ('ABCDE');
INSERT INTO test VALUES ('123-5');
INSERT INTO test VALUES ('12.45');
INSERT INTO test VALUES ('1a4b5');
INSERT INTO test VALUES ('1 3 5');
INSERT INTO test VALUES ('1 45');
INSERT INTO test VALUES ('1 5');
INSERT INTO test VALUES ('a b c d');
INSERT INTO test VALUES ('a b c d e');
INSERT INTO test VALUES ('a e');
INSERT INTO test VALUES ('Steven');
INSERT INTO test VALUES ('Stephen');
INSERT INTO test VALUES ('111.222.3333');
INSERT INTO test VALUES ('222.333.4444');
INSERT INTO test VALUES ('333.444.5555');
INSERT INTO test VALUES ('abcdefabcdefabcxyz');
COMMIT;

REGEXP_COUNT
Syntax REGEXP_COUNT(, [[, ], []])

-- match parameter:
'c' = case sensitive
'i' = case insensitive search
'm' = treats the source string as multiple lines
'n' = allows the period (.) wild character to match newline
'x' = ignore whitespace characters

Count's occurrences based on a regular expression SELECT REGEXP_COUNT(testcol, '2a', 1, 'i') RESULT
FROM test;

SELECT REGEXP_COUNT(testcol, 'e', 1, 'i') RESULT
FROM test;

REGEXP_INSTR
Syntax REGEXP_INSTR(, [[, ][, ][, ][, ][, ]])

Find words beginning with 's' or 'r' or 'p' followed by any 4 alphabetic characters: case insensitive SELECT REGEXP_INSTR('500 Oracle Pkwy, Redwood Shores, CA', '[o][[:alpha:]]{3}', 1, 1, 0, 'i') RESULT
FROM dual;

SELECT REGEXP_INSTR('500 Oracle Pkwy, Redwood Shores, CA', '[o][[:alpha:]]{3}', 1, 1, 1, 'i') RESULT
FROM dual;

SELECT REGEXP_INSTR('500 Oracle Pkwy, Redwood Shores, CA', '[o][[:alpha:]]{3}', 1, 2, 0, 'i') RESULT
FROM dual;

SELECT REGEXP_INSTR('500 Oracle Pkwy, Redwood Shores, CA', '[o][[:alpha:]]{3}', 1, 2, 1, 'i') RESULT
FROM dual;
Find the position of try, trying, tried or tries SELECT REGEXP_INSTR('We are trying to make the subject easier.', 'tr(y(ing)?|(ied)|(ies))') RESULTNUM
FROM dual;
Using Sub-Expression option SELECT testcol, REGEXP_INSTR(testcol, 'ab', 1, 1, 0, 'i', 0)
FROM test;

SELECT testcol, REGEXP_INSTR(testcol, 'ab', 1, 1, 0, 'i', 1)
FROM test;

SELECT testcol, REGEXP_INSTR(testcol, 'a(b)', 1, 1, 0, 'i', 1)
FROM test;

REGEXP_LIKE
Syntax REGEXP_LIKE(, , )

AlphaNumeric Characters SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:alnum:]]');

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:alnum:]]{3}');

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:alnum:]]{5}');

Alphabetic Characters SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:alpha:]]');

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:alpha:]]{3}');

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:alpha:]]{5}');

Control Characters INSERT INTO test VALUES ('zyx' || CHR(13) || 'wvu');
COMMIT;

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:cntrl:]]{1}');

Digits SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:digit:]]');

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:digit:]]{3}');

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:digit:]]{5}');

Lower Case SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:lower:]]');

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:lower:]]{2}');

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:lower:]]{3}');

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:lower:]]{5}');

Printable Characters SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:print:]]{5}');

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:print:]]{6}');

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:print:]]{7}');

Punctuation TRUNCATE TABLE test;

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:punct:]]');

Spaces SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:space:]]');

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:space:]]{2}');

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:space:]]{3}');

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:space:]]{5}');

Upper Case SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:upper:]]');

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:upper:]]{2}');

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:upper:]]{3}');
Values Starting with 'a%b' SELECT testcol
FROM test
WHERE REGEXP_LIKE(testcol, '^ab*');
'a' is the third value SELECT testcol
FROM test
WHERE REGEXP_LIKE(testcol, '^..a.');
Contains two consecutive occurances of the letter 'a' or 'z' SELECT testcol
FROM test
WHERE REGEXP_LIKE(testcol, '([az])\1', 'i');
Begins with 'Ste' ends with 'en' and contains either 'v' or 'ph' in the center SELECT testcol
FROM test
WHERE REGEXP_LIKE(testcol, '^Ste(v|ph)en$');
Use a regular expression in a check constraint CREATE TABLE mytest (c1 VARCHAR2(20),
CHECK (REGEXP_LIKE(c1, '^[[:alpha:]]+$')));

Identify SSN

Thanks: Byron Bush HIOUG CREATE TABLE ssn_test (
ssn_col VARCHAR2(20));

INSERT INTO ssn_test VALUES ('111-22-3333');
INSERT INTO ssn_test VALUES ('111=22-3333');
INSERT INTO ssn_test VALUES ('111-A2-3333');
INSERT INTO ssn_test VALUES ('111-22-33339');
INSERT INTO ssn_test VALUES ('111-2-23333');
INSERT INTO ssn_test VALUES ('987-65-4321');
COMMIT;

SELECT ssn_col
from ssn_test
WHERE REGEXP_LIKE(ssn_col,'^[0-9]{3}-[0-9]{2}-[0-9]{4}$');

REGEXP_REPLACE
Syntax REGEXP_REPLACE(, ,
, , , )
Looks for the pattern xxx.xxx.xxxx and reformats pattern to (xxx) xxx-xxxx col testcol format a15
col result format a15

SELECT testcol, REGEXP_REPLACE(testcol,
'([[:digit:]]{3})\.([[:digit:]]{3})\.([[:digit:]]{4})',
'(\1) \2-\3') RESULT
FROM test
WHERE LENGTH(testcol) = 12;
Put a space after every character SELECT testcol, REGEXP_REPLACE(testcol, '(.)', '\1 ') RESULT
FROM test
WHERE testcol like 'S%';
Replace multiple spaces with a single space SELECT REGEXP_REPLACE('500 Oracle Parkway, Redwood Shores, CA', '( ){2,}', ' ') RESULT
FROM dual;
Insert a space between a lower case character followed by an upper case character SELECT REGEXP_REPLACE('George McGovern', '([[:lower:]])([[:upper:]])', '\1 \2') CITY
FROM dual;
Replace the period with a string (note use of '\') SELECT REGEXP_REPLACE('We are trying to make the subject easier.','\.',' for you.') REGEXT_SAMPLE
FROM dual;

Demo CREATE TABLE t(
testcol VARCHAR2(10));

INSERT INTO t VALUES ('1');
INSERT INTO t VALUES ('2 ');
INSERT INTO t VALUES ('3 new ');

col newval format a10

SELECT LENGTH(testcol) len, testcol origval,
REGEXP_REPLACE(testcol, '\W+$', ' ') newval,
LENGTH(REGEXP_REPLACE(testcol, '\W+$', ' ')) newlen
FROM t;

REGEXP_SUBSTR
Syntax REGEXP_SUBSTR(source_string, pattern
[, position [, occurrence
[, match_parameter]]])
Searches for a comma followed by one or more occurrences of non-comma characters followed by a comma SELECT REGEXP_SUBSTR('500 Oracle Parkway, Redwood Shores, CA', ',[^,]+,') RESULT
FROM dual;
Look for http:// followed by a substring of one or more alphanumeric characters and optionally, a period (.) col result format a50

SELECT REGEXP_SUBSTR('Go to http://www.oracle.com/products and click on database',
'http://([[:alnum:]]+\.?){3,4}/?') RESULT
FROM dual;
Extracts try, trying, tried or tries SELECT REGEXP_SUBSTR('We are trying to make the subject easier.','tr(y(ing)?|(ied)|(ies))')
FROM dual;
Extract the 3rd field treating ':' as a delimiter SELECT REGEXP_SUBSTR('system/pwd@orabase:1521:sidval',
'[^:]+', 1, 3) RESULT
FROM dual;

Extract from string with vertical bar delimiter CREATE TABLE regexp (
testcol VARCHAR2(50));

INSERT INTO regexp
(testcol)
VALUES
('One|Two|Three|Four|Five');

SELECT * FROM regexp;

SELECT REGEXP_SUBSTR(testcol,'[^|]+', 1, 3)
FROM regexp;
Equivalence classes SELECT REGEXP_SUBSTR('iSelfSchooling NOT ISelfSchooling', '[[=i=]]SelfSchooling') RESULT
FROM dual;

Parsing Demo set serveroutput on

DECLARE
x VARCHAR2(2);
y VARCHAR2(2);
c VARCHAR2(40) := '1:3,4:6,8:10,3:4,7:6,11:12';
BEGIN
x := REGEXP_SUBSTR(c,'[^:]+', 1, 1);
y := REGEXP_SUBSTR(c,'[^,]+', 3, 1);

dbms_output.put_line(x ||' '|| y);
END;
/

Crypto, data analysis and BI商业智能，数据挖掘和比特币

Friday, October 10, 2008

强悍的oracle正则表达式

0 Comments:

About Me

Previous Posts