几天前,我研究了从数字的字符串中删除尾随零的问题。
在该问题的连续性中,我发现这个问题很有趣,因为它将问题扩展到了包含逗号的数字。
我采用了我在以前处理的上一个问题中编写的正则表达式的模式,并对其进行了改进,以便可以用逗号将数字视为该问题的答案。
我对自己的热情和对正则表达式的喜爱使我无所适从。我不知道结果是否完全符合迈克尔·普雷斯科特(Michael Prescott)表示的需求。我想知道我的正则表达式中过多或不足的地方,并对其进行更正以使其更适合您。
现在,经过长时间的正则表达式工作之后,我的脑袋有了很大的重量,因此我还不够新鲜,无法给出很多解释。如果要点不清楚,或者是否有人对它感兴趣,请问我。
正则表达式是建立,以便它可以检测以科学计数法表示的数字2E10甚至5,22,454.12E-00.0478,在这些数字的两个部分删除不必要的零了。如果指数等于零,则会修改数字,以使不再有指数。
我在模式中进行了一些验证,以使某些特定情况不匹配,例如“ 12..57”将不匹配。但是在',111'中,字符串'111'匹配,因为前面的逗号被视为不是逗号,而不是数字的逗号。
我认为应该改进逗号的管理,因为在我看来,印度编号中逗号之间只有2位数字。我想纠正不会困难
以下是演示我的正则表达式工作方式的代码。有两个函数,根据一个函数是否要将数字“ .1245”转换为“ 0.1245”。如果在某些情况下仍然存在错误或不必要的匹配或不匹配,我不会感到惊讶。那么我想知道这些情况,以了解并纠正缺陷。
我为用Python编写的代码道歉,但是正则表达式是跨语言的,我认为每个人都可以理解reex的模式
import re
regx = re.compile('(?<![\d.])(?!\.\.)(?<![\d.][eE][+-])(?<![\d.][eE])(?<!\d[.,])'
'' #---------------------------------
'([+-]?)'
'(?![\d,]*?\.[\d,]*?\.[\d,]*?)'
'(?:0|,(?=0)|(?<!\d),)*'
'(?:'
'((?:\d(?!\.[1-9])|,(?=\d))+)[.,]?'
'|\.(0)'
'|((?<!\.)\.\d+?)'
'|([\d,]+\.\d+?))'
'0*'
'' #---------------------------------
'(?:'
'([eE][+-]?)(?:0|,(?=0))*'
'(?:'
'(?!0+(?=\D|\Z))((?:\d(?!\.[1-9])|,(?=\d))+)[.,]?'
'|((?<!\.)\.(?!0+(?=\D|\Z))\d+?)'
'|([\d,]+\.(?!0+(?=\D|\Z))\d+?))'
'0*'
')?'
'' #---------------------------------
'(?![.,]?\d)')
def dzs_numbs(x,regx = regx): # ds = detect and zeros-shave
if not regx.findall(x):
yield ('No match,', 'No catched string,', 'No groups.')
for mat in regx.finditer(x):
yield (mat.group(), ''.join(mat.groups('')), mat.groups(''))
def dzs_numbs2(x,regx = regx): # ds = detect and zeros-shave
if not regx.findall(x):
yield ('No match,', 'No catched string,', 'No groups.')
for mat in regx.finditer(x):
yield (mat.group(),
''.join(('0' if n.startswith('.') else '')+n for n in mat.groups('')),
mat.groups(''))
NS = [' 23456000and23456000. or23456000.000 00023456000 s000023456000. 000023456000.000 ',
'arf 10000 sea10000.+10000.000 00010000-00010000. kant00010000.000 ',
' 24: 24, 24. 24.000 24.000, 00024r 00024. blue 00024.000 ',
' 8zoom8. 8.000 0008 0008. and0008.000 ',
' 0 00000M0. = 000. 0.0 0.000 000.0 000.000 .000000 .0 ',
' .0000023456 .0000023456000 '
' .0005872 .0005872000 .00503 .00503000 ',
' .068 .0680000 .8 .8000 .123456123456 .123456123456000 ',
' .657 .657000 .45 .4500000 .7 .70000 0.0000023230000 000.0000023230000 ',
' 0.0081000 0000.0081000 0.059000 0000.059000 ',
' 0.78987400000 snow 00000.78987400000 0.4400000 00000.4400000 ',
' -0.5000 -0000.5000 0.90 000.90 0.7 000.7 ',
' 2.6 00002.6 00002.60000 4.71 0004.71 0004.7100 ',
' 23.49 00023.49 00023.490000 103.45 0000103.45 0000103.45000 ',
' 10003.45067 000010003.45067 000010003.4506700 ',
' +15000.0012 +000015000.0012 +000015000.0012000 ',
' 78000.89 000078000.89 000078000.89000 ',
' .0457e10 .0457000e10 00000.0457000e10 ',
' 258e8 2580000e4 0000000002580000e4 ',
' 0.782e10 0000.782e10 0000.7820000e10 ',
' 1.23E2 0001.23E2 0001.2300000E2 ',
' 432e-102 0000432e-102 004320000e-106 ',
' 1.46e10and0001.46e10 0001.4600000e10 ',
' 1.077e-300 0001.077e-300 0001.077000e-300 ',
' 1.069e10 0001.069e10 0001.069000e10 ',
' 105040.03e10 000105040.03e10 105040.0300e10 ',
' +286E000024.487900 -78.4500e.14500 .0140E789. ',
' 081,12.40E07,95.0120 0045,78,123.03500e-0.00 ',
' 0096,78,473.0380e-0. 0008,78,373.066000E0. 0004512300.E0000 ',
' ..18000 25..00 36...77 2..8 ',
' 3.8..9 .12500. 12.51.400 ',
' 00099,111.8713000 -0012,45,83,987.26+0.000,099,88,44.or00,00,00.00must',
' 00099,44,and 0000,099,88,44.bom',
'00,000,00.587000 77,98,23,45., this,that ',
' ,111 145.20 +9,9,9 0012800 .,,. 1 100,000 ',
'1,1,1.111 000,001.111 -999. 0. 111.110000 1.1.1.111 9.909,888']
for ch in NS:
print 'string: '+repr(ch)
for strmatch, modified, the_groups in dzs_numbs2(ch):
print strmatch.rjust(20),'',modified,'',the_groups
print
结果
string: ' 23456000and23456000. or23456000.000 00023456000 s000023456000. 000023456000.000 '
23456000 23456000 ('', '23456000', '', '', '', '', '', '', '')
23456000. 23456000 ('', '23456000', '', '', '', '', '', '', '')
23456000.000 23456000 ('', '23456000', '', '', '', '', '', '', '')
00023456000 23456000 ('', '23456000', '', '', '', '', '', '', '')
000023456000. 23456000 ('', '23456000', '', '', '', '', '', '', '')
000023456000.000 23456000 ('', '23456000', '', '', '', '', '', '', '')
string: 'arf 10000 sea10000.+10000.000 00010000-00010000. kant00010000.000 '
10000 10000 ('', '10000', '', '', '', '', '', '', '')
10000. 10000 ('', '10000', '', '', '', '', '', '', '')
10000.000 10000 ('', '10000', '', '', '', '', '', '', '')
00010000 10000 ('', '10000', '', '', '', '', '', '', '')
00010000. 10000 ('', '10000', '', '', '', '', '', '', '')
00010000.000 10000 ('', '10000', '', '', '', '', '', '', '')
string: ' 24: 24, 24. 24.000 24.000, 00024r 00024. blue 00024.000 '
24 24 ('', '24', '', '', '', '', '', '', '')
24, 24 ('', '24', '', '', '', '', '', '', '')
24. 24 ('', '24', '', '', '', '', '', '', '')
24.000 24 ('', '24', '', '', '', '', '', '', '')
24.000 24 ('', '24', '', '', '', '', '', '', '')
00024 24 ('', '24', '', '', '', '', '', '', '')
00024. 24 ('', '24', '', '', '', '', '', '', '')
00024.000 24 ('', '24', '', '', '', '', '', '', '')
string: ' 8zoom8. 8.000 0008 0008. and0008.000 '
8 8 ('', '8', '', '', '', '', '', '', '')
8. 8 ('', '8', '', '', '', '', '', '', '')
8.000 8 ('', '8', '', '', '', '', '', '', '')
0008 8 ('', '8', '', '', '', '', '', '', '')
0008. 8 ('', '8', '', '', '', '', '', '', '')
0008.000 8 ('', '8', '', '', '', '', '', '', '')
string: ' 0 00000M0. = 000. 0.0 0.000 000.0 000.000 .000000 .0 '
0 0 ('', '0', '', '', '', '', '', '', '')
00000 0 ('', '0', '', '', '', '', '', '', '')
0. 0 ('', '0', '', '', '', '', '', '', '')
000. 0 ('', '0', '', '', '', '', '', '', '')
0.0 0 ('', '', '0', '', '', '', '', '', '')
0.000 0 ('', '', '0', '', '', '', '', '', '')
000.0 0 ('', '', '0', '', '', '', '', '', '')
000.000 0 ('', '', '0', '', '', '', '', '', '')
.000000 0 ('', '', '0', '', '', '', '', '', '')
.0 0 ('', '', '0', '', '', '', '', '', '')
string: ' .0000023456 .0000023456000 .0005872 .0005872000 .00503 .00503000 '
.0000023456 0.0000023456 ('', '', '', '.0000023456', '', '', '', '', '')
.0000023456000 0.0000023456 ('', '', '', '.0000023456', '', '', '', '', '')
.0005872 0.0005872 ('', '', '', '.0005872', '', '', '', '', '')
.0005872000 0.0005872 ('', '', '', '.0005872', '', '', '', '', '')
.00503 0.00503 ('', '', '', '.00503', '', '', '', '', '')
.00503000 0.00503 ('', '', '', '.00503', '', '', '', '', '')
string: ' .068 .0680000 .8 .8000 .123456123456 .123456123456000 '
.068 0.068 ('', '', '', '.068', '', '', '', '', '')
.0680000 0.068 ('', '', '', '.068', '', '', '', '', '')
.8 0.8 ('', '', '', '.8', '', '', '', '', '')
.8000 0.8 ('', '', '', '.8', '', '', '', '', '')
.123456123456 0.123456123456 ('', '', '', '.123456123456', '', '', '', '', '')
.123456123456000 0.123456123456 ('', '', '', '.123456123456', '', '', '', '', '')
string: ' .657 .657000 .45 .4500000 .7 .70000 0.0000023230000 000.0000023230000 '
.657 0.657 ('', '', '', '.657', '', '', '', '', '')
.657000 0.657 ('', '', '', '.657', '', '', '', '', '')
.45 0.45 ('', '', '', '.45', '', '', '', '', '')
.4500000 0.45 ('', '', '', '.45', '', '', '', '', '')
.7 0.7 ('', '', '', '.7', '', '', '', '', '')
.70000 0.7 ('', '', '', '.7', '', '', '', '', '')
0.0000023230000 0.000002323 ('', '', '', '.000002323', '', '', '', '', '')
000.0000023230000 0.000002323 ('', '', '', '.000002323', '', '', '', '', '')
string: ' 0.0081000 0000.0081000 0.059000 0000.059000 '
0.0081000 0.0081 ('', '', '', '.0081', '', '', '', '', '')
0000.0081000 0.0081 ('', '', '', '.0081', '', '', '', '', '')
0.059000 0.059 ('', '', '', '.059', '', '', '', '', '')
0000.059000 0.059 ('', '', '', '.059', '', '', '', '', '')
string: ' 0.78987400000 snow 00000.78987400000 0.4400000 00000.4400000 '
0.78987400000 0.789874 ('', '', '', '.789874', '', '', '', '', '')
00000.78987400000 0.789874 ('', '', '', '.789874', '', '', '', '', '')
0.4400000 0.44 ('', '', '', '.44', '', '', '', '', '')
00000.4400000 0.44 ('', '', '', '.44', '', '', '', '', '')
string: ' -0.5000 -0000.5000 0.90 000.90 0.7 000.7 '
-0.5000 -0.5 ('-', '', '', '.5', '', '', '', '', '')
-0000.5000 -0.5 ('-', '', '', '.5', '', '', '', '', '')
0.90 0.9 ('', '', '', '.9', '', '', '', '', '')
000.90 0.9 ('', '', '', '.9', '', '', '', '', '')
0.7 0.7 ('', '', '', '.7', '', '', '', '', '')
000.7 0.7 ('', '', '', '.7', '', '', '', '', '')
string: ' 2.6 00002.6 00002.60000 4.71 0004.71 0004.7100 '
2.6 2.6 ('', '', '', '', '2.6', '', '', '', '')
00002.6 2.6 ('', '', '', '', '2.6', '', '', '', '')
00002.60000 2.6 ('', '', '', '', '2.6', '', '', '', '')
4.71 4.71 ('', '', '', '', '4.71', '', '', '', '')
0004.71 4.71 ('', '', '', '', '4.71', '', '', '', '')
0004.7100 4.71 ('', '', '', '', '4.71', '', '', '', '')
string: ' 23.49 00023.49 00023.490000 103.45 0000103.45 0000103.45000 '
23.49 23.49 ('', '', '', '', '23.49', '', '', '', '')
00023.49 23.49 ('', '', '', '', '23.49', '', '', '', '')
00023.490000 23.49 ('', '', '', '', '23.49', '', '', '', '')
103.45 103.45 ('', '', '', '', '103.45', '', '', '', '')
0000103.45 103.45 ('', '', '', '', '103.45', '', '', '', '')
0000103.45000 103.45 ('', '', '', '', '103.45', '', '', '', '')
string: ' 10003.45067 000010003.45067 000010003.4506700 '
10003.45067 10003.45067 ('', '', '', '', '10003.45067', '', '', '', '')
000010003.45067 10003.45067 ('', '', '', '', '10003.45067', '', '', '', '')
000010003.4506700 10003.45067 ('', '', '', '', '10003.45067', '', '', '', '')
string: ' +15000.0012 +000015000.0012 +000015000.0012000 '
+15000.0012 +15000.0012 ('+', '', '', '', '15000.0012', '', '', '', '')
+000015000.0012 +15000.0012 ('+', '', '', '', '15000.0012', '', '', '', '')
+000015000.0012000 +15000.0012 ('+', '', '', '', '15000.0012', '', '', '', '')
string: ' 78000.89 000078000.89 000078000.89000 '
78000.89 78000.89 ('', '', '', '', '78000.89', '', '', '', '')
000078000.89 78000.89 ('', '', '', '', '78000.89', '', '', '', '')
000078000.89000 78000.89 ('', '', '', '', '78000.89', '', '', '', '')
string: ' .0457e10 .0457000e10 00000.0457000e10 '
.0457e10 0.0457e10 ('', '', '', '.0457', '', 'e', '10', '', '')
.0457000e10 0.0457e10 ('', '', '', '.0457', '', 'e', '10', '', '')
00000.0457000e10 0.0457e10 ('', '', '', '.0457', '', 'e', '10', '', '')
string: ' 258e8 2580000e4 0000000002580000e4 '
258e8 258e8 ('', '258', '', '', '', 'e', '8', '', '')
2580000e4 2580000e4 ('', '2580000', '', '', '', 'e', '4', '', '')
0000000002580000e4 2580000e4 ('', '2580000', '', '', '', 'e', '4', '', '')
string: ' 0.782e10 0000.782e10 0000.7820000e10 '
0.782e10 0.782e10 ('', '', '', '.782', '', 'e', '10', '', '')
0000.782e10 0.782e10 ('', '', '', '.782', '', 'e', '10', '', '')
0000.7820000e10 0.782e10 ('', '', '', '.782', '', 'e', '10', '', '')
string: ' 1.23E2 0001.23E2 0001.2300000E2 '
1.23E2 1.23E2 ('', '', '', '', '1.23', 'E', '2', '', '')
0001.23E2 1.23E2 ('', '', '', '', '1.23', 'E', '2', '', '')
0001.2300000E2 1.23E2 ('', '', '', '', '1.23', 'E', '2', '', '')
string: ' 432e-102 0000432e-102 004320000e-106 '
432e-102 432e-102 ('', '432', '', '', '', 'e-', '102', '', '')
0000432e-102 432e-102 ('', '432', '', '', '', 'e-', '102', '', '')
004320000e-106 4320000e-106 ('', '4320000', '', '', '', 'e-', '106', '', '')
string: ' 1.46e10and0001.46e10 0001.4600000e10 '
1.46e10 1.46e10 ('', '', '', '', '1.46', 'e', '10', '', '')
0001.46e10 1.46e10 ('', '', '', '', '1.46', 'e', '10', '', '')
0001.4600000e10 1.46e10 ('', '', '', '', '1.46', 'e', '10', '', '')
string: ' 1.077e-300 0001.077e-300 0001.077000e-300 '
1.077e-300 1.077e-300 ('', '', '', '', '1.077', 'e-', '300', '', '')
0001.077e-300 1.077e-300 ('', '', '', '', '1.077', 'e-', '300', '', '')
0001.077000e-300 1.077e-300 ('', '', '', '', '1.077', 'e-', '300', '', '')
string: ' 1.069e10 0001.069e10 0001.069000e10 '
1.069e10 1.069e10 ('', '', '', '', '1.069', 'e', '10', '', '')
0001.069e10 1.069e10 ('', '', '', '', '1.069', 'e', '10', '', '')
0001.069000e10 1.069e10 ('', '', '', '', '1.069', 'e', '10', '', '')
string: ' 105040.03e10 000105040.03e10 105040.0300e10 '
105040.03e10 105040.03e10 ('', '', '', '', '105040.03', 'e', '10', '', '')
000105040.03e10 105040.03e10 ('', '', '', '', '105040.03', 'e', '10', '', '')
105040.0300e10 105040.03e10 ('', '', '', '', '105040.03', 'e', '10', '', '')
string: ' +286E000024.487900 -78.4500e.14500 .0140E789. '
+286E000024.487900 +286E24.4879 ('+', '286', '', '', '', 'E', '', '', '24.4879')
-78.4500e.14500 -78.45e0.145 ('-', '', '', '', '78.45', 'e', '', '.145', '')
.0140E789. 0.014E789 ('', '', '', '.014', '', 'E', '789', '', '')
string: ' 081,12.40E07,95.0120 0045,78,123.03500e-0.00 '
081,12.40E07,95.0120 81,12.4E7,95.012 ('', '', '', '', '81,12.4', 'E', '', '', '7,95.012')
0045,78,123.03500 45,78,123.035 ('', '', '', '', '45,78,123.035', '', '', '', '')
string: ' 0096,78,473.0380e-0. 0008,78,373.066000E0. 0004512300.E0000 '
0096,78,473.0380 96,78,473.038 ('', '', '', '', '96,78,473.038', '', '', '', '')
0008,78,373.066000 8,78,373.066 ('', '', '', '', '8,78,373.066', '', '', '', '')
0004512300. 4512300 ('', '4512300', '', '', '', '', '', '', '')
string: ' ..18000 25..00 36...77 2..8 '
No match, No catched string, No groups.
string: ' 3.8..9 .12500. 12.51.400 '
No match, No catched string, No groups.
string: ' 00099,111.8713000 -0012,45,83,987.26+0.000,099,88,44.or00,00,00.00must'
00099,111.8713000 99,111.8713 ('', '', '', '', '99,111.8713', '', '', '', '')
-0012,45,83,987.26 -12,45,83,987.26 ('-', '', '', '', '12,45,83,987.26', '', '', '', '')
00,00,00.00 0 ('', '', '0', '', '', '', '', '', '')
string: ' 00099,44,and 0000,099,88,44.bom'
00099,44, 99,44 ('', '99,44', '', '', '', '', '', '', '')
0000,099,88,44. 99,88,44 ('', '99,88,44', '', '', '', '', '', '', '')
string: '00,000,00.587000 77,98,23,45., this,that '
00,000,00.587000 0.587 ('', '', '', '.587', '', '', '', '', '')
77,98,23,45. 77,98,23,45 ('', '77,98,23,45', '', '', '', '', '', '', '')
string: ' ,111 145.20 +9,9,9 0012800 .,,. 1 100,000 '
,111 111 ('', '111', '', '', '', '', '', '', '')
145.20 145.2 ('', '', '', '', '145.2', '', '', '', '')
+9,9,9 +9,9,9 ('+', '9,9,9', '', '', '', '', '', '', '')
0012800 12800 ('', '12800', '', '', '', '', '', '', '')
1 1 ('', '1', '', '', '', '', '', '', '')
100,000 100,000 ('', '100,000', '', '', '', '', '', '', '')
string: '1,1,1.111 000,001.111 -999. 0. 111.110000 1.1.1.111 9.909,888'
1,1,1.111 1,1,1.111 ('', '', '', '', '1,1,1.111', '', '', '', '')
000,001.111 1.111 ('', '', '', '', '1.111', '', '', '', '')
-999. -999 ('-', '999', '', '', '', '', '', '', '')
0. 0 ('', '0', '', '', '', '', '', '', '')
111.110000 111.11 ('', '', '', '', '111.11', '', '', '', '')