Python re to separate some data values

Bruce Labitt bruce.labitt at myfairpoint.net
Wed Apr 28 17:57:15 EDT 2021


If someone could suggest how to do this, I'd appreciate it.  I've 
scraped a table of fine thread metric screw parameters from a website.  
I'm having some trouble using regex (re) to separate the numbers.  I have 
everything working save for this last bit.

Here is a sample string:

r1[1] = ' 17.98017.87417.65517.59917.43917.291'

I'm trying to separate the numbers.  It should read like this:

17.980, 17.874, 17.655, 17.599, 17.439, 17.291

There are more than 200 lines of this, so it would be great to automate 
it!  Each number has 3 digits after the decimal point, so I want to add a 
comma and a space after the third digit.

re.search('(\.)\d{3,3}', r1[1]) returns
<re.Match object; span=(3, 7), match='.980'> so it found the first instance.

But re.sub('(\.)\d{3,3}', '(\.)\d{3,3}, ', r1[1]) raises a KeyError: 
'\\d' (Python 3.8), and the error reported is: bad escape \d at position 4.

And if one adds enough escapes to avoid the KeyError, the function 
actually does nothing, since Out[117] is the same as r1[1]:

In [117]: re.sub('(\.)\\\d{3,3}', '(\.)\\\d{3,3}, ', r1[1])
Out[117]: ' 17.98017.87417.65517.59917.43917.291'
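A minimal sketch of how re.sub could be made to work here, assuming every 
value really does have exactly three digits after the decimal point.  The 
second argument of re.sub is a replacement template, not a pattern, so \d 
is not a valid escape there (hence "bad escape").  In the doubly escaped 
version, the non-raw pattern ends up asking for a literal backslash followed 
by three 'd' characters, which never occurs in the data, so nothing is 
replaced.  The names `line`, `pattern` and `result` below are only 
illustrative.

```python
import re

# Sample string from the post (illustrative variable name).
line = ' 17.98017.87417.65517.59917.43917.291'

# Raw string so the backslashes reach the regex engine untouched.
# Match a decimal point plus exactly three digits, but only when
# another digit follows, so the last number gets no trailing comma.
pattern = r'\.\d{3}(?=\d)'

# \g<0> in the replacement template means "the whole matched text".
result = re.sub(pattern, r'\g<0>, ', line).strip()

print(result)  # 17.980, 17.874, 17.655, 17.599, 17.439, 17.291
```

The lookahead (?=\d) is a design choice to avoid a dangling ", " at the end 
of the line; dropping it and stripping the trailing separator afterwards 
would be an alternative.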

I've looked in https://www.w3schools.com/python/python_regex.asp, 
https://docs.python.org/3/library/re.html, 
https://docs.python.org/3.8/howto/regex.html, 
https://www.guru99.com/python-regular-expressions-complete-tutorial.html#2, 
https://www.makeuseof.com/regular-expressions-python/, 
https://www.dataquest.io/blog/regular-expressions-data-scientists/, and 
https://realpython.com/regex-python/.

Is there a way to do this with re?  re.finditer seems to work OK; it 
finds all the indices correctly.

In [121]: it = re.finditer('(\.)\d{3,3}', r1[1])
In [122]: next(it)
Out[122]: <re.Match object; span=(3, 7), match='.980'>
In [123]: next(it)
Out[123]: <re.Match object; span=(9, 13), match='.874'>
In [124]: next(it)
Out[124]: <re.Match object; span=(15, 19), match='.655'>
In [125]: next(it)
Out[125]: <re.Match object; span=(21, 25), match='.599'>
In [126]: next(it)
Out[126]: <re.Match object; span=(27, 31), match='.439'>
In [127]: next(it)
Out[127]: <re.Match object; span=(33, 37), match='.291'>
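Since the matches are already being found correctly, another possible route 
is to extract the whole numbers instead of inserting separators.  This is a 
sketch only, again assuming exactly three decimals per value; the names 
`values`, `joined` and `floats` are made up for illustration.

```python
import re

# Sample string from the post (illustrative variable name).
line = ' 17.98017.87417.65517.59917.43917.291'

# One or more digits, a decimal point, then exactly three digits.
# Because each value has a fixed number of decimals, the run-together
# numbers split cleanly at the right places.
values = re.findall(r'\d+\.\d{3}', line)

# Text form with the requested ", " separator.
joined = ', '.join(values)

# Numeric form, if the table is going to be used for calculations.
floats = [float(v) for v in values]

print(joined)  # 17.980, 17.874, 17.655, 17.599, 17.439, 17.291
```

Looping this over the 200-odd scraped lines would give one list of values 
per row of the table.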

I suppose I could brute-force it at this point, but one would think 
re.sub should work if the magic flooby dust were appropriately 
sprinkled about.  I'm clearly missing something important.  Anyone got a 
hint?






