Python re to separate some data values
Bruce Labitt
bruce.labitt at myfairpoint.net
Wed Apr 28 17:57:15 EDT 2021
If someone could suggest how to do this, I'd appreciate it. I've
scraped a table of fine thread metric screw parameters from a website.
I'm having some trouble with regex (re) separating the numbers. I have
everything working save for this last bit.
Here is a sample string:
r1[1] = ' 17.98017.87417.65517.59917.43917.291'
I'm trying to separate the numbers. It should read like this:
17.980, 17.874, 17.655, 17.599, 17.439, 17.291
There are more than 200 lines of this, so it would be great to automate
it! Each number has three digits after the decimal point, so I want to
add a comma and a space after the third decimal digit.
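One way to express "a comma and a space after the third decimal digit" is a
substitution that captures each decimal part and writes it back with a
separator. The following is a minimal sketch, not a definitive answer: the
name `sample` and the lookahead that avoids a trailing comma are assumptions
added for illustration.

```python
import re

# Sample string copied from the post (assigned to a hypothetical name).
sample = ' 17.98017.87417.65517.59917.43917.291'

# Capture a dot plus exactly three digits, but only when another digit
# follows, so the last number does not get a trailing separator.
pattern = r'(\.\d{3})(?=\d)'

# In the replacement, \1 refers back to the captured text; regex syntax
# such as \d is not valid in a replacement string.
separated = re.sub(pattern, r'\1, ', sample).strip()

print(separated)
```

For the sample string this should print
17.980, 17.874, 17.655, 17.599, 17.439, 17.291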
re.search('(\.)\d{3,3}', r1[1]) returns
<re.Match object; span=(3, 7), match='.980'> so it found the first instance.
But re.sub('(\.)\d{3,3}', '(\.)\d{3,3}, ', r1[1]) yields a KeyError:
'\\d' (Python 3.8), and I get "bad escape \d at position 4".
And if one adds enough escapes to avoid the KeyError, the function
actually does nothing, since Out[117] is the same as r1[1]:
In [117]: re.sub('(\.)\\\d{3,3}', '(\.)\\\d{3,3}, ', r1[1])
Out[117]: ' 17.98017.87417.65517.59917.43917.291'
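A possible explanation for the unchanged output, offered as a sketch rather
than a diagnosis: with the extra backslashes, the pattern no longer asks for
three digits but for a literal backslash followed by the letter d three
times, which never occurs in the data. The names `sample` and `escaped`
below are assumptions for illustration.

```python
import re

sample = ' 17.98017.87417.65517.59917.43917.291'

# Raw-string spelling of the pattern Python builds from '(\.)\\\d{3,3}':
# a literal dot, a literal backslash, then the letter d three times.
escaped = r'(\.)\\d{3,3}'

# Nothing in the data looks like '.\ddd', so there is no match and
# re.sub would return its input unchanged.
no_match = re.search(escaped, sample)

# The same pattern does match a string that really contains '.\ddd'.
literal_match = re.search(escaped, r'x.\dddy')

print(no_match)
print(literal_match.group(0))
```

Writing patterns as raw strings (r'...') keeps the backslashes that reach
the re module the same as the ones typed.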
I've looked at https://www.w3schools.com/python/python_regex.asp,
https://docs.python.org/3/library/re.html,
https://docs.python.org/3.8/howto/regex.html,
https://www.guru99.com/python-regular-expressions-complete-tutorial.html#2,
https://www.makeuseof.com/regular-expressions-python/,
https://www.dataquest.io/blog/regular-expressions-data-scientists/, and
https://realpython.com/regex-python/.
Is there a way to do this with re? re.finditer seems to work OK; it
finds all the indices correctly.
In [121]: it = re.finditer('(\.)\d{3,3}', r1[1])
In [122]: next(it)
Out[122]: <re.Match object; span=(3, 7), match='.980'>
In [123]: next(it)
Out[123]: <re.Match object; span=(9, 13), match='.874'>
In [124]: next(it)
Out[124]: <re.Match object; span=(15, 19), match='.655'>
In [125]: next(it)
Out[125]: <re.Match object; span=(21, 25), match='.599'>
In [126]: next(it)
Out[126]: <re.Match object; span=(27, 31), match='.439'>
In [127]: next(it)
Out[127]: <re.Match object; span=(33, 37), match='.291'>
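Since the matches above are found correctly, another route is to collect
whole numbers rather than insert separators. A minimal sketch, assuming
every value has the form digits, dot, three digits (the name `sample` and
the pattern are illustrative assumptions, not taken from the scraped data):

```python
import re

sample = ' 17.98017.87417.65517.59917.43917.291'

# One or more integer digits, a dot, then exactly three decimal digits.
numbers = re.findall(r'\d+\.\d{3}', sample)

# Join for display, or convert to floats for further processing.
joined = ', '.join(numbers)
values = [float(n) for n in numbers]

print(joined)
print(values)
```

Because the decimal part is matched as exactly three digits, each match
ends where the next number begins, so no separator is needed in the input.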
I suppose I could brute force it at this point, but one would think
re.sub should work if the magic flooby dust were appropriately
sprinkled about. I'm clearly missing something important. Anyone got a
hint?