Python re to separate some data values

Joshua Judson Rosen rozzin at hackerposse.com
Wed Apr 28 19:35:19 EDT 2021


On 4/28/21 7:01 PM, Bruce Labitt wrote:
> On 4/28/21 6:28 PM, Joshua Judson Rosen wrote:
>>> re.search('(\.)\d{3,3}', r1[1]) returns
>>> <re.Match object; span=(3, 7), match='.980'> so it found the first instance.
>>>
>>> But, re.sub('(\.)\d{3,3}', '(\.)\d{3,3}, ', r1[1]) yields a KeyError:
>>> '\\d' (Python3.8).  Get bad escape \d at position 4.
>> The second argument [the replacement string] to re.sub(pattern, repl, string) is not supposed to
>> just be a variation of the pattern-matching string that you passed as the first argument.
>>
>> I think the best illustration that I can give here is to just fix this up for you:
>>
>> 	re.sub(r'(\.)(\d{3,3})', r'\1\2, ', r1[1])
>>
> Thanks for the embarrassingly concise answer.  It is greatly 
> appreciated.  Can you explain the syntax of the 2nd argument?  I haven't 
> seen that before.  Where can I find further examples?
> 
> What astounds me is re.search allowed my 1st argument, but re.sub barfed 
> all over the same 1st argument.

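Actually re.sub also accepted your first argument just fine.
It was your _second_ argument that it barfed all over.
In a replacement string, "\" followed by a letter is only allowed
for a few known escapes like "\n" (plus "\g<...>", a group reference).
"\d" isn't one of them: your match didn't produce a "matched character group #d",
it only produced a "matched character group #1"
(IIRC Python's RE documentation generally just calls them "groups").
The KeyError in your traceback is just re's internal escape lookup
failing right before it raises the real error, "bad escape \d".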
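To see that the pattern itself is fine and only the replacement template trips the error, you can run the same pattern through both functions. The string `sample` below is made up, because the real `r1[1]` wasn't posted. It was chosen so that `re.search` reports the same span as in your output. This is a sketch; on CPython 3.7 and later it should give the same "bad escape" message you saw.

```python
import re

# Hypothetical stand-in for r1[1]; the real data wasn't posted.
sample = "120.980121.450"
pattern = r'(\.)\d{3,3}'

# The pattern compiles and matches fine on its own.
m = re.search(pattern, sample)
print(m)  # <re.Match object; span=(3, 7), match='.980'>

# Reusing the pattern as the replacement fails when re.sub parses the
# template: "\d" is neither a known escape (like \n) nor a group
# reference (like \1), so re.error is raised.
error_message = None
try:
    re.sub(pattern, r'(\.)\d{3,3}, ', sample)
except re.error as exc:
    error_message = str(exc)
print(error_message)  # bad escape \d at position 4
```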

Note that I added a second set of parentheses to your _pattern_
so that you now also have a group #2.
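Here is a small sketch of what the extra parentheses give you, again using a made-up `sample` string. The compiled pattern reports two groups, and each match exposes them separately, with group 0 being the whole match.

```python
import re

sample = "120.980121.450"  # hypothetical stand-in for r1[1]

# Parentheses pair #1 captures the ".", pair #2 captures the three digits.
pattern = re.compile(r'(\.)(\d{3,3})')
print(pattern.groups)  # 2

for m in pattern.finditer(sample):
    # group(0) is the whole match; group(1) and group(2) are its pieces.
    print(m.group(0), m.group(1), m.group(2))
# .980 . 980
# .450 . 450
```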

I was trying to make the smallest change possible to your pattern,
but this would also work fine:

	re.sub(r'(\.\d{3,3})', r'\1, ', r1[1])
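Here is a quick check, using the same kind of made-up `sample` string, that the two-group and one-group versions produce identical output:

```python
import re

sample = "120.980121.450"  # hypothetical stand-in for r1[1]

# Two groups, both put back with \1\2.
two_groups = re.sub(r'(\.)(\d{3,3})', r'\1\2, ', sample)

# One group around the whole thing, put back with \1.
one_group = re.sub(r'(\.\d{3,3})', r'\1, ', sample)

print(repr(one_group))          # '120.980, 121.450, '
print(two_groups == one_group)  # True
```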


The "\1" (and "\2", in the previous example) are "backreferences",
and are actually explained in an OK-ish way in the online Python library manual's
section for re:

	https://docs.python.org/3/library/re.html

(there are also a few other backreference syntaxes that you can use in Python,
 like "\g<name>" and "\g<2>", so that you can give non-numeric names to groups
 or just avoid ambiguities like whether "\20" means `group #2 and then a
 literal "0"' or `group #20'; Python reads it as the latter, and "\g<2>0"
 spells out the former).
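Here is a sketch of two of those alternatives, using the same made-up `sample` string. The first is a named group, written `(?P<name>...)` in the pattern and `\g<name>` in the replacement. The group name `value` is an arbitrary, hypothetical choice. The second is `\g<2>`, which ends the group number explicitly so that a literal digit can follow it.

```python
import re

sample = "120.980121.450"  # hypothetical stand-in for r1[1]

# Named group: (?P<value>...) in the pattern, \g<value> in the replacement.
# ("value" is an arbitrary, hypothetical name.)
named = re.sub(r'(?P<value>\.\d{3,3})', r'\g<value>, ', sample)
print(repr(named))   # '120.980, 121.450, '

# \g<2>0 means "group 2, then a literal 0".
padded = re.sub(r'(\.)(\d{3,3})', r'\1\g<2>0, ', sample)
print(repr(padded))  # '120.9800, 121.4500, '

# Plain \20 is read as "group 20", which this two-group pattern lacks,
# so re.sub raises re.error instead of substituting.
error_message = None
try:
    re.sub(r'(\.)(\d{3,3})', r'\1\20, ', sample)
except re.error as exc:
    error_message = str(exc)
print(error_message)
```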

-- 
Connect with me on the GNU social network! <https://status.hackerposse.com/rozzin>
Not on the network? Ask me for more info!


More information about the gnhlug-discuss mailing list