Pre-cooked lexer definitions

I would like to write a library for processing a simple language. The parser-level syntax I have in mind will be simple enough that packages like lark are definitely overkill, but that leaves the lexical side. I really don’t want to reinvent the wheel specifying regexps for floating-point numbers, for instance. I had the clever idea of piggybacking on Python’s own lexical syntax via the tokenize standard library module - but unfortunately that won’t work, because tokenize enforces indentation rules, and my language should not treat whitespace as significant. A shame, because I’d be perfectly happy with all my tokens exactly matching Python’s.
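For illustration, here is roughly what made the tokenize idea attractive - on a single flat expression (where indentation never comes into play) it hands back fully-formed number tokens, exponents and all:

```python
import io
import tokenize

src = "x = 3.14e-2 + 0.5"

# Pull out just the NUMBER tokens; tokenize has already matched the
# full float syntax (sign handling aside), including the exponent.
numbers = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(src).readline)
    if tok.type == tokenize.NUMBER
]
print(numbers)  # ['3.14e-2', '0.5']
```

The trouble described above only appears on multi-line input, where tokenize emits INDENT/DEDENT tokens and rejects inconsistent indentation.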

Among the many lexer-generator packages on PyPI, are there any that ship pre-made definitions for the more complex tokens, like floats?

Hmm, I would go look at the standard library JSON decoder. It has the most obvious primitives settled, while staying simple and fast. For floats specifically, it uses a regular expression that you could reference directly.
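As a sketch - the number pattern lives in json.scanner as NUMBER_RE. That is an internal, undocumented attribute, so it could move or change between Python versions; if you rely on it, consider copying the pattern text into your own code instead. Note also that it matches JSON numbers, which are slightly stricter than Python floats:

```python
import json.scanner  # NUMBER_RE is internal/undocumented - copy the pattern if you need stability

NUMBER_RE = json.scanner.NUMBER_RE  # matches optional sign, integer part, optional fraction and exponent

m = NUMBER_RE.match("-12.5e-3")
print(m.group(0))  # -12.5e-3

# JSON requires a leading digit, unlike Python's .5 literal:
print(NUMBER_RE.match(".5"))  # None
```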

Whether that’s good enough for your purposes, I can’t say, but it’s a start.

Thanks, I didn’t know the internals of json were available like that. Yes, that is a good start.