Conversation
```python
else:
    continue


def _convert_schema_dtypes(self):
```
Wondering if this is necessary? Without this function, when supplying `"int"`, `"str"`, etc., the types still get coerced to PySpark types, which I assume pandera is doing for us.
I found that the schema was being formatted correctly, but without converting the strings into PySpark dtypes it would not actually trigger the type checks or the other checks. I'll take a closer look in the future and will add a backlog ticket to review.
```python
"float": T.FloatType(),
"string": T.StringType(),
"str": T.StringType(),
"bool": T.BooleanType(),
```
Anything for timestamps?
Nice spot, will add now.
ChrisSoderberg-ONS left a comment:
Just a couple of comments on the PySparkValidator class.
Thanks Chris, I've updated and added support for date, datetime and timestamp types. I'll expand these and check that they work when further examples and tests are developed (I would do it here, but this branch is already getting quite big!).
@Jday7879 approved!
Proposed Changes
Related Issues
Pre-requisites
This section may not be fully required if the branch is not merging into main. Please indicate any items that aren't necessary and why, with comments on incomplete checks.