When evaluating file specifications to create file collections, we should follow this flow:
- If a file has a known extension, mark it as text or binary based on the dictionary (implemented)
- Include files that have no extension, as well as files whose extension is not covered by the dictionary
- For those files, guess whether the file is (canonically) text; if not, mark it as binary
- Document this flow in the specification section
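The flow above could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the project's implementation: the names `EXTENSION_TYPES`, `classify_file` and `looks_like_text`, the dictionary contents, the 8 KiB sample size, and the "no NUL bytes and valid UTF-8" text heuristic are all hypothetical.

```python
import codecs
from pathlib import Path

# Hypothetical extension dictionary; the real one is defined elsewhere in the project.
EXTENSION_TYPES = {
    ".txt": "text",
    ".md": "text",
    ".py": "text",
    ".png": "binary",
    ".zip": "binary",
}

# Assumed number of leading bytes to inspect when guessing.
SNIFF_BYTES = 8192


def looks_like_text(sample: bytes) -> bool:
    """Assumed heuristic: text if the sample has no NUL bytes and is valid UTF-8."""
    if b"\x00" in sample:
        return False
    # final=False tolerates a multibyte character cut off at the end of the sample.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True


def classify_file(path: Path) -> str:
    """Return "text" or "binary" for a file, following the flow described above."""
    ext = path.suffix.lower()
    # Known extension: trust the dictionary without opening the file.
    if ext in EXTENSION_TYPES:
        return EXTENSION_TYPES[ext]
    # No extension, or an extension not in the dictionary: sniff the contents.
    with path.open("rb") as f:
        sample = f.read(SNIFF_BYTES)
    return "text" if looks_like_text(sample) else "binary"
```

Checking the dictionary first keeps the common case cheap, since no file I/O happens for known extensions; only unknown or extensionless files (including dotfiles such as `.bashrc`, whose `Path.suffix` is empty) are read and guessed.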