fix: add parse tests for every supported extensions (#198)
* fix: add parse tests for every supported extensions
* add: each parser has supported FileExtensions
* fix: ValueError for unsupported extensions
* fix: python version required
* fix: python version
* fix: python version
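The commit message describes parsers that declare their supported extensions and raise `ValueError` for anything else. The sketch below illustrates that idea only; the `FileExtension` enum members, the `TextParser` class, and the `check_supported_extension` method are hypothetical names chosen for illustration and are not taken from the repository.

```python
from enum import Enum
from pathlib import Path


class FileExtension(str, Enum):
    """Hypothetical subset of extensions; the real list lives in the repository."""

    TXT = ".txt"
    MD = ".md"
    CSV = ".csv"
    XML = ".xml"


class TextParser:
    """Hypothetical parser that only accepts plain-text style files."""

    # Each parser declares the extensions it accepts.
    supported_extensions = [FileExtension.TXT, FileExtension.MD]

    def check_supported_extension(self, file_path: str) -> FileExtension:
        suffix = Path(file_path).suffix.lower()
        try:
            # Enum lookup by value fails for suffixes that are not listed at all.
            extension = FileExtension(suffix)
        except ValueError:
            raise ValueError(f"Unknown file extension: {suffix!r}") from None
        if extension not in self.supported_extensions:
            raise ValueError(f"{type(self).__name__} does not support {suffix!r}")
        return extension
```

A test per extension can then assert that each supported suffix is accepted and that every other suffix raises `ValueError`.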
Showing 31 changed files with 175 additions and 199 deletions.
This file was deleted.
This file was deleted.
Empty file.
This file was deleted.
This file was deleted.
File renamed without changes.
This file was deleted.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -0,0 +1,4 @@
Name,Description
MegaParse,"MegaParse is the best parser, even with accents like é, è, and ñ."
OtherParse,"OtherParse is a decent parser, but it struggles with accents."
RandomParse,"RandomParse is another parser, but it often fails with special characters."
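This fixture exercises two things a CSV parser has to get right: commas inside quoted fields and non-ASCII characters. A minimal sketch of reading such content with Python's standard `csv` module follows; the `parse_csv` helper and the inlined `SAMPLE` string are assumptions for illustration and do not reflect how the project's own parser is implemented.

```python
import csv
import io

# Two rows copied from the fixture above, inlined so the sketch is self-contained.
SAMPLE = (
    "Name,Description\n"
    'MegaParse,"MegaParse is the best parser, even with accents like é, è, and ñ."\n'
    'OtherParse,"OtherParse is a decent parser, but it struggles with accents."\n'
)


def parse_csv(raw: bytes, encoding: str = "utf-8") -> list[dict[str, str]]:
    """Decode the bytes explicitly, then let csv handle quoting and commas."""
    text = raw.decode(encoding)
    # newline="" leaves line endings untouched so the csv module can interpret them.
    return list(csv.DictReader(io.StringIO(text, newline="")))


rows = parse_csv(SAMPLE.encode("utf-8"))
for row in rows:
    print(row["Name"], "->", row["Description"])
```

Decoding explicitly as UTF-8, instead of relying on the platform default, is what keeps the accented characters intact across operating systems.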
File renamed without changes.
@@ -0,0 +1,21 @@
# The Difficulty of Parsing Files

Parsing files can be a challenging task due to several factors:

## 1. File Format Variability
Different file formats (e.g., JSON, XML, CSV) require different parsing techniques. Each format has its own structure and rules, making it necessary to handle each one uniquely.

## 2. Inconsistent Data
Files often contain inconsistent or malformed data. Handling these inconsistencies requires robust error-checking and validation mechanisms.

## 3. Large File Sizes
Parsing large files can be resource-intensive and time-consuming. Efficient algorithms and memory management techniques are essential to handle large datasets.

## 4. Encoding Issues
Files may use different character encodings (e.g., UTF-8, ASCII). Properly detecting and handling these encodings is crucial to avoid data corruption.

## 5. Nested Structures
Some file formats, like JSON and XML, can have deeply nested structures. Parsing these nested structures requires recursive algorithms and careful handling of hierarchical data.

## Conclusion
Despite these challenges, effective file parsing is essential for data processing and analysis. By understanding and addressing these difficulties, developers can create robust parsers that handle a wide variety of file formats and data inconsistencies.
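The fixture's point about encoding issues can be illustrated with a small fallback decoder. This is a sketch under stated assumptions: the `decode_with_fallback` helper and its candidate list are hypothetical, and trying encodings in order is a heuristic, not true encoding detection.

```python
def decode_with_fallback(raw: bytes, encodings=("utf-8", "cp1252")) -> str:
    """Try each candidate encoding in order and return the first clean decode."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: decode anyway, marking undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace")


print(decode_with_fallback("é, è, and ñ".encode("utf-8")))
print(decode_with_fallback("é, è, and ñ".encode("cp1252")))
```

UTF-8 is tried first because it rejects most byte sequences produced by other encodings, whereas single-byte encodings accept almost any input and would mask the mismatch.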
@@ -0,0 +1,21 @@
# The Difficulty of Parsing Files

Parsing files can be a challenging task due to several factors:

## 1. File Format Variability
Different file formats (e.g., JSON, XML, CSV) require different parsing techniques. Each format has its own structure and rules, making it necessary to handle each one uniquely.

## 2. Inconsistent Data
Files often contain inconsistent or malformed data. Handling these inconsistencies requires robust error-checking and validation mechanisms.

## 3. Large File Sizes
Parsing large files can be resource-intensive and time-consuming. Efficient algorithms and memory management techniques are essential to handle large datasets.

## 4. Encoding Issues
Files may use different character encodings (e.g., UTF-8, ASCII). Properly detecting and handling these encodings is crucial to avoid data corruption.

## 5. Nested Structures
Some file formats, like JSON and XML, can have deeply nested structures. Parsing these nested structures requires recursive algorithms and careful handling of hierarchical data.

## Conclusion
Despite these challenges, effective file parsing is essential for data processing and analysis. By understanding and addressing these difficulties, developers can create robust parsers that handle a wide variety of file formats and data inconsistencies.
Binary file not shown.
File renamed without changes.
@@ -0,0 +1,13 @@
Lorem ipsum

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc ac faucibus odio.

Vestibulum neque massa, scelerisque sit amet ligula eu, congue molestie mi. Praesent ut varius sem. Nullam at porttitor arcu, nec lacinia nisi. Ut ac dolor vitae odio interdum condimentum. Vivamus dapibus sodales ex, vitae malesuada ipsum cursus convallis. Maecenas sed egestas nulla, ac condimentum orci. Mauris diam felis, vulputate ac suscipit et, iaculis non est. Curabitur semper arcu ac ligula semper, nec luctus nisl blandit. Integer lacinia ante ac libero lobortis imperdiet. Nullam mollis convallis ipsum, ac accumsan nunc vehicula vitae. Nulla eget justo in felis tristique fringilla. Morbi sit amet tortor quis risus auctor condimentum. Morbi in ullamcorper elit. Nulla iaculis tellus sit amet mauris tempus fringilla.
Maecenas mauris lectus, lobortis et purus mattis, blandit dictum tellus.
Maecenas non lorem quis tellus placerat varius.
Nulla facilisi.
Aenean congue fringilla justo ut aliquam.
Mauris id ex erat. Nunc vulputate neque vitae justo facilisis, non condimentum ante sagittis.
Morbi viverra semper lorem nec molestie.
Maecenas tincidunt est efficitur ligula euismod, sit amet ornare est vulputate.
https://github.com/QuivrHQ/MegaParse
@@ -0,0 +1,23 @@
<?xml version="1.0"?>
<customers>
<customer id="55000">
<name>Charter Group</name>
<address>
<street>100 Main</street>
<city>Framingham</city>
<state>MA</state>
<zip>01701</zip>
</address>
<address>
<street>720 Prospect</street>
<city>Framingham</city>
<state>MA</state>
<zip>01701</zip>
</address>
<address>
<street>120 Ridge</street>
<state>MA</state>
<zip>01760</zip>
</address>
</customer>
</customers>
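This fixture has repeated `<address>` elements, one of which has no `<city>`, so a parser must handle both nesting and missing fields. The sketch below walks a shortened copy of the document recursively with the standard library's `xml.etree.ElementTree`; the `element_to_dict` helper and the `@attributes` key are hypothetical choices for illustration, not the project's parser.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the fixture above, inlined so the sketch is self-contained.
SAMPLE = """<?xml version="1.0"?>
<customers>
<customer id="55000">
<name>Charter Group</name>
<address>
<street>100 Main</street>
<city>Framingham</city>
<state>MA</state>
<zip>01701</zip>
</address>
<address>
<street>120 Ridge</street>
<state>MA</state>
<zip>01760</zip>
</address>
</customer>
</customers>"""


def element_to_dict(element):
    """Recursively convert an element into nested dicts, lists, and strings."""
    children = list(element)
    if not children:
        # Leaf node: keep the text as a string so leading zeros in zips survive.
        return (element.text or "").strip()
    # Attributes go under their own key so they cannot collide with child tags.
    node = {"@attributes": dict(element.attrib)} if element.attrib else {}
    for child in children:
        # Repeated tags (several <address> elements) accumulate in a list.
        node.setdefault(child.tag, []).append(element_to_dict(child))
    return node


root = ET.fromstring(SAMPLE)
tree = element_to_dict(root)
for address in tree["customer"][0]["address"]:
    # A missing <city> shows up as an absent key, not as an error.
    print(address["street"][0], address.get("city", ["<no city>"])[0])
```

Storing every child under a list keeps the shape uniform whether a tag appears once or several times, at the cost of an extra index when reading single values.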
File renamed without changes.
Binary file not shown.