Home
Peter edited this page Jan 14, 2018 · 4 revisions
Welcome to the unitex-pt-br wiki!
Big files are difficult to maintain, both under version control and when edited in a spreadsheet.
- To split DELAS: chunks ranging from ~1000 to ~3000 lines. Example: `grep -E "^a[a-f]" DELAS.csv | wc -l` (2605), `^a[g-m]` (2185), `^a[n-q]` (2023), `^a[r-z]` (2743), `^b` (3132), `^c[a-g]` (2927), `^c[h-n]` (1242), ...
- To split DELACF: chunks of ~2000 lines. Example: `grep -E "^[a-m]" DELACF.csv | wc -l` (2332), `^[n-z]` (1745).
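
The prefix-based split above could be scripted roughly as follows. This is only a sketch: the function name `split_by_prefix` and the `.partNN` output naming are hypothetical, not something this wiki defines, and it assumes each pattern is an extended regex anchored at the start of the line, as in the `grep -E` examples.

```shell
#!/bin/sh
# split_by_prefix FILE PATTERN...
# For each extended-regex PATTERN, write the matching lines of FILE to
# FILE.partNN (hypothetical naming scheme) and print pattern, output file
# and line count, separated by tabs.
split_by_prefix() {
    file=$1
    shift
    i=0
    for pattern in "$@"; do
        i=$((i + 1))
        out=$(printf '%s.part%02d' "$file" "$i")
        # grep exits with status 1 when nothing matches; "|| true" keeps the loop going.
        grep -E "$pattern" "$file" > "$out" || true
        # tr strips the padding that some wc implementations add.
        printf '%s\t%s\t%s\n' "$pattern" "$out" "$(wc -l < "$out" | tr -d ' ')"
    done
}

# Example call (patterns taken from the list above):
# split_by_prefix DELAS.csv '^a[a-f]' '^a[g-m]' '^a[n-q]' '^a[r-z]'
```

Lines that match none of the given patterns are not written anywhere, so the printed counts should be summed and compared with the line count of the original file before relying on the chunks.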
To test and demonstrate the conversion algorithms, use some basic samples. The most frequent ones need to be checked, and a few random ones are selected as well:
Most and least frequent graphs: `select graph, count(*) as n from dataset.vw2_delas group by 1 order by 2 desc`
graph | n |
---|---|
A201 | 9984 |
N001 | 9402 |
V005 | 9381 |
N101 | 9053 |
A301 | 3856 |
... | ... |
ADV | 2628 |
N301 | 1730 |
N004+Pr | 1723 |
A218 | 1702 |
... | |
A201D081 | 739 |
... | |
A001D024 | 1 |
A001D026A01 | 1 |
... | |
A011 | 1 |
A038 | 1 |
A039 | 1 |
... | ... |
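
A similar tally can be approximated directly on a CSV file, without the database. This is a sketch under assumptions rather than the query used for the table: the function name `count_by_column` is hypothetical, the position of the graph code column has to be supplied by the caller, and the file is assumed to be plain comma-separated text with no quoted commas and no header row.

```shell
#!/bin/sh
# count_by_column FILE COL
# Count how often each value occurs in comma-separated column COL of FILE
# and print "value<TAB>count", most frequent first (ties sorted by value).
count_by_column() {
    file=$1
    col=$2
    tab=$(printf '\t')
    awk -F',' -v c="$col" '
        { n[$c]++ }                                  # tally each value of column c
        END { for (k in n) print k "\t" n[k] }
    ' "$file" | sort -t "$tab" -k2,2nr -k1,1
}

# Example call, assuming (hypothetically) that the graph code is column 3:
# count_by_column DELAS.csv 3 | head
```

This mirrors the `group by 1 order by 2 desc` shape of the SQL query; piping the result through `tail` instead of `head` shows the least frequent graphs.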