Using Split 2.6 to divide large data files into 24hour data files


StephanieNUIM Feb 29, 2012 04:12 PM

Hello,

I would like to use the Split program to divide large data files (approx. 5GB or 6 weeks) into smaller, more easily analyzed files (i.e. 24 hour periods). My version of LoggerNet is relatively old (LoggerNet 3.4.1 and Split 2.6) but it is all I have to work with.

I am recording data at a frequency of 10Hz using a CR3000 and NL115. I want to split the large datasets into 24-hour data files for flux processing, but I am having no joy. A sample of my data, including the header, is below:

"TOA5","3513","CR3000","3513","CR3000.Std.09","CPU:NUI_School.CR3","47710","ts_data"
"TIMESTAMP","RECORD","Ux","Uy","Uz","Ts","co2","h2o","fw","press","diag_csat","t_hmp","e_hmp"
"TS","RN","m/s","m/s","m/s","C","mg/m^3","g/m^3","C","kPa","unitless","C","kPa"
"","","Smp","Smp","Smp","Smp","Smp","Smp","Smp","Smp","Smp","Smp","Smp"
"2010-11-27 05:05:33.8",376201720,1.19375,1.206,0.16125,0.7920532,572.8367,6.092148,"NAN",99.88811,0,-9.46353,0.2839645
"2010-11-27 05:05:33.9",376201721,1.20875,1.2035,0.08425,0.8036499,573.0999,6.094578,"NAN",99.89777,0,-9.358082,0.2865447
"2010-11-27 05:05:34",376201722,1.3395,1.15475,0.115,0.783844,573.213,6.09756,"NAN",99.88811,0,-9.438423,0.2844805
"2010-11-27 05:05:34.1",376201723,1.30925,1.11525,0.114,0.8135376,573.346,6.099716,"NAN",99.8621,0,-9.40662,0.2852328
"2010-11-27 05:05:34.2",376201724,1.234,1.1315,0.07375,0.816864,573.2751,6.095069,"NAN",99.88811,0,-9.389883,0.2856435
"2010-11-27 05:05:34.3",376201725,1.25225,1.16925,0.06925,0.7937317,573.4069,6.101789,"NAN",99.88811,0,-9.356407,0.286406
"2010-11-27 05:05:34.4",376201726,1.28675,1.133,0.09150001,0.7788696,573.3206,6.096774,"NAN",99.87175,0,-9.398252,0.2858124
"2010-11-27 05:05:34.5",376201727,1.148,1.14525,0.0335,0.8152161,573.2344,6.098159,"NAN",99.8621,0,-9.418337,0.284895
"2010-11-27 05:05:34.6",376201728,1.19525,1.06875,0.0775,0.8085938,573.1026,6.094096,"NAN",99.89777,0,-9.286108,0.2880324
"2010-11-27 05:05:34.7",376201729,1.34625,0.941,0.09200001,0.8036499,573.0602,6.101928,"NAN",99.89777,0,-9.239243,0.2892773

I have defined the data in a .PAR file and followed the instructions from the help menu to try to split the data according to a start stop condition. An example of my attempt is as follows:
Start condition: 1[2010-12-24 23:30:00.1]
Stop condition: 1[2010-12-25 23:30:00.1]

Also, Split treats the date stamp as bad data even though I assigned a column width of 30 to element 1. Any suggestions that I could attempt would be much appreciated.

Thanks,
Stephanie
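
One detail visible in the sample above is that the timestamp field is not a fixed width: whole-second records drop the fractional part ("2010-11-27 05:05:34" is 19 characters, its neighbours are 21). The following is a minimal Python sketch, outside Split, of reading both forms. The function name parse_toa5_timestamp is hypothetical, and nothing here explains why Split flags the field as bad data.

```python
from datetime import datetime


def parse_toa5_timestamp(text: str) -> datetime:
    """Parse a TOA5 timestamp with or without fractional seconds."""
    text = text.strip().strip('"')
    # Whole-second records omit the fraction entirely ("... 05:05:34"),
    # so choose the format based on whether a decimal point is present.
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in text else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(text, fmt)


# Three consecutive timestamps copied from the sample data above.
samples = [
    '"2010-11-27 05:05:33.9"',
    '"2010-11-27 05:05:34"',
    '"2010-11-27 05:05:34.1"',
]
parsed = [parse_toa5_timestamp(s) for s in samples]
for raw, ts in zip(samples, parsed):
    # Print the field width next to the parsed value.
    print(len(raw.strip('"')), ts.isoformat())
```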


m. rudnicki Mar 5, 2012 08:30 PM

I would really like to know how to do this as well.
Baler won't do it and CardConvert only converts from binary.

Mark


Danaw Mar 6, 2012 08:31 PM

Your start/stop conditions need to be in the format:

1:1:1:1
Year:day:hourmin:sec


With the following Start/Stop Condition and 15 minute data:

Start Condition (start processing at 0000 hours)

1:1:1[0000]:1

Stop Condition (stop processing at 0000 hours. Note, stop condition is not included in output)

1:1:1[0000]:1

you would get the following from a 15 minute data file:

"2010-09-23 00:00:00" 95 13.36
"2010-09-23 00:15:00" 96 13.36
"2010-09-23 00:30:00" 97 13.36

<SNIP>
"2010-09-23 23:00:00" 187 13.34
"2010-09-23 23:15:00" 188 13.34
"2010-09-23 23:30:00" 189 13.34
"2010-09-23 23:45:00" 190 13.34


You could use this with the Last Count option (found under Input File Tab, Offsets/Options button). Last Count means start processing where Split last left off. So each time Split is run, it begins with the next 0000.

Lastly, use the output file option of Create New file (it provides a file with an incrementing number each time Split is run). Then, you would just have to keep choosing Run/Go, Run/Go for each file.

Not completely automated, but hopefully better than splitting it all up by hand.

Look in the help under "bad data" and see if any of the things listed pertain to your file.

Dana
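
The combination described above (start at 0000, stop at the next 0000 without writing the stop record, then resume with Last Count) can be sketched in Python to show which records each run would select. This is only an emulation of the behaviour as described in this post, not Split's own logic. The names is_midnight and split_pass are hypothetical, and resuming exactly at the stop record is an assumption.

```python
from datetime import datetime, timedelta


def is_midnight(ts: datetime) -> bool:
    # Mirrors the hour-minute test in 1:1:1[0000]:1 (hhmm equals 0000).
    return ts.hour == 0 and ts.minute == 0


def split_pass(records, resume_at=0):
    """Return (selected_records, next_resume_index) for one run.

    resume_at plays the role of Last Count: the index where the
    previous run left off (assumed to be the stop record itself).
    """
    start = None
    for i in range(resume_at, len(records)):
        if is_midnight(records[i]):
            start = i
            break
    if start is None:
        return [], len(records)
    stop = len(records)
    for j in range(start + 1, len(records)):
        if is_midnight(records[j]):
            stop = j  # the stop record is not part of the output
            break
    return records[start:stop], stop


# Three days of 15-minute records beginning mid-day on 2010-09-22.
records = [datetime(2010, 9, 22, 12, 0) + timedelta(minutes=15 * i)
           for i in range(4 * 24 * 3)]

day1, resume = split_pass(records)
day2, resume = split_pass(records, resume)
print(len(day1), day1[0], day1[-1])
print(len(day2), day2[0], day2[-1])
```

Each call stands in for one Run/Go: the first returns the 96 records of 2010-09-23, the second the 96 records of 2010-09-24.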


Danaw Mar 6, 2012 08:33 PM

One other note -- Split has a run-time version called Splitr.exe, so the above operation could be automated for anyone who is handy with scripting.

Dana
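
For anyone who wants to try the scripting route, here is a minimal Python sketch of a loop that launches the same command once per daily file. Everything specific in it is an assumption: the install path, the PAR file name, that Splitr.exe takes the PAR file as its first argument, and that it returns a nonzero exit code on failure. Check the Split help for the actual command line before relying on it. The function name run_split_passes is hypothetical.

```python
import subprocess
from typing import Callable, Sequence


def run_split_passes(command: Sequence[str], passes: int,
                     runner: Callable[..., object] = subprocess.run) -> int:
    """Run the same command `passes` times, stopping at the first failure.

    Returns the number of runs that completed with exit code 0.
    """
    completed = 0
    for _ in range(passes):
        result = runner(list(command), check=False)
        # Assumption: a nonzero exit code means the run failed.
        if getattr(result, "returncode", 0) != 0:
            break
        completed += 1
    return completed


# Hypothetical install path and parameter file; adjust both for your system.
SPLITR_COMMAND = [
    r"C:\Program Files\Campbellsci\LoggerNet\Splitr.exe",
    r"C:\data\daily.par",
]
```

With a PAR file set up to use Last Count and Create New File, calling run_split_passes(SPLITR_COMMAND, passes=42) would stand in for pressing Run/Go 42 times, roughly one run per day of a six-week file.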


m. rudnicki Mar 7, 2012 09:26 PM

Thanks Dana. That got me moving forward. I am able to set the start condition and write all data, but no joy on the stop condition. It does not stop, even though I checked 'trigger on stop condition' and 'start-stop on/after time' under the "other" box.

So if this works, then from one very large file (with several days of data) it should automatically generate several files, each with data from one day (with 0000 start and stop hour)?

Thanks for your help.
Mark


Danaw Mar 8, 2012 10:24 PM

Uncheck the "trigger on stop" condition. This is used to trigger writing time series values to a file based on a defined condition.

Use the Last Count option, and then Run the PAR file over and over until you get to the end.

Dana


Marc May 4, 2012 07:22 PM

When I do this, the output file no longer has the timestamp. Otherwise it looks like it works, but I really need the timestamp to remain.

Any ideas?


Marc May 4, 2012 08:03 PM

The headers don't get copied either.

Thanks


Danaw May 7, 2012 06:28 PM

In the Select Line, you indicate all the elements that you want to include in the output file, including the Timestamp. If you have 10 elements in your array, you can type 1..10, or 1,2,3,4,5,6,7,8,9,10.

You must create your own headers on the Output File tab of the project. There are three rows for header info.

Dana W.
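
If keeping the original TOA5 header and the timestamp column matters more than staying inside Split, a short script is another route. The following Python sketch is an alternative outside Split, not a Split feature. It assumes a four-line TOA5 header, records in chronological order, and a quoted timestamp as the first field. The function name split_toa5_by_day and the output file naming are hypothetical.

```python
from pathlib import Path

HEADER_LINES = 4  # TOA5 files begin with four header rows


def split_toa5_by_day(source: Path, out_dir: Path) -> list:
    """Write one file per calendar day, each starting with the original header.

    Assumes records are in chronological order; a date that reappeared
    later in the file would overwrite the earlier output for that date.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    current_day = None
    out = None
    with source.open(newline="") as src:
        header = [src.readline() for _ in range(HEADER_LINES)]
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            # First field looks like "YYYY-MM-DD hh:mm:ss[.f]"; keep the date.
            day = line.lstrip('"')[:10]
            if day != current_day:
                if out is not None:
                    out.close()
                path = out_dir / f"{source.stem}_{day}.dat"
                out = path.open("w", newline="")
                out.writelines(header)
                written.append(path)
                current_day = day
            out.write(line)  # the full record, timestamp included
    if out is not None:
        out.close()
    return written
```

The file is read line by line, so a multi-gigabyte source does not need to fit in memory. Each output file covers one calendar date, midnight to midnight.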


MatthewBoyd Jul 30, 2014 05:36 PM

Using the same Start and Stop condition of 1:1:1[0000]:1 , as previously suggested, is only resulting in one line in the output file (the same line is triggering the start and stop). Any suggestions as to why this could be?

Specifically, I'm looking to create daily files starting from a given date by running Split repeatedly using the following settings:
Start condition = 1[2014]:1[208]:1[0000]:1[0]
Stop condition = ::1[0000]:1[0]
Select = 1..123
Start Offset = Last count

Somewhat related, I can't use "7%27" instead of "208" because I get the message "Too many "and" conditions in expression." Any help on that too would be appreciated. Thanks.
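
For reference, the day-of-year value used above can be checked with a few lines of Python. This is just a date calculation and does not touch Split's condition syntax. The helper names are hypothetical.

```python
from datetime import date, timedelta


def day_of_year(year: int, month: int, day: int) -> int:
    """Return the 1-based day of year for a calendar date."""
    return date(year, month, day).timetuple().tm_yday


def from_day_of_year(year: int, doy: int) -> date:
    """Return the calendar date for a 1-based day of year."""
    return date(year, 1, 1) + timedelta(days=doy - 1)


print(day_of_year(2014, 7, 27))     # 208
print(from_day_of_year(2014, 208))  # 2014-07-27
```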


KCG Jul 31, 2014 04:01 PM

If the data file has records stored faster than one record a minute, the 1:1:1[0000]:1 Start - Stop condition will stop after one record. You could try specifying something for the seconds too. For example 1:1:1[0000]:1[0]

Using a specific Start Condition like 1[2014]:1[208]:1[0000]:1[0] with 'Last Count' Start Offset will probably not do what you want. After the first run, Split will not find the specific Start Condition again. Perhaps you could use Split to reduce the data file to a data file starting with the given date and then use something like the above to repeatedly create daily files.

When you use a time based Start Condition, Split still uses 'and' conditions to check for the start. Split allows three 'and' statements for a Start Condition. Specifying a specific year, day, hour-minute, and seconds for Start Condition will exceed the number of 'and' conditions Split supports. Depending on the data file, you might need to not specify the year (or seconds) for example.

With my testing, I get the "Too many "and" conditions in expression." error if I use the "day of year" or the "month%day" syntax if I specify all four date-time values.

Ken
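
The one-record result described above can be reproduced with a small Python emulation. It is a sketch of the behaviour as Ken describes it, not Split's code. It uses 1-second data, and it does not address how Split treats fractional seconds in 10Hz data. The name first_block is hypothetical.

```python
from datetime import datetime, timedelta


def first_block(records, matches):
    """Records from the first match up to (not including) the next match."""
    start = next((i for i, ts in enumerate(records) if matches(ts)), None)
    if start is None:
        return []
    stop = next((j for j in range(start + 1, len(records))
                 if matches(records[j])), len(records))
    return records[start:stop]


# Two days of 1-second records starting just before midnight.
records = [datetime(2014, 7, 26, 23, 59, 58) + timedelta(seconds=i)
           for i in range(2 * 86400)]


def hhmm_only(ts):
    # Like 1:1:1[0000]:1, only hour and minute are tested.
    return (ts.hour, ts.minute) == (0, 0)


def hhmm_and_sec(ts):
    # Like 1:1:1[0000]:1[0], seconds must also be 0.
    return (ts.hour, ts.minute, ts.second) == (0, 0, 0)


print(len(first_block(records, hhmm_only)))     # 1
print(len(first_block(records, hhmm_and_sec)))  # 86400
```

With only hour and minute tested, the record at 00:00:01 already satisfies the stop condition, so a single record is written. Adding the seconds test pushes the stop to the following midnight.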


MatthewBoyd Jul 31, 2014 04:46 PM

Thanks for the suggestions, I'll give them a try.


KCG Aug 1, 2014 10:23 PM

Another option occurred to me. Recent versions of LoggerNet have a utility called File Format Convert. It is not found on the LoggerNet toolbar but you can start it from the Windows Start menu. Like CardConvert, it will break a data file into smaller files but it also works with some source file formats that are not binary.

Might be worth a look.

Ken


MatthewBoyd Aug 4, 2014 12:27 PM

Thanks, Ken. I tried it and it does exactly what I want. I would assume not, but is there a run-time version that can be called from a batch file, or is there some other way to automate it?


KCG Aug 4, 2014 05:07 PM

Unfortunately you are correct, there is no run-time version or command line parameters to automate File Format Convert currently.

CardConvert has limited support for this but would require that the source data file be binary.

Ken
