I am (short-term) monitoring the sizes of several UNIX files and can easily export the numbers to an Excel spreadsheet. The files will eventually be indexed by Splunk, and that team wants to know the daily volume to expect. I am currently sampling the file sizes every hour. My file sizes generally fall into two types: constantly growing or cyclical. My definition of cyclical is: the files grow, then get reset to size 0, then grow, then reset, and so on. A graph of the numbers looks like a sawtooth pattern.
Can Excel determine which category a file belongs to, either constantly growing or cyclical? The final numbers do not have to be exact, but they must give a good idea of the daily volume. (I could do this manually, but I want Excel to allow me to be lazy.)
The reasons I ask are:
- If a file is reset to zero, which typically happens 3 times a day (but sometimes more often), I want to report the sum of the three largest file sizes. As a bonus, if the resets happen more often, perhaps Excel can count them and I can multiply by that number.
- If a file is constantly growing, it's a candidate for log file rotation, and I'll report the difference between early-day and late-day file sizes to get the daily volume. (And see about getting a rotation script running.)
- If the file size never changes, the program generating the file needs to be checked. I can discover this by comparing MIN() and MAX() values. (A rough sketch of the formulas I have in mind follows this list.)
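My first thought for telling the patterns apart is something along these lines, assuming one file per row with the samples running across row 1 starting in column B (the ranges here only cover the nine sample columns shown below, and the blank-cell test is there so empty cells at the end of the row don't look like resets):

=SUMPRODUCT((C1:J1<B1:I1)*(C1:J1<>""))
counts how many times a sample is smaller than the one before it; 0 would mean constantly growing (or flat), 1 or more would mean cyclical.

=IF(MIN(B1:J1)=MAX(B1:J1),"never changes","changes")
flags a file whose size never moves at all.

I don't know whether that is the right approach, though.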
Here is some sample data pulled directly from my spreadsheet. You can see that the top and bottom rows grow until the fifth column (16:05), where the values drop. It is safe to assume the file size was reset to 0 at some point after the 15:55 sample. The middle row just keeps growing.
14:05 14:55 15:05 15:55 16:05 16:55 17:05 17:55 18:05
3,307,183 3,692,602 3,872,188 4,301,423 61,010 467,733 554,456 981,851 1,068,804
36,348,105 36,888,709 37,003,076 37,541,343 37,650,979 38,182,932 38,292,337 38,838,988 38,947,009
1,666,276 1,879,624 1,945,419 2,164,186 29,410 238,368 293,786 518,526 565,455
(I cheated on my hourly sample: I wanted to catch the times the resets actually occur. Now that I know, I'll drop back to hourly monitoring and recalculate the older times with the data I've retained. I'll monitor the files for about a week.)
The output I'll report from rows one and three will be three times the maximum (the 15:55 column here). Actually, I'll probably use something like =LARGE(B1:ZZ1,1) + LARGE(B1:ZZ1,2) + LARGE(B1:ZZ1,3), but I need to know that this is cyclical file growth.
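If Excel can also count the resets, the "multiply by that constant" bonus might look something like the following rough sketch (same assumption of one file per row starting in column B; the narrow range is just for the sample data above):

=(SUMPRODUCT((C1:J1<B1:I1)*(C1:J1<>""))+1)*MAX(B1:J1)

i.e. treat (resets seen + 1) as the number of cycles for the day and multiply by the largest peak observed. That would only be a rough estimate, which is fine for my purposes.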
The output from row two will be the size at 23:55 today minus the size at 23:55 yesterday, but I need to know this is constant file growth.
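For the constantly growing case, something like this sketch should pull the last sample in the row without my having to know which column it landed in (LOOKUP with a very large number returns the last numeric value in a range; I'm assuming column B holds the first sample of the day):

=LOOKUP(9.99E+307,B2:ZZ2)-B2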
Then I'll total all my sizes to get a grand total.

