I'm quite new to machine learning/data mining and I'm struggling to find the right approach for my problem. I would appreciate some guidance on, or criticism of, my proposed solution. In particular, is there a better or simpler algorithm for this problem?
The Problem
I have a number of features that describe a particular type (label) of wave (a frame of audio) at a predetermined level 'v'. I want to identify the level of an unknown wave and to distinguish it from other types of wave that fall under the same higher-level category.
Assumptions
- A group in a test set should be in increasing order of level v
- The type of wave in the group should be the same and known
Proposed Solution
Stage one: Level Selection
For a given type of wave, compute the features at each level for N samples.
For each level, calculate the mean/median of each feature over the N samples to create one feature vector per level.
Normalise the feature set by subtracting the empirical mean and dividing by the standard deviation.
Take the Euclidean/Manhattan distance between an incoming vector and each level's feature vector, and choose the closest level.
For a group with assigned levels, compare each level with its neighbours and report negative differences (the levels should be ascending) or unusually large differences.
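To make stage one concrete, here is a minimal sketch of what I have in mind, using NumPy. The function names, the array layout (levels x samples x features), and the `max_jump` threshold are my own assumptions for illustration, and the sketch normalises with the standard deviation (a z-score) and uses the mean rather than the median.

```python
import numpy as np

def build_level_templates(samples):
    """samples: array of shape (n_levels, n_samples, n_features).
    Returns one normalised mean vector per level plus the statistics used."""
    flat = samples.reshape(-1, samples.shape[-1])
    mu = flat.mean(axis=0)
    sigma = flat.std(axis=0)
    sigma[sigma == 0] = 1.0  # avoid dividing by zero for constant features
    templates = ((samples - mu) / sigma).mean(axis=1)  # (n_levels, n_features)
    return templates, mu, sigma

def nearest_level(x, templates, mu, sigma):
    """Normalise an incoming vector and return the index of the closest level."""
    z = (np.asarray(x, dtype=float) - mu) / sigma
    dists = np.linalg.norm(templates - z, axis=1)  # Euclidean distance
    return int(np.argmin(dists))

def flag_order_violations(levels, max_jump=2):
    """Return positions whose level is lower than the previous one
    (not ascending) or more than max_jump above it (hypothetical threshold)."""
    diffs = np.diff(levels)
    return [i + 1 for i, d in enumerate(diffs) if d < 0 or d > max_jump]
```

For a group, I would call `nearest_level` on each wave and then pass the resulting list of level indices to `flag_order_violations`.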
Stage two: Type Selection
- Take the Euclidean/Manhattan distance between an incoming vector and the feature vector of each type, either at a specific level or across all levels, and choose the closest type.
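A rough sketch of stage two, again with NumPy. The name `nearest_type` and the dictionary layout are hypothetical, and I am assuming the incoming vector has already been normalised in the same way as the per-type feature vectors.

```python
import numpy as np

def nearest_type(x, type_templates, level=None, metric="euclidean"):
    """type_templates: dict mapping type name -> array (n_levels, n_features).
    If level is given, compare only at that level; otherwise across all levels.
    Returns (closest type name, distance)."""
    x = np.asarray(x, dtype=float)
    best_type, best_dist = None, np.inf
    for name, templates in type_templates.items():
        rows = templates if level is None else templates[level:level + 1]
        diff = rows - x
        if metric == "manhattan":
            d = np.abs(diff).sum(axis=1)
        else:
            d = np.sqrt((diff ** 2).sum(axis=1))
        if d.min() < best_dist:
            best_type, best_dist = name, float(d.min())
    return best_type, best_dist
```

Passing `level=None` corresponds to the "across all levels" variant, where a type wins if any of its levels is closest to the incoming vector.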
Extension of Problem
The features evolve over time as well as with level.
Proposed Solution
Repeat the stages of the above solution for each frame.
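As a sketch of the per-frame version, assuming one set of level feature vectors per frame index (the array shapes and the name `nearest_level_per_frame` are my own, and the inputs are assumed to be normalised already):

```python
import numpy as np

def nearest_level_per_frame(frames, frame_templates):
    """frames: (n_frames, n_features), one incoming vector per frame.
    frame_templates: (n_frames, n_levels, n_features), level vectors per frame.
    Returns the index of the closest level for each frame."""
    frames = np.asarray(frames, dtype=float)
    frame_templates = np.asarray(frame_templates, dtype=float)
    # Broadcast each frame's vector against that frame's level vectors.
    dists = np.linalg.norm(frame_templates - frames[:, None, :], axis=2)
    return dists.argmin(axis=1)
```

This gives one level decision per frame; how best to combine those decisions into a single level for the wave is something I have not settled on.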
Thanks for any help
*Update: I cannot guarantee that the levels v are equivalent across the data; I can only guarantee that their order is increasing. For example, sample A may have 5 levels, v = 1, ..., 5, which correspond to {1, ..., 5}, while sample B has 10 levels, v = 1, 2, ..., 10, which correspond to {.5, 1, 1.5, ..., 10}. How can I capture this without knowing the relationship between levels, and how can I identify those which do not follow this pattern? Please let me know if this is not clear.