Determine distributions of datasets; implement PCA and DCT methods
$30-250 USD
Paid on delivery
This project covers statistical analysis of a data collection and data reduction methods, in particular dimensionality reduction through feature extraction. You are given two datasets, each containing 1000 vectors with 100 attributes (i.e., dimensions). Each dataset is split across two text table files of 500 samples each; taken together, the two files form a 1000 x 100 matrix in which each row is one vector. You are further told that, within each dataset, the component values of every sample (i.e., vector) follow the same distribution.
1. Determine the distribution of the vector component values for each of the two datasets. For each dataset, randomly pick 10 samples and report the distribution parameters for each of the 10 samples.
2. Compute the norms of all the samples in both datasets. Then determine the distribution of the norms for each dataset and report its distribution parameters.
3. Implement the PCA and DCT methods and apply each of them to the two datasets for feature extraction. Report the reduced dimensionality of each dataset after feature extraction, for PCA and for DCT.
4. Compare the feature extraction results of the two methods on each dataset and report your conclusions.
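For tasks 1 and 2, a first pass could look like the sketch below. It is only an illustration under stated assumptions: the data here is synthetic Gaussian noise standing in for the real files, the norm is taken to be the Euclidean (L2) norm because the posting does not say which norm is meant, and the helper names moments and l2_norm are made up for this example.

```python
import math
import random

def moments(values):
    """Mean, population standard deviation, skewness and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(var)
    skew = sum((v - mean) ** 3 for v in values) / (n * std ** 3)
    kurt = sum((v - mean) ** 4 for v in values) / (n * var ** 2) - 3.0
    return {"mean": mean, "std": std, "min": min(values), "max": max(values),
            "skew": skew, "excess_kurtosis": kurt}

def l2_norm(row):
    """Euclidean norm of one vector (assumed norm; the posting names none)."""
    return math.sqrt(sum(x * x for x in row))

# Synthetic stand-in for one dataset (NOT the real files): 1000 x 100 values.
rng = random.Random(0)
data = [[rng.gauss(0.0, 1.0) for _ in range(100)] for _ in range(1000)]

# Task 1: summary statistics for 10 randomly chosen samples.
picked = rng.sample(range(len(data)), 10)
per_sample = {i: moments(data[i]) for i in picked}

# Task 2: summary statistics for the norms of all samples.
norm_stats = moments([l2_norm(row) for row in data])
```

Skewness near 0 with excess kurtosis near 0 is consistent with a Gaussian, while excess kurtosis near -1.2 points toward a uniform distribution. With only 100 values per sample these estimates are noisy, so they are hints rather than proof; a goodness-of-fit test would be the more careful way to confirm the distribution family.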
You can use whatever programming language you are comfortable with (preferably C++).
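For tasks 3 and 4, the sketch below shows one possible shape of the two methods. Python is used only for brevity; the same structure should port to C++. Several things here are assumptions and not part of the posting: the reduced dimensionality is chosen as the smallest number of components that retains 95% of the variance (PCA) or of the signal energy (DCT), the PCA eigenvalues are computed with cyclic Jacobi rotations, the DCT is the orthonormal DCT-II, and the demo data is synthetic. All function names are made up for this example.

```python
import math
import random

def covariance(data):
    """Sample covariance matrix of row vectors (features are columns)."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def jacobi_eigenvalues(matrix, max_sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

def dct2(x):
    """Orthonormal DCT-II of one vector."""
    n = len(x)
    out = []
    for k in range(n):
        total = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(total * math.sqrt((1.0 if k == 0 else 2.0) / n))
    return out

def dct_energies(data):
    """Total energy per DCT coefficient over all samples, largest first."""
    energy = [0.0] * len(data[0])
    for row in data:
        for k, coeff in enumerate(dct2(row)):
            energy[k] += coeff * coeff
    return sorted(energy, reverse=True)

def components_needed(weights, keep=0.95):
    """Smallest count of leading weights whose sum reaches `keep` of the total."""
    total, running = sum(weights), 0.0
    for k, w in enumerate(weights, start=1):
        running += w
        if running >= keep * total:
            return k
    return len(weights)

# Synthetic demo (NOT the real files): 300 vectors of dimension 8 built from
# two smooth patterns plus a little noise, so both methods should find 2.
rng = random.Random(0)
dim = 8
demo = []
for _ in range(300):
    a, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    demo.append([a + b * math.cos(math.pi * (i + 0.5) / dim) + rng.gauss(0.0, 0.01)
                 for i in range(dim)])

pca_k = components_needed(jacobi_eigenvalues(covariance(demo)))
dct_k = components_needed(dct_energies(demo))
```

Applying the same retention threshold to both methods gives a like-for-like number to compare in task 4. Note one difference in what is measured: PCA works on mean-centered data, while the DCT energy here includes the mean through the first (DC) coefficient. On the full 100-dimensional data a pure-Python Jacobi loop will be slow, so a C++ port or a linear algebra library would be the practical choice.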
Project ID: #21737534
About the project
5 freelancers are bidding on average $286 for this job
I have done multivariate statistical analysis for various domains with different tools. I can get this done in Python/R/C++.
I checked your project, "Determine distributions of dataset Implement PCA and DCT methods", and I am confident that I can accomplish it! I am very good at C++ programming and data mining. Hi Yunus Emre D.! Please ...
I still have to analyse how complex it is to implement the PCA and DCT algorithms, before being able to give you a quote.