Determine distributions of dataset Implement PCA and DCT methods

Cancelled Posted 4 years ago Paid on delivery

This project is about statistical analysis of a data collection and about different data reduction methods, in particular dimensionality reduction through feature extraction. You are given two datasets, each containing 1000 vectors with 100 attributes (i.e., dimensions). Each dataset is split into two text table files of 500 samples each, so each dataset forms a 1000 x 100 matrix in which every row is a vector. You are further told that, within each dataset, the component values of every sample (i.e., vector) follow the same distribution.
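Because every vector's components are drawn from one distribution, each 100-value row can be treated as a sample of its own. The posting does not name the candidate distributions, so the C++17 sketch below only computes the statistics that usually settle the question for steps 1 and 2. For each row it gives the normal maximum-likelihood estimates (mean, standard deviation with a 1/n divisor), the uniform endpoint estimates (minimum, maximum), and the sample skewness and excess kurtosis. For the norms it gives a normal fit plus a scale estimate that holds only under an assumed i.i.d. N(0, sigma^2) model, where ||x||^2 / sigma^2 is chi-square with d degrees of freedom and so E||x||^2 = d * sigma^2. The Euclidean norm is assumed, and `Matrix`, `describe`, `pickRows`, `rowNorms`, and `fitNorms` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Step 1: summary statistics for one sample's component values.
struct SampleStats {
    double mean, stddev;    // normal MLE (stddev uses the 1/n divisor)
    double min, max;        // uniform MLE endpoints
    double skewness;        // 0 for normal and uniform, 2 for exponential
    double excessKurtosis;  // 0 for normal, -1.2 for uniform, 6 for exponential
};

SampleStats describe(const std::vector<double>& x) {
    assert(!x.empty());
    const double n = static_cast<double>(x.size());
    const double mean = std::accumulate(x.begin(), x.end(), 0.0) / n;
    double m2 = 0, m3 = 0, m4 = 0;  // central moments
    for (double v : x) {
        const double d = v - mean;
        m2 += d * d;
        m3 += d * d * d;
        m4 += d * d * d * d;
    }
    m2 /= n; m3 /= n; m4 /= n;
    const auto [lo, hi] = std::minmax_element(x.begin(), x.end());
    return {mean, std::sqrt(m2), *lo, *hi,
            m3 / std::pow(m2, 1.5), m4 / (m2 * m2) - 3.0};
}

// Picks `count` distinct row indices uniformly at random (fixed seed for reproducibility).
std::vector<std::size_t> pickRows(std::size_t nRows, std::size_t count, unsigned seed) {
    std::vector<std::size_t> idx(nRows);
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::mt19937 rng(seed);
    std::shuffle(idx.begin(), idx.end(), rng);
    idx.resize(std::min(count, nRows));
    return idx;
}

// Step 2: Euclidean (L2) norm of every sample (row).
std::vector<double> rowNorms(const Matrix& X) {
    std::vector<double> norms;
    norms.reserve(X.size());
    for (const auto& row : X) {
        double s = 0;
        for (double v : row) s += v * v;
        norms.push_back(std::sqrt(s));
    }
    return norms;
}

struct NormFit {
    double mean, stddev;  // normal fit to the norms (1/n divisor)
    double sigmaIfChi;    // component scale, valid only if components are i.i.d. N(0, sigma^2)
};

NormFit fitNorms(const std::vector<double>& norms, std::size_t dim) {
    assert(!norms.empty() && dim > 0);
    const double n = static_cast<double>(norms.size());
    double sum = 0, sumSq = 0;
    for (double r : norms) { sum += r; sumSq += r * r; }
    const double mean = sum / n;
    const double var = std::max(0.0, sumSq / n - mean * mean);
    // Under the i.i.d. N(0, sigma^2) assumption, E[||x||^2] = dim * sigma^2.
    return {mean, std::sqrt(var), std::sqrt(sumSq / n / static_cast<double>(dim))};
}
```

As reference points, a normal distribution has skewness 0 and excess kurtosis 0, a uniform one 0 and -1.2, and an exponential one 2 and 6. Comparing the output of `describe` on the ten rows returned by `pickRows(1000, 10, seed)` against these values narrows the candidates before a formal goodness-of-fit test.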

1. Determine the distribution of the vector component values for each of the two datasets. For each dataset, randomly pick 10 samples and report the distribution parameters for each of the 10 samples.

2. Compute the norms of all samples in both datasets. Then determine the distribution of the norms for each dataset and report its parameters.

3. Implement the PCA and DCT methods and apply each of them for feature extraction to the two datasets. Report the reduced dimensionality of each dataset after feature extraction with PCA and with DCT.

4. Compare the feature extraction results of the two methods on each dataset and report your conclusions.
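For the PCA half of step 3, here is a minimal self-contained C++17 sketch that avoids external libraries. It centers the data, builds the sample covariance matrix, diagonalizes it with the cyclic Jacobi eigenvalue method, and keeps the smallest number of components whose eigenvalues reach a chosen fraction of the total variance. The retained-variance criterion is an assumption, since the posting does not say how to pick the reduced dimensionality, and `jacobiEigen`, `PcaResult`, `pca`, and `project` are hypothetical names. A linear-algebra library such as Eigen (`Eigen::SelfAdjointEigenSolver`) could replace the hand-written eigensolver.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Cyclic Jacobi eigenvalue method for a symmetric matrix A (taken by value).
// On return w[i] is an eigenvalue and column i of V its unit eigenvector.
void jacobiEigen(Matrix A, std::vector<double>& w, Matrix& V) {
    const std::size_t n = A.size();
    V.assign(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) V[i][i] = 1.0;
    for (int sweep = 0; sweep < 100; ++sweep) {
        double off = 0, diag = 0;
        for (std::size_t p = 0; p < n; ++p) {
            diag += A[p][p] * A[p][p];
            for (std::size_t q = p + 1; q < n; ++q) off += A[p][q] * A[p][q];
        }
        if (off <= 1e-24 * diag) break;  // off-diagonal part is negligible
        for (std::size_t p = 0; p < n; ++p)
            for (std::size_t q = p + 1; q < n; ++q) {
                if (A[p][q] == 0.0) continue;
                // Rotation angle that zeroes A[p][q].
                const double theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q]);
                const double t = (theta >= 0 ? 1.0 : -1.0) /
                                 (std::fabs(theta) + std::sqrt(theta * theta + 1.0));
                const double c = 1.0 / std::sqrt(t * t + 1.0), s = t * c;
                for (std::size_t k = 0; k < n; ++k) {  // A <- A J and V <- V J
                    const double akp = A[k][p], akq = A[k][q];
                    A[k][p] = c * akp - s * akq;
                    A[k][q] = s * akp + c * akq;
                    const double vkp = V[k][p], vkq = V[k][q];
                    V[k][p] = c * vkp - s * vkq;
                    V[k][q] = s * vkp + c * vkq;
                }
                for (std::size_t k = 0; k < n; ++k) {  // A <- J^T A (rows p, q)
                    const double apk = A[p][k], aqk = A[q][k];
                    A[p][k] = c * apk - s * aqk;
                    A[q][k] = s * apk + c * aqk;
                }
            }
    }
    w.resize(n);
    for (std::size_t i = 0; i < n; ++i) w[i] = A[i][i];
}

struct PcaResult {
    std::vector<double> mean;         // column means used for centering
    std::vector<double> eigenvalues;  // sorted, largest first
    Matrix components;                // components[i] = i-th principal axis
    std::size_t k = 0;                // axes kept for the variance threshold
};

// Keeps the smallest k whose eigenvalues explain at least `fraction`
// of the total variance (the threshold itself is a modeling choice).
PcaResult pca(const Matrix& X, double fraction) {
    assert(X.size() >= 2);
    const std::size_t n = X.size(), d = X[0].size();
    PcaResult res;
    res.mean.assign(d, 0.0);
    for (const auto& r : X) for (std::size_t j = 0; j < d; ++j) res.mean[j] += r[j] / n;
    Matrix C(d, std::vector<double>(d, 0.0));  // sample covariance, 1/(n-1) divisor
    for (const auto& r : X)
        for (std::size_t a = 0; a < d; ++a)
            for (std::size_t b = 0; b < d; ++b)
                C[a][b] += (r[a] - res.mean[a]) * (r[b] - res.mean[b]) / (n - 1);
    std::vector<double> w; Matrix V;
    jacobiEigen(C, w, V);
    std::vector<std::size_t> order(d);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return w[a] > w[b]; });
    const double total = std::accumulate(w.begin(), w.end(), 0.0);
    double acc = 0;
    for (std::size_t i = 0; i < d; ++i) {
        const std::size_t j = order[i];
        res.eigenvalues.push_back(w[j]);
        std::vector<double> axis(d);
        for (std::size_t r = 0; r < d; ++r) axis[r] = V[r][j];
        res.components.push_back(axis);
        acc += w[j];
        if (res.k == 0 && acc >= fraction * total) res.k = i + 1;
    }
    if (res.k == 0) res.k = d;
    return res;
}

// Projects one sample onto the first k principal axes (the extracted features).
std::vector<double> project(const PcaResult& p, const std::vector<double>& x) {
    std::vector<double> y(p.k, 0.0);
    for (std::size_t i = 0; i < p.k; ++i)
        for (std::size_t j = 0; j < x.size(); ++j)
            y[i] += p.components[i][j] * (x[j] - p.mean[j]);
    return y;
}
```

Running `pca(data, 0.95)` on each 1000 x 100 dataset and reading `k` gives one possible answer for the PCA dimensionality; the 0.95 threshold is illustrative only.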

You can use whatever programming language you are comfortable with (preferably C++).
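In C++, as preferred above, the DCT half of step 3 and one possible basis for the comparison in step 4 can be sketched together. The posting fixes no comparison criterion, so this sketch assumes the same retained-energy threshold as for PCA. The orthonormal DCT-II preserves energy (Parseval), so the energy held by the first k coefficients of the column-centered data is directly comparable with the variance held by the first k principal components. The sketch keeps the lowest-frequency coefficients first, the usual DCT truncation; `dct2` and `dctKeep` are hypothetical names, and the direct O(N^2) transform is adequate for N = 100.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Orthonormal DCT-II of one vector, computed directly in O(N^2).
std::vector<double> dct2(const std::vector<double>& x) {
    const std::size_t N = x.size();
    const double pi = std::acos(-1.0);
    std::vector<double> X(N, 0.0);
    for (std::size_t k = 0; k < N; ++k) {
        double s = 0.0;
        for (std::size_t n = 0; n < N; ++n)
            s += x[n] * std::cos(pi * (2.0 * n + 1.0) * k / (2.0 * N));
        X[k] = s * std::sqrt((k == 0 ? 1.0 : 2.0) / N);  // orthonormal scaling
    }
    return X;
}

// Smallest k such that the first k (lowest-frequency) DCT coefficients of the
// column-centered data hold at least `fraction` of the total energy.
std::size_t dctKeep(const Matrix& data, double fraction) {
    assert(!data.empty());
    const std::size_t n = data.size(), d = data[0].size();
    std::vector<double> mean(d, 0.0), energy(d, 0.0);
    for (const auto& r : data)
        for (std::size_t j = 0; j < d; ++j) mean[j] += r[j] / n;
    for (const auto& r : data) {
        std::vector<double> centered(d);
        for (std::size_t j = 0; j < d; ++j) centered[j] = r[j] - mean[j];
        const std::vector<double> y = dct2(centered);
        for (std::size_t k = 0; k < d; ++k) energy[k] += y[k] * y[k];
    }
    double total = 0.0;
    for (double e : energy) total += e;
    double acc = 0.0;
    for (std::size_t k = 0; k < d; ++k) {
        acc += energy[k];
        if (acc >= fraction * total) return k + 1;
    }
    return d;
}
```

Because PCA (the Karhunen-Loève transform) maximizes the variance captured by any k orthonormal directions, `dctKeep(data, f)` can never be smaller than the PCA `k` at the same threshold `f`. How close the two counts are shows how well the fixed DCT basis matches each dataset's covariance structure.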

Data Mining C++ Programming

Project ID: #21737534

About the project

5 proposals Remote project Active 4 years ago

5 freelancers are bidding an average of $286 for this job

hbxfnzwpf

I am very proficient in C and C++. I have 17 years of C++ development experience and have worked for more than 7 years. My work is online game development, mainly focused on the server side, using C++ under a Linux environ…

$135 USD in 3 days
(117 Reviews)
6.8
kipdev13

Hi, very nice to meet you. I am Fang G. I have read your post carefully and your project is very interesting to me. I have 16 years of experience in software development and have already completed 235 projects in free…

$800 USD in 7 days
(57 Reviews)
5.9
mramalingam

I have done multivariate statistical analysis for various domains with different tools. I can get this done in Python/R/C++.

$200 USD in 20 days
(1 Review)
0.7
RavenSolutions

I checked your project: Determine distributions of dataset Implement PCA and DCT methods and I am totally sure that I am able to accomplish it! I am very good at C++ Programming, Data Mining. Hi Yunus Emre D.! Please…

$155 USD in 1 day
(0 Reviews)
0.0
OscarAP

I still have to analyse how complex it is to implement the PCA and DCT algorithms, before being able to give you a quote.

$140 USD in 7 days
(0 Reviews)
0.0