NCSU Libraries

Title page for ETD etd-10292003-085539


Type of Document Master's Thesis
Author Sun, Xuejun
Author's Email Address sunxx80@hotmail.com
URN etd-10292003-085539
Title ULEDS-SVMs: Upper/Lower Limits and Error Data Supported Support Vector Machines
Degree Master of Science
Graduate Program Computer Science
Advisory Committee
Advisor Name Title
Jon Doyle Committee Chair
John Blondin Committee Member
Robert Funderlic Committee Member
Keywords
  • upper limit
  • lower limit
  • Machine Learning
  • Data Mining
  • Application: Chandra and Hubble Field
  • SVMs
  • Support Vector Machines
  • Domain: data with error
Date of Defense 2003-10-28
Availability unrestricted
Abstract
A Support Vector Machine variant, ULEDS-SVMs, was developed for classification in data domains that contain limits or errors.

Data with upper or lower limits differ from missing data: they still constrain classification and modeling to some degree. Data with errors can be treated as the special case in which an attribute has both an upper and a lower limit, one at each boundary. Data of this quality are widespread, from scientific measurements to databases that result from integrating and merging sources of differing quality. Including such data in training, rather than dropping them or arbitrarily filling in a value, is highly desirable because they provide useful constraints for machine learning.
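
The idea that a limit or an error bar still constrains an attribute can be sketched by storing every attribute value as an interval. This is only an illustration of the paragraph above, not the thesis's data format; the names `Interval`, `exact`, `with_error`, `upper_limit`, `lower_limit`, and `missing` are hypothetical.

```python
import math
from typing import NamedTuple


class Interval(NamedTuple):
    """Closed range [lo, hi] that the true attribute value is assumed to lie in."""
    lo: float
    hi: float


def exact(x: float) -> Interval:
    # A precise measurement: the interval collapses to a point.
    return Interval(x, x)


def with_error(x: float, err: float) -> Interval:
    # A measurement with a symmetric error: both an upper and a lower limit.
    return Interval(x - err, x + err)


def upper_limit(u: float) -> Interval:
    # Only an upper limit is known: the value is at most u.
    return Interval(-math.inf, u)


def lower_limit(l: float) -> Interval:
    # Only a lower limit is known: the value is at least l.
    return Interval(l, math.inf)


def missing() -> Interval:
    # Truly missing data carries no constraint at all.
    return Interval(-math.inf, math.inf)
```

Under this representation an upper limit is strictly more informative than a missing value, which is the distinction the abstract draws.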

A simple enhanced 1R algorithm is described that may be able to handle data in such a domain, and whose principle may be extendable to other machine learning methods. However, it is not favored because of its time complexity. The treatment of such data by Support Vector Machines (SVMs) is, by contrast, very promising. We provide the mathematical foundation for treating this kind of problem by recognizing the concepts of feasibility for training, testing, and predicting in SVMs. Algorithms based on these theorems are described.
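
The abstract does not spell out its feasibility definitions, so the following is not the thesis's method. It is a minimal sketch, under the assumption of a linear decision function f(x) = w·x + b and attributes given as (lo, hi) ranges, of how one might check whether the predicted class is the same for every value the limits allow. The function names `decision_range` and `certain_label` are hypothetical.

```python
import math


def decision_range(w, b, box):
    """Return (min, max) of f(x) = w.x + b over a box of (lo, hi) ranges."""
    fmin = fmax = b
    for wi, (lo, hi) in zip(w, box):
        if wi == 0:
            # This attribute does not affect f; skipping avoids 0 * inf = nan.
            continue
        a, c = wi * lo, wi * hi
        fmin += min(a, c)
        fmax += max(a, c)
    return fmin, fmax


def certain_label(w, b, box):
    """Return +1 or -1 if the sign of f is fixed over the box, else None."""
    fmin, fmax = decision_range(w, b, box)
    if fmin > 0:
        return 1
    if fmax < 0:
        return -1
    return None


# A point with an error bar on each attribute, fully on the positive side.
print(certain_label([1.0, -2.0], 0.5, [(1.0, 2.0), (0.0, 0.5)]))
# The first attribute is only an upper limit, so the sign is undetermined.
print(certain_label([1.0, -2.0], 0.5, [(-math.inf, 2.0), (0.0, 0.5)]))
```

The point of the sketch is that a limit can leave the classification fully determined, partly determined, or open, depending on where the range sits relative to the separating plane.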

To apply ULEDS-SVMs, we integrated an astronomical data set (CHDF-N) based on Chandra Deep Field (CDF) and Hubble Deep Field (HDF) North observations. Classifying these astronomical objects is of interest for studying the formation and evolution of galaxies in the deep universe. This direction contains the deepest observations made with the largest astronomical facilities currently available.

We used CHDF-N as a test bed for the ULEDS-SVMs algorithms, which were implemented in Matlab.

The separation between stars and extragalactic objects reaches 100% accuracy; without the limit data in the extragalactic class, the separating plane would be more ambiguous to determine. Training and testing with a leave-one-out partition achieved 82% accuracy in separating galaxies from active galactic nuclei (AGNs). This is better than the 72.4% accuracy of the conventional R-log(F_x) plot separation method commonly used in the astronomical community. The prediction rate increased from 49.6% with conventional SVMs to 75.5% with ULEDS-SVMs.
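
For readers unfamiliar with the leave-one-out partition mentioned above, here is a generic sketch of the procedure. The thesis used Matlab and an SVM; this Python version substitutes a toy one-dimensional nearest-neighbour classifier (`fit_1nn`, `predict_1nn`, both hypothetical) purely to keep the example self-contained, and the data are made up for illustration.

```python
def leave_one_out_accuracy(samples, labels, fit, predict):
    """Train on all samples but one, test on the held-out one, and repeat."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        if predict(model, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


def fit_1nn(xs, ys):
    # Stand-in classifier: "training" just memorises the labelled points.
    return list(zip(xs, ys))


def predict_1nn(model, x):
    # Predict the label of the closest memorised point.
    return min(model, key=lambda pair: abs(pair[0] - x))[1]


# Illustrative data only; not taken from CHDF-N.
xs = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
ys = [-1, -1, -1, 1, 1, 1]
accuracy = leave_one_out_accuracy(xs, ys, fit_1nn, predict_1nn)
```

Each sample is used for testing exactly once, which makes the scheme attractive when, as with deep-field catalogues, the number of labelled objects is small.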

Files
  Filename   Size      Approximate Download Time (Hours:Minutes:Seconds)
                       28.8 Modem   56K Modem   ISDN (64 Kb)   ISDN (128 Kb)   Higher-speed Access
  etd.pdf    1.19 Mb   00:05:29     00:02:49    00:02:28       00:01:14        00:00:06