Intelligent focused crawler: Learning which links to crawl

dc.authorid: TR192086 [en_US]
dc.authorid: TR2572 [en_US]
dc.authorid: TR199826 [en_US]
dc.contributor.author: Taylan, Duygu
dc.contributor.author: Poyraz, Mitat
dc.contributor.author: Akyokuş, Selim
dc.contributor.author: Ganiz, Murat Can
dc.date.accessioned: 2016-01-25T09:07:45Z
dc.date.available: 2016-01-25T09:07:45Z
dc.date.issued: 2011-06
dc.department: Doğuş Üniversitesi, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü [en_US]
dc.description: Akyokuş, Selim (Dogus Author) -- Ganiz, Murat C. (Dogus Author) -- Conference full title: 2011 International Symposium on Innovations in Intelligent Systems and Applications (INISTA 2011), Istanbul, Turkey, 15-18 June 2011 [en_US]
dc.description.abstract: A web crawler is an automated program that methodically scans Internet pages and downloads any page that can be reached via links. With the exponential growth of the Web, fetching information about a specific topic is gaining importance. A focused crawler is a web crawler that attempts to download only web pages relevant to a predefined topic or set of topics. To determine whether a web page is about a particular topic, focused crawlers use classification techniques. In this study, we classify links rather than downloaded web pages to determine relevancy. We combine a Naïve Bayes classifier for URL classification with a simple URL scoring optimization to improve system performance. Our results demonstrate that the proposed approach performs better. [en_US]
dc.description.sponsorship: TUBITAK, IEEE. [en_US]
dc.identifier.citation: Taylan, D., Poyraz, M., Akyokuş, S., & Ganiz, M. C. (2011). Intelligent focused crawler: Learning which links to crawl. In 2011 International Symposium on Innovations in Intelligent Systems and Applications (INISTA) (pp. 504-508). Piscataway, NJ: IEEE. https://dx.doi.org/10.1109/INISTA.2011.5946150 [en_US]
dc.identifier.doi: 10.1109/INISTA.2011.5946150
dc.identifier.endpage: 508 [en_US]
dc.identifier.isbn: 9781612849195
dc.identifier.other: 12109155 (INSPEC)
dc.identifier.other: 5946150 (Scopus)
dc.identifier.scopus: 2-s2.0-79961187359 [en_US]
dc.identifier.scopusquality: N/A [en_US]
dc.identifier.startpage: 504 [en_US]
dc.identifier.uri: https://dx.doi.org/10.1109/INISTA.2011.5946150
dc.identifier.uri: https://hdl.handle.net/11376/2355
dc.indekslendigikaynak: Scopus [en_US]
dc.institutionauthor: Akyokuş, Selim
dc.institutionauthor: Ganiz, Murat Can
dc.language.iso: en [en_US]
dc.publisher: IEEE [en_US]
dc.relation.ispartof: 2011 International Symposium on Innovations in Intelligent Systems and Applications (INISTA) [en_US]
dc.relation.publicationcategory: Conference Item - International - Institutional Faculty Member [en_US]
dc.rights: info:eu-repo/semantics/closedAccess [en_US]
dc.subject: Focused Crawler [en_US]
dc.subject: Link Classification [en_US]
dc.subject: Machine Learning [en_US]
dc.subject: Naive Bayes [en_US]
dc.subject: Turkish Web Pages [en_US]
dc.subject: URL Optimization [en_US]
dc.title: Intelligent focused crawler: Learning which links to crawl [en_US]
dc.type: Conference Object [en_US]

Files

Original bundle

Name: sakyokus_2011.pdf
Size: 2.05 MB
Format: Adobe Portable Document Format
Description: Publisher version

License bundle

Name: license.txt
Size: 1.51 KB
Format: Item-specific license agreed upon to submission