Automatic extraction and processing of document references

A CRF-based approach

Author: Kathrin Eichler
Language: English
Binding: Paperback
Publisher: Grin Publishing
Availability: External warehouse
Ships in 5-8 days
39.43 77.11 BGN

Book information

Author: Kathrin Eichler
Language: English
Binding: Book - Paperback
Published: 2010
Pages: 72
EAN: 9783640723164
ISBN: 3640723163
Enbook ID: 05280199
Publisher: Grin Publishing
Weight: 104
Dimensions: 148 x 210 x 4

Full description

Master's thesis from the year 2007 in the subject Computer Science - Applied, grade: 1.0, University of Sunderland (School of Computing and Technology), language: English. Comment: the thesis was awarded the grade "with distinction".

Abstract: While reading documents, you often encounter text passages advising you to refer to other documents for more information about a specific topic. These references to other documents are particularly common in technical documents, which are written for the sole purpose of providing the reader with as much relevant information as possible, without rephrasing information that can be found elsewhere. Knowing how the documents in a system are interrelated, i.e. which other documents a document refers to or is referred to by, can be extremely helpful when trying to get access to relevant information. A typical example of such a knowledge net providing information about document relations is CiteSeer, a digital library of academic literature. For each document in the library system, CiteSeer displays lists of related documents, such as a list of documents that the current document cites as well as a list of documents that the current document is cited by.

The assumption that inspired this thesis is that such lists are not only helpful when reading academic literature but could also assist a reader of technical documents stored in a company's document management system. The idea was thus to extend an existing document management system by displaying, for each document stored in the system, a list of links to documents that the current document refers to. As information about how the documents in this system are interrelated was not available, the focus of the project underlying this thesis was on the first step towards solving this task: automatically analyzing documents in order to extract names of related documents.
Once all document names mentioned in a document have been extracted, the next step would be to search for these documents in the system's database and, if they are found, create links to the respective documents.

The outcome of the project was a system that performs the extraction task. It is based on Conditional Random Fields, a machine learning technique introduced by Lafferty et al. (2001), and is able to extract document names from unseen documents, achieving high precision scores (88%) and acceptable recall scores (65%) on a test dataset. The implementation is based on a Java package provided by Sarawagi & Cohen (2005), which was adapted and extended to suit the nature of the task. As the approach is based on supervised learning, the project also involved the generation of appropriate training data.
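
To make the extraction task described above more concrete, here is a minimal sketch of how document names could be read off a sequence of per-token labels, and how precision and recall could be computed for the result. It is not the thesis's implementation, which is based on a Java package; the sketch is in Python, and the `B-DOC` / `I-DOC` / `O` label scheme, the function names and the example sentence are all assumptions made for illustration. In a real system the labels would be predicted by a trained sequence model such as a CRF; here they are written out by hand.

```python
from typing import List, Tuple


def extract_spans(tokens: List[str], labels: List[str]) -> List[str]:
    """Collect document names from tokens tagged with a hypothetical
    BIO scheme: B-DOC starts a name, I-DOC continues it, O is outside."""
    spans: List[str] = []
    current: List[str] = []
    for token, label in zip(tokens, labels):
        if label == "B-DOC":
            # A new name starts; close any name that is still open.
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif label == "I-DOC" and current:
            current.append(token)
        else:
            # Outside a name (or a stray I-DOC with no open name).
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


def precision_recall(predicted: List[str], gold: List[str]) -> Tuple[float, float]:
    """Precision and recall over sets of extracted names (exact match)."""
    predicted_set, gold_set = set(predicted), set(gold)
    true_positives = len(predicted_set & gold_set)
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    return precision, recall


# Hand-labelled example sentence (an assumption, not data from the thesis).
tokens = ["For", "details", ",", "see", "the", "Installation", "Guide",
          "and", "the", "Release", "Notes", "."]
labels = ["O", "O", "O", "O", "O", "B-DOC", "I-DOC",
          "O", "O", "B-DOC", "I-DOC", "O"]

print(extract_spans(tokens, labels))
# prints ['Installation Guide', 'Release Notes']
```

Under this reading, high precision with lower recall means that most of the extracted names are correct, while some names mentioned in the documents are missed.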