Subject indexing

Subject indexing is the act of describing or classifying a document by index terms, keywords, or other symbols in order to indicate what different documents are about, to summarize their contents or to increase findability. In other words, it is about identifying and describing the subject of documents. Indexes are constructed, separately, on three distinct levels: terms in a document such as a book; objects in a collection such as a library; and documents (such as books and articles) within a field of knowledge.

Subject indexing is used in information retrieval, especially to create bibliographic indexes for retrieving documents on a particular subject. Examples of academic indexing services are Zentralblatt MATH, Chemical Abstracts and PubMed. Index terms are mostly assigned by experts, but author keywords are also common.

The process of indexing begins with an analysis of the subject of the document. The indexer must then select terms that appropriately represent the subject, either by extracting words directly from the document or by assigning words from a controlled vocabulary.[1] The terms in the index are then presented in a systematic order.

Indexers must decide how many terms to include and how specific the terms should be. Together, these decisions determine the depth of indexing.

Subject analysis

The first step in indexing is to decide on the subject matter of the document. In manual indexing, the indexer considers the subject matter by answering a set of questions such as "Does the document deal with a specific product, condition or phenomenon?".[2] Because the analysis is influenced by the knowledge and experience of the indexer, two indexers may analyze the content differently and so arrive at different index terms. This affects the success of retrieval.

Automatic vs. manual subject analysis

Automatic indexing follows set processes that analyze the frequencies of word patterns and compare the results with other documents in order to assign documents to subject categories. This requires no understanding of the material being indexed, which leads to more uniform indexing but at the expense of interpreting the true meaning. A computer program does not understand the meaning of statements and may therefore fail to assign some relevant terms or may assign terms incorrectly. Human indexers focus their attention on certain parts of the document, such as the title, abstract, summary and conclusions, because analyzing the full text in depth is costly and time-consuming.[3] An automated system removes the time constraint and allows the entire document to be analyzed, but it can also be directed to particular parts of the document.

Term selection

The second stage of indexing involves translating the subject analysis into a set of index terms. This can involve extracting terms from the document or assigning them from a controlled vocabulary. With full-text search widely available, many people have come to rely on their own expertise in conducting information searches, and full-text search has become very popular. Subject indexing and its experts (professional indexers, catalogers, and librarians) remain crucial to information organization and retrieval. These experts understand controlled vocabularies and are able to find information that cannot be located by full-text search. The cost of expert analysis to create subject indexing is not easily compared to the cost of the hardware, software and labor needed to produce a comparable set of full-text, fully searchable materials. With new web applications that allow every user to annotate documents, social tagging has gained popularity, especially on the Web.[4]

One application of indexing, the book index, remains relatively unchanged despite the information revolution.

Extraction/Derived indexing

Extraction indexing involves taking words directly from the document. It uses natural language and lends itself well to automated techniques where word frequencies are calculated and those with a frequency over a pre-determined threshold are used as index terms. A stop-list containing common words (such as "the", "and") would be referred to and such stop words would be excluded as index terms.
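The frequency-threshold technique described above can be sketched in a few lines of Python. This is a minimal illustration, not a production indexer: the stop-list, the sample text, the function name extract_index_terms and the threshold parameter min_count are all assumptions made for the example, and the threshold is treated as "at least min_count occurrences".

```python
import re
from collections import Counter

# Illustrative stop-list (an assumption); real stop-lists are much longer.
STOP_WORDS = {"the", "and", "a", "an", "of", "in", "is", "to", "for", "with"}

def extract_index_terms(text, min_count=2, stop_words=STOP_WORDS):
    """Return the words that occur at least min_count times, excluding stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return sorted(term for term, n in counts.items() if n >= min_count)

sample = (
    "Insulin regulates glucose. The pancreas releases insulin "
    "and the liver stores glucose for the body."
)
print(extract_index_terms(sample))  # ['glucose', 'insulin']
```

Note that "the" occurs three times in the sample but is never offered as an index term, because the stop-list is consulted before counting.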

Automated extraction indexing may lose the meaning of terms by indexing single words rather than phrases. Although it is possible to extract commonly occurring phrases, this becomes more difficult if key concepts are worded inconsistently. Automated extraction indexing also has the problem that, even when a stop-list is used to remove common words, some frequent words may not help to discriminate between documents. For example, the term glucose is likely to occur frequently in any document related to diabetes, so in a database of diabetes literature a search on this term would likely return most or all of the documents. Post-coordinated indexing, where terms are combined at the time of searching, would reduce this effect, but the onus would be on the searcher rather than the information professional to link appropriate terms. In addition, terms that occur infrequently may be highly significant. For example, a new drug may be mentioned infrequently, but the novelty of the subject makes any reference significant. One method for allowing rarer terms to be included and common words to be excluded by automated techniques is a relative frequency approach, in which the frequency of a word in a document is compared with its frequency in the database as a whole. A term that occurs more often in a document than would be expected from the rest of the database can then be used as an index term, and terms that occur equally frequently throughout are excluded.
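The relative frequency approach can be illustrated with a small Python sketch. It assumes a simple ratio (a term's share of the words in one document divided by its share of the words in the whole collection) and an arbitrary cut-off, min_ratio. The three-document collection, the placeholder drug name "newdrug" and all function names are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def relative_frequency_terms(document, collection, min_ratio=2.0):
    """Return terms over-represented in document compared with the collection."""
    doc_counts = Counter(tokenize(document))
    coll_counts = Counter()
    for text in collection:
        coll_counts.update(tokenize(text))
    doc_total = sum(doc_counts.values())
    coll_total = sum(coll_counts.values())
    if coll_total == 0:
        return []
    selected = []
    for term, count in doc_counts.items():
        doc_share = count / doc_total
        coll_share = coll_counts[term] / coll_total
        # coll_share is zero only if the document is not part of the collection.
        if coll_share > 0 and doc_share / coll_share >= min_ratio:
            selected.append(term)
    return sorted(selected)

# Hypothetical miniature database of diabetes-related texts.
collection = [
    "diet lowers glucose in diabetes trial",
    "glucose monitoring in diabetes care",
    "newdrug lowers glucose in diabetes newdrug trial",
]
print(relative_frequency_terms(collection[2], collection))  # ['newdrug']
```

Here "glucose" and "diabetes" appear in every document, so their ratio stays below 1 and they are dropped, while the rarely mentioned drug name is kept. In a collection this small, any word unique to one document scores highly, so a real system would likely combine the ratio with a stop-list or a minimum count.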

Another problem with automated extraction is that it does not recognize when a concept is discussed but is not identified in the text by an indexable keyword.[5]

Since this process is based on simple string matching and involves no intellectual analysis, the resulting product is more appropriately known as a concordance than an index.

Assignment indexing

An alternative is assignment indexing, where index terms are taken from a controlled vocabulary. This has the advantage of controlling for synonyms, as the preferred term is indexed and synonyms or related terms direct the user to the preferred term. The user can therefore find articles regardless of the specific term used by the author, without having to know and check all possible synonyms.[6] It also removes any confusion caused by homographs by including a qualifying term. A third advantage is that it allows related terms to be linked, whether by hierarchy or by association. For example, an index entry for an oral medication may list other oral medications as related terms on the same level of the hierarchy but would also link to broader terms such as treatment. Assignment indexing is used in manual indexing to improve inter-indexer consistency, as different indexers have a controlled set of terms to choose from. Controlled vocabularies do not completely remove inconsistencies, as two indexers may still interpret the subject differently.[2]
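Synonym control of this kind can be sketched as a lookup from entry terms to preferred terms. The miniature vocabulary below is entirely hypothetical and is not taken from any real thesaurus; the names SYNONYMS, BROADER and assign_terms are likewise invented for the illustration.

```python
# Hypothetical controlled vocabulary: every entry term points to its preferred term.
SYNONYMS = {
    "heart attack": "myocardial infarction",
    "cardiac infarction": "myocardial infarction",
    "myocardial infarction": "myocardial infarction",
    "high blood pressure": "hypertension",
    "hypertension": "hypertension",
}

# Hypothetical hierarchy: preferred term -> broader term.
BROADER = {
    "myocardial infarction": "heart diseases",
    "hypertension": "vascular diseases",
}

def assign_terms(keywords):
    """Map free-text keywords to preferred terms; return (assigned, unmatched)."""
    assigned, unmatched = set(), []
    for keyword in keywords:
        key = keyword.strip().lower()
        if key in SYNONYMS:
            assigned.add(SYNONYMS[key])
        else:
            unmatched.append(keyword)
    return sorted(assigned), unmatched

terms, unknown = assign_terms(["Heart attack", "Cardiac infarction", "Hypertension", "stents"])
print(terms)                 # ['hypertension', 'myocardial infarction']
print(unknown)               # ['stents']
print(BROADER[terms[1]])     # heart diseases
```

Two different author wordings collapse to one preferred term, which is the synonym control described above. Keywords outside the vocabulary are reported rather than silently indexed, leaving the decision to a human indexer.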

Index presentation

The final phase of indexing is to present the entries in a systematic order. This may involve linking entries. In a pre-coordinated index the indexer determines the order in which terms are linked in an entry by considering how a user may formulate their search. In a post-coordinated index, the entries are presented singly and the user can link the entries through searches, most commonly carried out by computer software. Post-coordination results in a loss of precision in comparison to pre-coordination.[7]
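In a post-coordinated index, combining terms at search time amounts to intersecting the sets of documents listed under each term. The following Python sketch assumes a toy index that maps each term to a set of document IDs; the data and the function name post_coordinate are illustrative only.

```python
# Hypothetical post-coordinated index: each term is listed singly with its document IDs.
postings = {
    "diabetes": {1, 2, 3, 5},
    "glucose": {1, 2, 3, 4, 5},
    "insulin pumps": {2, 5},
}

def post_coordinate(index, *terms):
    """Return the IDs of documents indexed under every given term (Boolean AND)."""
    sets = [index.get(term, set()) for term in terms]
    return set.intersection(*sets) if sets else set()

print(post_coordinate(postings, "glucose"))                    # {1, 2, 3, 4, 5}
print(post_coordinate(postings, "diabetes", "insulin pumps"))  # {2, 5}
```

The searcher, not the indexer, chooses which terms to link. The intersection only shows that the terms co-occur in a document's index entries, not how they relate to each other, which is one way to understand the loss of precision compared with pre-coordination.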

Depth of indexing

Indexers must make decisions about what entries should be included and how many entries an index should incorporate. The depth of indexing describes the thoroughness of the indexing process with reference to exhaustivity and specificity.[8]

Exhaustivity

An exhaustive index is one which lists all possible index terms. Greater exhaustivity gives higher recall, that is, a greater likelihood that all the relevant articles are retrieved. However, this comes at the expense of precision: the user may retrieve a larger number of irrelevant documents, or documents which deal with the subject only in little depth. In a manual system, a greater level of exhaustivity brings a greater cost, as more man-hours are required. The additional time taken in an automated system would be much less significant. At the other end of the scale, in a selective index only the most important aspects are covered.[9] Recall is reduced in a selective index, because if an indexer does not include enough terms, a highly relevant article may be overlooked. Indexers should therefore strive for a balance and consider what the document may be used for. They may also have to consider the implications of time and expense.
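The trade-off can be made concrete with the standard definitions: recall is the fraction of relevant documents that are retrieved, and precision is the fraction of retrieved documents that are relevant. The Python sketch below uses invented document IDs purely for illustration; the two result sets stand in for a search against an exhaustive and a selective index.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for sets of retrieved and relevant document IDs."""
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision

relevant = {1, 2, 3, 4}                        # documents that truly address the topic
exhaustive_hits = {1, 2, 3, 4, 5, 6, 7, 8}     # many terms per document: broad retrieval
selective_hits = {1, 2}                        # few terms per document: narrow retrieval

print(recall_precision(exhaustive_hits, relevant))  # (1.0, 0.5)
print(recall_precision(selective_hits, relevant))   # (0.5, 1.0)
```

In this invented case the exhaustive index finds every relevant document but half of what it returns is irrelevant, while the selective index returns only relevant documents but misses half of them.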

Specificity

Specificity describes how closely the index terms match the topics they represent.[10] An index is said to be specific if the indexer uses descriptors that parallel the concepts of the document and reflect those concepts precisely.[11] Specificity tends to increase with exhaustivity, since the more terms are included, the narrower those terms will be.

Indexing theory

Hjørland (2011)[12] found that theories of indexing are at the deepest level connected to different theories of knowledge:

  • Rationalist theories of indexing (such as Ranganathan's theory) suggest that subjects are constructed logically from a fundamental set of categories. The basic method of subject analysis is then "analytic-synthetic", to isolate a set of basic categories (=analysis) and then to construct the subject of any given document by combining those categories according to some rules (=synthesis).
  • Empiricist theories of indexing are based on selecting similar documents based on their properties, in particular by applying numerical statistical techniques.
  • Historicist and hermeneutical theories of indexing suggest that the subject of a given document is relative to a given discourse or domain, which is why the indexing should reflect the needs of a particular discourse or domain. According to hermeneutics, a document is always written and interpreted from a particular horizon. The same is true of systems of knowledge organization and of all users searching such systems. Any question put to such a system is put from a particular horizon. All those horizons may be more or less in consensus or in conflict. To index a document is to try to contribute to the retrieval of "relevant" documents by knowing about those different horizons.
  • Pragmatic and critical theories of indexing (such as Hjørland, 1997)[13] agree with the historicist point of view that subjects are relative to specific discourses, but emphasize that subject analysis should support given goals and values and should consider the consequences of indexing one way or another. These theories hold that indexing cannot be neutral and that trying to index in a neutral way is the wrong goal. Indexing is an act (and computer-based indexing acts according to the programmer's intentions). Acts serve human goals. Libraries and information services also serve human goals, which is why their indexing should be done in a way that supports these goals as much as possible. At first glance this looks strange, because the goal of libraries and information services is to identify any document or piece of information. Nonetheless, any specific way of indexing always supports some kinds of use at the expense of others. The documents to be indexed are intended to serve some specific purposes in a community, and the indexing should basically serve the same purposes. Primary and secondary documents and information services are parts of the same overall social system. In such a system different theories, epistemologies, worldviews etc. may be at play, and users need to be able to orient themselves and to navigate among those different views. This calls for a mapping of the different epistemologies in the field and a classification of each single document within such a map. Excellent examples of such different paradigms and their consequences for indexing and classification systems are provided in the domain of art by Ørom (2003)[14] and in music by Abrahamsen (2003).[15]

The core of indexing is, as stated by Rowley and Farrow,[16] to evaluate a paper's contribution to knowledge and index it accordingly or, in the words of Hjørland (1992,[17] 1997), to index its informative potentials. "In order to achieve good consistent indexing, the indexer must have a thorough appreciation of the structure of the subject and the nature of the contribution that the document is making to the advancement of knowledge" (Rowley & Farrow, 2000,[16] p. 99).

References

  1. ^ F. W. Lancaster (2003): "Indexing and abstracting in theory and practice". Third edition. London, Facet ISBN 1-85604-482-3. page 6
  2. ^ a b G.G. Chowdhury (2004): "Introduction to modern information retrieval". Third Edition. London, Facet. ISBN 1-85604-480-7. page 71
  3. ^ F. W. Lancaster (2003): "Indexing and abstracting in theory and practice". Third edition. London, Facet ISBN 1-85604-482-3. page 24
  4. ^ Voss, Jakob (2007). "Tagging, Folksonomy & Co - Renaissance of Manual Indexing?". Proceedings of the International Symposium of Information Science. pp. 234–254. arXiv:cs/0701072. Bibcode:2007cs........1072V.
  5. ^ J. Lamb (2008): Human or computer produced indexes? Archived 2014-06-04 at the Wayback Machine [online] Sheffield, Society of Indexers. Accessed 15 January 2009.
  6. ^ C. Tenopir (1999): "Human or automated, indexing is important". Library Journal 124(18) pages 34-38.
  7. ^ D. Bodoff and A. Kambil, (1998): "Partial coordination. I. The best of pre-coordination and post-coordination." Journal of the American Society for Information Science, 49(14), 1254-1269.
  8. ^ D.B. Cleveland and A.D. Cleveland (2001): "Introduction to indexing and abstracting". 3rd Ed. Englewood, Libraries Unlimited, Inc. ISBN 1-56308-641-7. page 105
  9. ^ B.H. Weinberg (1990): "Exhaustivity of indexes: Books, journals, and electronic full texts; Summary of a workshop presented at the 1999 ASI Annual Conference". Key Words, 7(5), pages 1+.
  10. ^ J.D. Anderson (1997): Guidelines for indexes and related information retrieval devices [online]. Bethesda, Maryland, NISO Press. 10 December 2008.
  11. ^ D.B. Cleveland and A.D. Cleveland (2001): "Introduction to indexing and abstracting". 3rd Ed. Englewood, Libraries Unlimited, Inc. ISBN 1-56308-641-7. page 106
  12. ^ Hjørland, Birger (2011). The Importance of Theories of Knowledge: Indexing and Information retrieval as an example. Journal of the American Society for Information Science and Technology, 62(1), 72-77.
  13. ^ Hjørland, B. (1997). Information Seeking and Subject Representation. An Activity-theoretical approach to Information Science. Westport & London: Greenwood Press.
  14. ^ Ørom, Anders (2003). Knowledge Organization in the domain of Art Studies - History, Transition and Conceptual Changes. Knowledge Organization. 30(3/4), 128-143.
  15. ^ Abrahamsen, Knut T. (2003). Indexing of Musical Genres. An Epistemological Perspective. Knowledge Organization, 30(3/4), 144-169.
  16. ^ a b Rowley, J. E. & Farrow, J. (2000). Organizing Knowledge: An Introduction to Managing Access to Information. 3rd ed. Aldershot: Gower Publishing Company.
  17. ^ Hjørland, Birger (1992). The Concept of "Subject" in Information Science. Journal of Documentation. 48(2), 172-200. http://iva.dk/bh/Core%20Concepts%20in%20LIS/1992JDOC%5FSubject.PDF

Further reading

  • Fugmann, Robert (1993). Subject analysis and indexing. Theoretical foundation and practical advice. Frankfurt/Main: Index Verlag.
  • Frohmann, B. (1990). "Rules of Indexing: A Critique of Mentalism in Information Retrieval Theory". Journal of Documentation. 46 (2): 81–101. doi:10.1108/eb026855.
  • Wellisch, Hans H. (1986). "The Oldest Printed Indexes". The Indexer 15, no. 2 (October), pp. 1–10.
