Definition of Competency or Understanding of Competency
Information retrieval (IR) for me is about “bringing order out of chaos,” or the process involved in information creation (design), information storage and retrieval (query), and information user and user experience (evaluation) (Tucker, 2016b, p 1).
Information retrieval design is a “wicked problem” and a complex process (which, simplified, could be shorthanded as “design-implement-evaluate”) that begins with information creation (Weedman, 2008, pp 114-115; Weedman, 2016b) and with an “information bearing entity,” a document, being represented in some way (Weedman, 2016a, p 8). “The representation of a document in an information system is called ‘metadata’” (Weedman, 2008, p 116), which takes on myriad forms and formats (Tucker, 2016b, pp 6-16). Data structure emerges from choices made regarding the attributes (name, value) given to a document and the fields (name, value) given to records (Scott, 2016, p 3; Weedman, 2016d, pp 4-5). “Subject metadata” refers to an IR system that allows one to gather together documents about a specific topic using controlled vocabularies, natural language, classification, and/or social tagging (Weedman, 2016d, pp 4-11; Weedman, 2008, pp 116-117). Classification is the two-part process of aggregating and discriminating between things or objects and is related to an IR system’s domain and structure, whereas a controlled vocabulary relates to “aboutness” and to disambiguating language (Scott, 2016, pp 11-21). A controlled vocabulary uses alphabetical order and relies “on a thesaurus to express relationships”; in classification, “related topics are near each other” and “specific topics are contained within broader classes” (Scott, 2016, pp 17-18). Coordination (pre- and post-) concerns how multiple words are used in a query. In pre-coordination the indexer coordinates appropriate terms using syntactic rules; in post-coordination the indexer’s terms are “independent and may be in any order,” and it is the user who coordinates terms using the Boolean operator AND (Scott, 2016, pp 22-32).
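To make the post-coordination idea concrete, here is a minimal sketch (all document IDs and index terms are invented for illustration): an indexer assigns independent terms to each document, and the user coordinates them at query time with the Boolean operator AND, implemented as set intersection over an inverted index.

```python
# Toy illustration of post-coordination: the indexer assigns independent
# terms to each document; the user combines them at query time with
# Boolean AND (set intersection). All data here is invented for the example.

# A tiny "catalog": each document ID maps to the terms an indexer assigned.
catalog = {
    "doc1": {"libraries", "history", "united states"},
    "doc2": {"libraries", "architecture"},
    "doc3": {"history", "architecture"},
}

# Invert the catalog: term -> set of document IDs bearing that term.
inverted_index = {}
for doc_id, terms in catalog.items():
    for term in terms:
        inverted_index.setdefault(term, set()).add(doc_id)

def boolean_and(*query_terms):
    """Post-coordinated query: return docs indexed under ALL the terms."""
    postings = [inverted_index.get(t, set()) for t in query_terms]
    return set.intersection(*postings) if postings else set()

print(boolean_and("libraries", "history"))     # {'doc1'}
print(boolean_and("history", "architecture"))  # {'doc3'}
```

Because the terms are stored independently, the user can combine them in any order; the intersection is the same either way.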
Querying in an IR system is about understanding how the system is designed and using search strategies that fit that design (Weedman, 2008, p 120). Controlled vocabularies, usually created for databases, germinate from an alphabetical list of terms, which grows into an authority list and, with additional information, becomes a thesaurus, eventually developing complex hierarchical relationships that can help the user distinguish between broader, narrower, and related terms (Weedman, 2016d, pp 12-14). If users understand this data structure, they will have a better idea of how to search, and their queries will improve. Search engines use algorithms (sorted by alphabetical index), web hierarchies, and relevance ranking, all of which are usually proprietary in nature and thus hidden from the user (Weedman, 2008, p 117). Findability on the Internet has a lot to do with words, for example “in meta tags, those in section titles and descriptions, [in] the density of keywords, [and] where they are located on the page” (Weedman, 2008, p 124). A user browsing a website is not searching but navigating: the design of a website has a structure similar to, but distinct from, the way one uses subject searching in a database (Weedman, 2016c, pp 1-47). Websites also tend to use faceted classification systems where, rather than starting with a broad category and then systematically subdividing, the classification is based strictly on the first facet/attribute (Weedman, 2016c, pp 35-43).
To evaluate an IR system’s performance, we ask if it retrieves “all and only” the information one desires, that is, relevant results: relevance is measured by how well the system recalls information and with what precision (Weedman, 2008, p 123). User research is the process carried out by and for IR designers on information seeking behavior and information needs, and involves information gathering activities such as direct observation, interviews, surveys, task analysis, focus groups, card sorting, and the like (Weedman, 2016e, pp 1-13). IR evaluation is about the usability of the whole system, its parts (for instance the collection, the records, and the controlled vocabulary), and the user’s perception and experience; search engine evaluation is more about functionality (Tucker, 2016a, pp 2-3). Controlled vocabularies are evaluated for exhaustivity, specificity, and the ability to disambiguate, while search effectiveness is evaluated in terms of precision, recall, and relevance (Tucker, 2016a, pp 4-5).
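The recall and precision measures mentioned above have standard set-based definitions: precision is the fraction of retrieved documents that are relevant (“only”), and recall is the fraction of relevant documents that were retrieved (“all”). A small sketch, using invented document IDs:

```python
# Standard set-based precision/recall measures. "retrieved" is what the
# system returned; "relevant" is the ideal "all and only" answer set.
# Both sets below are invented example data.

def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant ("only")."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)

def recall(retrieved, relevant):
    """Fraction of relevant documents that were retrieved ("all")."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)

retrieved = {"d1", "d2", "d3", "d4"}  # what the search returned
relevant = {"d2", "d4", "d5"}         # what the user actually needed

print(precision(retrieved, relevant))  # 0.5 (2 of 4 retrieved are relevant)
print(recall(retrieved, relevant))     # ~0.667 (2 of 3 relevant were found)
```

The usual tension is visible even in this toy case: broadening a query tends to raise recall at the cost of precision, and vice versa.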
Human beings have always searched for, compiled, and shared all manner of information in the form of reference works (Lynch, 2016, pp 3-5). With the rise of the computer, databases, search engines, and other information retrieval systems have come into existence. Libraries have been around since the 7th century BCE (Lynch, 2016, p 311), but it was not until the early 19th century that the modern library and librarian came into being (Shera, 1949, pp 107-118, 170-181). Information retrieval systems are the way we access information in the 21st century, and librarians (or information professionals) continue to be the authorities on how to design, query, and evaluate these systems. We create, use, and judge reference material, and it has always been important to stay current, as information and information formats change mercurially. Information systems have also become absolutely necessary in a globally wired world, and information organizations, to stay relevant, need to understand not only the fundamental concepts but how the fundamentals are changing and why. Finally, our information organizations are beholden to our users, and we would be remiss if we evaluated only the systems and not the experiences of those seeking out information.
Preparation to Understand Competency E: Coursework and Work Experience
I have been a public service assistant for over 10 years in a major public library system, with access to myriad reference resources including a long list of databases. My class on Information Retrieval System Design (INFO 202) gave me a detailed look under the hood of the systems I use every day. INFO 202 not only gave me the terminology to describe how these systems are created, used, and revised, but also helped me grasp how the component parts fit together. I have always been an information hound, and I have read extensively on the history of libraries, reference books, and reference services. INFO 202 helped me connect the past to the present and understand how and why these systems came to be, and how they continue to change. The Online Searching (INFO 244) class was very helpful for naming the different search techniques and strategies I had already been using, and then introducing me to some new ones like citation and patent searching.
For my first evidentiary item I chose four different searches I performed for INFO 244; I included these because they cover retrieving information from different IR systems and used a unique query approach for each. For my second evidentiary item I included a LibGuide I created for the same class; I chose it because it shows IR content and organization, deals with a particular user group, and is concerned with usable design. I chose the database design and subject analysis project from INFO 202 because it covered several areas of the learning outcomes for this competency. For my fourth evidentiary item I chose my overview of Google Scholar’s search capabilities and limitations from INFO 244, for its evaluation of a database IR system. Finally, I am also presenting the Site Map Project from INFO 202 because of the design decisions we made for that website.
Evidence
Searching & Retrieving from Different IR Systems
Concept Mapping; ProQuest Subject Search; Web of Science; LexisNexis
Discussion of Evidentiary Items
These are four exercises I turned in for my INFO 244 class:
Concept Mapping:
https://documentcloud.adobe.com/link/track?uri=urn%3Aaaid%3Ascds%3AUS%3A1054036f-8c40-4cb8-9a52-22fc2aa6bf00
This assignment had me use various search strategies in both search engines and databases, including Boolean operators and natural language searching.
ProQuest Subject Search
https://documentcloud.adobe.com/link/track?uri=urn%3Aaaid%3Ascds%3AUS%3A1e04501b-cca6-4f56-b223-7d42d98bc790
The first part of the subject search assignment involved searching health and medicine databases. I was tasked with finding articles on malaria vaccines that had been tested on humans, with recent results from clinical trials. I used proximity searching, truncation, Boolean searching, Subject and Abstract heading searches, and date-range delimiting. I performed an effective search after analyzing my search strategies and search paths several times.
For the second subject search I was looking for strategies to manage the impact of climate change on freshwater lake ecosystems. First, I found an appropriate database that dealt with the aquatic sciences. I used Boolean and proximity searches and incorporated suggestions on phrasing and topics from the subject field. A Subject/Abstract/Title search worked well but brought up too many results. I paid attention to term frequency, and in one search came across “paleolimnology,” a controlled-vocabulary term for the study of past inland-water environments. Drawing this term from the controlled vocabulary helped narrow my search considerably. More proximity searching with a narrowing of the date range brought me even closer. Finally, examining the results for relevance and citations brought up suitable articles.
For the final subject search, I was to search historical newspapers for editorials or letters to the editor to determine how different regions covered the drama surrounding sprinter Jesse Owens’ record-breaking efforts at the 1936 Olympics. I searched databases from a subject list using a variety of Boolean searches, stacked Title and Abstract headings atop the Subject heading, and delimited the document field to articles. Finally, I read through the articles for relevance.
Web of Science:
https://documentcloud.adobe.com/link/track?uri=urn%3Aaaid%3Ascds%3AUS%3A7c3027d4-db64-4205-8148-6bfc300b2946
For the Web of Science (WoS) exercise, I was tasked with answering a series of questions about navigating this database and then documenting my activity. For question one (in four parts), I:
- document the search terms, fields, and delimiters I used in a search for an author, Marydee Ojala,
- name how many publications she has in the WoS index,
- name her most cited publication and the number of times it has been cited,
- provide, for the articles that have cited the author, (a) the APA citation for the most highly cited article and (b) the number of times it was cited.
In the following three questions I provide definitions of various functions within WoS, including Times Cited, Cited References, the citation network tool ‘Analyze Results,’ the H-Index, and Impact Factor numbers, and I retrieve some results with those functions/tools. I also describe the ORCID and ResearcherID services and their use for scientific researchers.
LexisNexis:
https://documentcloud.adobe.com/link/track?uri=urn%3Aaaid%3Ascds%3AUS%3A622c3503-52fa-4192-a93e-06e3bd84c3dd
This exercise consisted of four separate searches. The first was a topic search using Nexis News. Given some preliminary information, I was to name two offshore processing centers and two onshore processing centers set up by the Australian government to process refugees, and then perform the same search using the ‘Index Terms’ delimiter in the Power Search field. I summarize my search path throughout, the search strategies I employed, my use of a wildcard (truncation), and the various delimiters I used. I use the “W/n” connector (w/2) in an attempt to find documents where the search words occur within n (in this case 2) words of each other. I mention intuiting that the controlled vocabulary might use the English centre as a preferred term rather than the American word center.
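The “W/n” proximity connector can be illustrated with a small sketch. The helper below is an invented approximation (treating W/n as a token distance of at most n), not LexisNexis’s actual implementation:

```python
import re

# Conceptual sketch of a proximity ("within n words") test, in the spirit
# of the LexisNexis W/n connector. This helper is an invented approximation
# for illustration: it treats W/n as "token positions differ by at most n".

def within_n_words(text, word_a, word_b, n):
    """True if word_a and word_b occur within n words of each other."""
    words = [w.lower() for w in re.findall(r"\w+", text)]
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(a - b) <= n for a in positions_a for b in positions_b)

text = "The offshore processing centre opened last year."
print(within_n_words(text, "processing", "centre", 2))  # True
print(within_n_words(text, "offshore", "year", 2))      # False
```

Combined with truncation and the centre/center spelling consideration, a proximity test like this narrows results to documents where the concepts actually co-occur, not merely appear somewhere in the same article.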
The second search was a company search for the major contract holder of the detention centers, using the GlobalData Reports and Reuters Knowledge Direct databases. I sought out the name of the company’s chief executive officer (CEO) and its Standard Industrial Classification (SIC) code numbers. From this effort I was able to dig up more information on the company, including what type of business it deals in, a financial overview, and its gross profits for 2015. Returning to Nexis News, I was able to find two recent news stories on the company by limiting my searches to Australian news sources, delimiting the date range to six months, and then separating the name of the present company (Transfield) from two previous iterations (Ferrovial and Broadspectrum) using Boolean operators and a nesting operator (e.g., (Ferrovial OR Broadspectrum)).
The third search involved searching LexisNexis for the citation number of a legal case. I went to the LexisNexis legal tab, found a search bar for citation numbers on the search page, and via a “Citation Help” link found a partial explanation of the proper syntax needed to search by citation. With a little experimenting I was able to figure out the proper sequence and pull up the Supreme Court case.
My fourth LexisNexis search, using the Market Insight database, sought to find the names of two individuals, hired by Uber, who had previously been in the news for “controversial” activities. To find an article on the individuals I searched using the term ‘Uber,’ checked the ‘Natural Language’ delimiter, set a date range covering the whole of 2015, searched within “all countries,” did not restrict by industry, and chose to search all available sources, sorted by relevance.
These four separate exercises show that I am capable of understanding the uses of a particular database, that I can fulfill specific and detailed queries using these databases, and that I can effectively use a variety of methods to achieve results.
LibGuide for Generic County Public Library
Discussion of Evidentiary Items
https://documentcloud.adobe.com/link/track?uri=urn%3Aaaid%3Ascds%3AUS%3A63c8d5f2-2e83-407d-a58a-3fa08c1d8a24
For INFO 244 I created a LibGuide for an imaginary entity—the Generic County Public Library—that consisted of a User Guide for the Gale Virtual Reference database and a Workbook for better advanced searching with the database. The User Guide takes the student through the database’s advanced search feature by feature, explaining functionality along the way. For instance, after an initial search the database brings up an index of alternative terms from the controlled vocabulary. I explain to students that if they choose one of these alternative terms, it will appear within the search box in quotation marks, and that the quotation marks retrieve those words in that specific order. I describe how students can modify their searches with various delimiters (Title, Author, Publisher, publication date(s), subject area, target audience, or language). I explain how some settings, like sorting by Relevance, are automatic but can be changed to other options. I point out how the “Search within Results” box allows one to search for words within the articles already retrieved. I show students how, within a chosen article, they can open links to related subjects, find an alphabetical list of index terms related to their topic, and sometimes follow hyperlinks to articles outside their search. Finally, I explain how a subject guide search works and how it acts like a thesaurus, retrieving alternative subject search terms, related terms, and preferred spellings, all of which can help students narrow or broaden a topic or construct new search phrasings.
In the Workbook I provide search tips and search tools with sample searches. The tips I give students are basic but necessary: for example, conjunctions (stop words) are not recognized by the database. I describe what punctuation will or will not work and how words are generally ordered. The search tools cover Boolean operators, proximity operators, quotation marks, truncation, and nesting operators. I conclude the tutorial by illustrating for students how to formulate a search strategy and how one might puzzle through phrasing variants to arrive at one that will bring up the desired results.
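One of the Workbook’s search tools, truncation, can be sketched conceptually: a trailing-asterisk pattern matches any word sharing the stem. The translation to a regular expression below is illustrative only; real databases implement truncation internally, and the exact matching rules vary by vendor.

```python
import re

# Hedged sketch: how a truncation pattern like "vaccin*" can be matched,
# conceptually, by translating it to a whole-word regular expression.
# This mapping is illustrative, not any particular database's behavior.

def truncation_to_regex(pattern):
    """Turn a trailing-* truncation pattern into a whole-word regex."""
    stem = re.escape(pattern.rstrip("*"))
    return re.compile(r"\b" + stem + r"\w*\b", re.IGNORECASE)

rx = truncation_to_regex("vaccin*")
text = "Vaccines and vaccination trials; the vaccine was tested."
print(rx.findall(text))  # ['Vaccines', 'vaccination', 'vaccine']
```

A single truncated term thus stands in for a whole family of word forms, which is why truncation is such an economical tool to teach alongside Boolean and proximity operators.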
The construction of this LibGuide and Workbook shows that I can describe and apply design in the service of usability, and that I can make linkages between the content and organization of an IR system and its usability.
Database Design & Subject Analysis
Discussion of Evidentiary Items
https://documentcloud.adobe.com/link/review?uri=urn:aaid:scds:US:a048c33a-5212-4f54-b2cf-8e88ef6bff4e
The purpose of the Database Design & Subject Analysis project, which I co-created with three other students in my INFO 202 IR system design class, was to create a database in which the target users, MLIS students, could search and retrieve discriminating results relevant to their specific queries. In preparation for designing this database, I completed several exercises designed to mimic the design, testing, and evaluation stages of a database, effectively creating alpha and beta prototypes and going through a group evaluation stage. The result of this preparation is evident in the final database design. For instance, I demonstrate that I can identify the attributes of a document and then determine appropriate fields and values for those attributes in order to create a basic data structure. I show that I can sufficiently familiarize myself with the database I wish to use, and that I know how to write a clear, comprehensive, and concise rule for use by indexers. I participate in creating a controlled vocabulary, using authority terms to determine concepts and “aboutness.” The three-part structure of this database design document (Design, Content, and Query/Evaluate) mirrors the content of Competency E, and my work on each part demonstrates that I have the ability to design, create content, and evaluate the work both in particular and in broad outline.
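The attribute-to-field/value mapping described above can be sketched with an invented example record. The field names, sample content, and query are hypothetical illustrations, not drawn from the actual project:

```python
# Illustrative sketch of the attribute -> field/value mapping: a document's
# attributes become named fields in a record, and a collection of records
# forms a basic data structure that can then be queried. All field names
# and sample data here are invented for the example.

from dataclasses import dataclass, field

@dataclass
class Record:
    title: str
    author: str
    year: int
    descriptors: list = field(default_factory=list)  # controlled-vocabulary terms
    abstract: str = ""

database = [
    Record(
        title="Search Strategies for Graduate Students",
        author="A. Author",
        year=2016,
        descriptors=["information retrieval", "search strategies"],
        abstract="An overview of database search techniques.",
    ),
]

# A simple Descriptor-field search, of the kind described in the
# Topics and Queries section below:
hits = [r for r in database if "search strategies" in r.descriptors]
print([r.title for r in hits])  # ['Search Strategies for Graduate Students']
```

Because the Descriptor field holds controlled-vocabulary terms while the Abstract field holds free text, searches against the two fields trade precision against recall differently, which is exactly what the project’s query testing measures.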
For the design aspect of this project I worked on all four components: the controlled vocabulary, the statement of purpose (SOP), the data structure, and the indexing rules. My role as editor in the group shows in the quality of the controlled vocabulary and SOP, and in ensuring that the data structure and rules remained clear, precise, and logical. The same can be said of the Content (Records) section of the project, where part of my job was to ensure that all the fields and values were appropriately and aptly assigned. This step required that I be able to cross-check the data with all the previous design elements and the database structure, making sure that the collection was relevant to our statement of purpose and to the imagined indexers and users.
For the Topics and Queries section of the project I came up with two query topics and performed two searches for each in our database, one in the Descriptor field and one in the Abstract field. I broke down the results data to determine relevance and the accuracy of recall and precision, and then analyzed and evaluated the results. My work for this section shows that I can navigate the search, storage, and retrieval phase of the design process, can conduct appropriate testing on a user interface, and can properly analyze and apply testing results on the database prototype being examined.
The Evaluation and Reflection sections of this project show my ability to synthesize data into a coherent, summarized evaluative statement. I indicate that I understand how to keep both the user and the indexer in mind throughout the process, making sure both have been properly considered through the design and testing stages, and continuing to center their participation in the final evaluation. I show that I can recall, revisit, and re-evaluate previous choices, outcomes, results, and effects during this stage, ask appropriate questions, and extrapolate from available evidence. I demonstrate that I can fairly judge the pros and cons of a database’s overall design in terms of its usability by both indexer and user, and can also weigh the value of the specific work performed at all three stages of the design process: design, content/testing/querying, and evaluation.
Overview of Google Scholar’s Search Capabilities and Limitations
Discussion of Evidentiary Items
https://documentcloud.adobe.com/link/track?uri=urn%3Aaaid%3Ascds%3AUS%3A4b38a929-b786-4be6-be62-968ed8f88c7f
My final project for my Online Searching class (INFO 244) was meant to demonstrate my understanding of key course concepts involved in searching, managing search services, and the online user experience. I chose to submit an overview of Google Scholar’s (GS) search capabilities and limitations. Drawing from informed opinion and research from the literature, I chronicle how GS’s design evolved over a decade, and then I analyze GS in three separate areas: coverage, utility for search and research, and bibliographic utility. I describe how GS is an odd hybrid of database and search engine and how it approximates an indexing function. I point out different GS features that were designed and implemented over time, including citation importing, related search, access to judicial case law, ‘GS Citations’ (which allowed authors to track and measure scholarly production), ‘GS Metrics’ (which allows these scholars to create editable profiles that promised a window into their journal rankings), ‘My Library’ (which enabled users to save and tag search results), and a tool to identify primary documents.
I describe the many ways that GS lacks traits associated with databases, such as delimiters, thesauri, or a controlled vocabulary, and how it cannot perform certain functions, such as Shepardizing case law. I note that early comparisons of GS to databases also compared GS to discovery tools, and this comparison seems apt given that GS functions like a database while presenting itself like a search engine. I highlight how GS’s lack of an API (application programming interface) means it cannot interact with various kinds of software, making it less like a database and more like a search engine. Other limitations of GS, considered as a database, that I draw attention to are its inability to bulk-export results, its lack of a browsing option, its limited search results, its lack of clarity around how it aggregates and discriminates results, and its citation problems. I describe how all the ways GS fails to measure up to databases resulted in the conception of the typical GS user as one who did not use databases.
I show that from 2014 onward, research into GS began to focus more on bibliometric utility, and I spend some time conjecturing, from the available evidence, why this might be so. Through a review of the literature I am able to compare GS to a variety of databases in much more granular detail. I am able to show how Google Scholar’s users, through their interactions with the IR system, began to expose both GS’s benefits and its endemic problems. On the design side, I was able to dig up information on why institutional repositories (IRs) were not showing up in GS results: the answer was that Google’s metadata schemas were inadequate. I show that GS is effective for average users but less useful for researchers and bibliometricians, and that even for general users there are problems with accessibility. I consider the strengths and weaknesses of GS’s newest bibliographic tools and compare some functionality with competitors like Microsoft Academic Search (MA).
I conclude my study retrospectively, by looking at how GS has performed over time and asking whether it has achieved the results set forth by its chief designer when he envisioned GS back in 2000. I summarize my research by concluding that GS may work well enough for today’s researchers but may not work as well for the researchers of the future. This project shows that I can conduct research and produce informed analysis of an IR system: analysis that evaluates the design, content, and search and retrieval functionality of that system, and that judges the system’s strengths and limitations through reason and analysis.
Site Map Project
Discussion of Evidentiary Items
https://documentcloud.adobe.com/link/review?uri=urn:aaid:scds:US:2dfafe89-b3f9-483c-a9ed-f1662a21f05b
My final project for my INFO 202 class was a website redesign (of the website pointlesssites.com) in which before and after site maps were a prominent feature. My contribution to this project was not the actual construction of the site maps but the analysis of the website we focused on and the reasoning behind our redesign. I describe how the home page serves as a design template for both the global navigation and the alphabetically organized database, how hyperlinks create relationships between shallowly arranged pages, and how this arrangement allows for a casual browsing experience. After analyzing the site’s advantages and disadvantages, I lay out various solutions that are, in effect, design questions. For instance, I ask (a) what the user’s needs might be versus their actual experiences, (b) how the site’s overall structure affects usability, and (c) how the site navigation affects browsability.
In asking these questions I determine that the website has a broad and shallow organizational structure, that there is a great deal of redundancy in the content and the global navigation elements, and that the website’s database suffers from not having a controlled vocabulary. The redesign I envision features a pared-down global navigation and a column consisting of an alphabetically organized list of hyperlinked subject headings. I describe my work reducing the number of ways the user may access the database, so as to give the user a cleaner and less cluttered interface. To give users a better browsing experience I suggest moving to a (narrower) two-tiered website architecture, hyperlinked texts for navigation, and an improved database for the search function, which ended up as a consolidation of links into an alphabetically organized subject list. I describe a subject list with links to each subject, where the links are created through social tagging; these categories would increase and improve users’ choices. Finally, I recommend user testing as a further step in the design process. My evaluation of this website’s usability shows that I understand principles of IR design and functionality, and the improvements I propose show that I am capable of redesigning a website based on these principles.
Conclusion
In the classroom and on the job, I have used a variety of databases to fulfill numerous specific and detailed enquiries using multifarious methods. I can describe how the content and organization of an IR system functions, and I can describe and apply design principles in the service of IR system usability. These are skills I can use in almost any library environment, now and in the future.
I can identify attributes of a document and then determine appropriate fields and values for databases in order to create a basic data structure. I know how to write a clear, comprehensive, and concise rule for use by indexers. I have worked on creating a controlled vocabulary using authority terms to determine concepts and “aboutness.” I know how to construct records in a basic database I designed with the aforementioned skills, and I understand both how to test the searchability and retrievability of those records, and what to look for in the results. These skills I might be able to use wherever controlled vocabularies are being built, or databases developed.
I know how to conduct research and produce informed analysis of an IR system. I am capable of evaluating the design, content, and search and retrieval functionality of an IR system, and can judge that IR system’s strengths and weaknesses through reason and analysis. I understand principles of IR design and functionality, and I am capable of redesigning a website based on these principles. These skills can come in handy when I need to evaluate a system’s functionality or usability, regardless of the information environment I am working in.
References
Lynch, J. (2016). You could look it up: The reference shelf from Ancient Babylon to Wikipedia. New York: Bloomsbury.
Scott, A. (2016). Information retrieval system design. Retrieved from https://documentcloud.adobe.com/link/track?uri=urn%3Aaaid%3Ascds%3AUS%3A55ea8765-e0fb-491e-aec0-deddb7ba115c
Shera, J. H. (1949). Foundations of the public library: The origins of the public library movement in New England, 1629-1855. Shoe String Press.
Tucker, V.M. (2016a). Lecture 7: Evaluation. Retrieved from https://documentcloud.adobe.com/link/track?uri=urn%3Aaaid%3Ascds%3AUS%3Afd261419-4919-4619-b49a-19526ccf3794
Tucker, V.M. (2016b). Lecture 2: Introduction to information retrieval systems. Retrieved from https://documentcloud.adobe.com/link/track?uri=urn%3Aaaid%3Ascds%3AUS%3Ae3dad532-8136-4e27-90fc-4ea9910653df
Weedman, J. (2008). Information retrieval: Designing, querying, and evaluating information systems. The portable MLIS: Insights from the experts, 112-126.
Weedman, J. (2016a). Lecture 1: Introduction to the course and overview of course concepts. Retrieved from https://documentcloud.adobe.com/link/track?uri=urn%3Aaaid%3Ascds%3AUS%3Aaa3a34bf-2666-4730-8c0f-6336acdea434
Weedman, J. (2016b). Lecture 4: The design process. Retrieved from https://documentcloud.adobe.com/link/track?uri=urn%3Aaaid%3Ascds%3AUS%3A4cdf5054-9464-4068-acf0-e8565864abbb
Weedman, J. (2016c). Lecture 8: Designing for navigation: Web site structures and classification systems. Retrieved from https://documentcloud.adobe.com/link/track?uri=urn%3Aaaid%3Ascds%3AUS%3A0332958d-05bd-43dd-b1b9-cf0ad4c4f50c
Weedman, J. (2016d). Lecture 3: Designing for search. Retrieved from https://documentcloud.adobe.com/link/track?uri=urn%3Aaaid%3Ascds%3AUS%3A8cc15394-eabd-4216-bf53-994a5a947470
Weedman, J. (2016e). Lecture 5: User research. Retrieved from https://documentcloud.adobe.com/link/track?uri=urn%3Aaaid%3Ascds%3AUS%3Ad9f81ed0-031d-4481-937a-6a766ac60930