Conventional systems are overwhelmed by the unconventional characteristics of Big Data, which opens several avenues for researchers to explore in Big Data research. The Semantic Web provides constructs for associating semantics, or meaning, with data and can be employed to express and query data in triplet form. Combining Semantic Web constructs with Big Data tools can lead to a scalable and powerful automated system. A research proposal for developing a contextual model for Big Data Analysis using Advanced Analytics has already been published. The use case proposed to realize the model addresses the Resume Analytics Problem: designing and developing a model that finds the most appropriate fit between a demand description criterion and individuals drawn from a large pool of unstructured digital resources.
The research work presented here is an attempt to develop a preliminary prototype that realizes the model. It espouses knowledge representation using Semantic technologies, together with searching and matching based on concepts rather than keywords, in order to identify conceptually relevant matches.

1. Introduction

The term Big Data is now ubiquitous, and the issues and challenges associated with its characteristics, namely volume, velocity, variety, veracity, etc., need to be addressed. The taxonomy presented in the literature classifies Big Data Analytics along the dimensions of time, techniques and domain. The Big Data Analysis Pipeline in particular ignites research minds to present innovative thoughts and proposals that address data deluge concerns for the benefit of the community at large. Research in the area of Big Data has produced novel methods of processing unstructured data, which constitutes a major portion of Big Data.
Web 2.0 technologies, being content- and presentation-centric, lack data reuse and information sharing between two or more data sources. This is where the Semantic Web, or Web 3.0, comes in and provides data formats suitable for data linking and sharing. Linked Data, built on standard web technologies such as HTTP, RDF and URIs, allows data to be published in a structured format over the Internet. It enables data from different sources to be interlinked and queried, and presents information on the web in such a way that not only humans but also computers and programs can read it and extract meaning from it. This facilitates communication between two programs without any human intervention. The Semantic Web uses a graph database to store data and associate meaning with it, and has two main components: the Resource Description Framework (RDF) and Ontology.
RDF is a W3C standard that describes the properties and attributes of resources on the web in triplet form, enabling machines to consume and exchange information. Ontology is a way of addressing heterogeneity at the schema level by providing a collection of terms and concepts that can be used in modeling data to share knowledge. Organizations are often faced with the problem of finding the most befitting candidate to fill a vacancy, while at the same time aspirants face challenges in identifying appropriate job opportunities based on their knowledge, academic and professional qualifications, and experience. A resume is semi-structured or unstructured data that professionals use to present their work experience, skills, qualifications, awards and achievements.
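As a concrete illustration of the triplet form described above, the short RDFLib sketch below (RDFLib is the Python RDF library used later in this work) expresses a few facts about a person as subject-predicate-object triples. The example.org namespace and the hasSkill property are purely illustrative assumptions; only the FOAF terms belong to a standard vocabulary.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, FOAF

# Hypothetical namespace used purely for illustration; the system described
# below relies on the Resume-RDF ontology instead.
EX = Namespace("http://example.org/resume#")

g = Graph()
person = URIRef("http://example.org/people/jane-doe")

# Each g.add() call asserts one subject-predicate-object triple.
g.add((person, RDF.type, FOAF.Person))
g.add((person, FOAF.name, Literal("Jane Doe")))
g.add((person, EX.hasSkill, Literal("C++")))

print(g.serialize(format="turtle"))
```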
People's resumes and skill profiles are of particular relevance to the ExpertFinder initiative, which aims to develop vocabularies, their rule extensions, best practices and recommendations towards standardization of metadata, in the form of the Resume-RDF ontology, so that computer agents can find experts on particular topics. The proposed system is an archetype for demonstrating the model published in prior work. The system incorporates the Resume-RDF ontology to express resumes in the form of a concept hierarchy. It uses a skills graph to enrich the skills section of resumes and obtain more meaningful search results than the traditional keyword search approach. SPARQL, a recursive acronym for SPARQL Protocol and RDF Query Language, is used to retrieve the relevant resumes. Apache Spark, a distributed cluster-computing framework for large-scale data processing, speeds up the time-consuming task of matching resumes.
There are numerous software applications available for managing resumes within a company. Earlier work standardizes skill information by mapping skills from resumes onto a preloaded list of skills arranged as a tree. The system performs candidate matching based on criteria entered by the user and retrieves profiles through direct matching. Where there is no direct match, it performs sibling matching and ranks the matched resumes. The system does not perform semantic analysis of the data. Automated methods for parsing unstructured resumes and extracting information were proposed next. One proposed resume parser automatically segregates information in four phases using a named-entity clustering algorithm. However, this approach says little about matchmaking when a job criterion is given.
Another work uses a self-recommendation learning engine to dynamically populate a candidate's parameters. Subsequent methods proposed ranking resumes when there are multiple matches for the given criteria. The authors present the research issue of developing an improved approach that helps in selecting the right resume by processing a set of similar resumes. The approach is independent of the user query and helps users discover useful information of which they are unaware. Experimental results show a 50%-94% reduction in the number of features a recruiter needs to look at to select appropriate resumes. The proposed method does not, however, consider specialization in other sections such as "Education", "Achievements" and "Work Experience". Matchmaking strategies were then combined with the concept of ontologies.
Another study describes an approach that makes use of an ontology and a deductive model to determine what kind of match exists between a job seeker and a job posting, and then ranks resumes based on similarity measures when there is only a partial match. The system does not, however, make use of standard or formal ontologies, which could enable automated sharing and updating of information with other sources. Approaches for composing and publishing resumes semantically were proposed next. One such approach captures resume information through a semantically aided graphical user interface, but it focuses only on the composition of resumes.
ORP, an Ontology-based Resume Parser, incorporates a Semantic Web approach for finding suitable candidates, but it does not handle partial matchmaking and ranking of resumes. Another method combines latent semantic analysis (LSA) with ontological concepts to match resumes against job descriptions. That approach addresses issues regarding the use of an ontology in building the LSA and clustering for improved matching.
Another research work classifies Linked Data as part of the Big Data landscape. The fourth paradigm of science is exploration: learning new facts from existing data. Linked Data acts as a proving ground for researching some of the Big Data challenges, using an underlying ontology to describe data in terms of entities, attributes and values. Similarity-based matchmaking when an exact match is not found is not considered in that approach, and it does not comment on the effect of a large volume of resumes on system performance. Parallel research in applying graph theory principles to resume processing led to expressing resumes as graphs and applying Big Data tools and technologies to those resume graphs.
The overall architecture of the proposed system, shown in Fig. 1, comprises two main modules: Resume Processing and Resume Matching. Resume Processing consists of capturing resumes, tokenization and segmentation, conversion of resumes into concept hierarchies, and persisting them in permanent storage. Resume Matching consists of accepting search criteria, converting the criteria into a SPARQL query and fetching results. Both modules use the Skills Graph module. The three modules, Skills Graph, Resume Processing and Resume Matching, are described next.
This module reads skills and their associations from a text file and builds a graph hierarchy. The skills graph groups skills into a hierarchy based on their inter-relationships. A node at a higher level is broken down into smaller nodes at the lower levels. Generic skills and concepts are at the higher level in the graph. Specific tools, technologies and languages form leaf nodes.
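A minimal sketch of how such a graph could be loaded and traversed is given below. The file format (one 'specific,generic' pair per line) and the function names are assumptions made for illustration, not the authors' actual implementation.

```python
from collections import defaultdict

def load_skills_graph(path):
    """Build a skills hierarchy from a text file of 'specific,generic' pairs.

    Assumed (illustrative) file format, one association per line:
        C++,Object Oriented Programming
        Java,Object Oriented Programming
        Object Oriented Programming,Programming
    """
    parent_of = {}                       # child skill    -> parent concept
    children_of = defaultdict(list)      # parent concept -> child skills
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            child, parent = (part.strip() for part in line.split(",", 1))
            parent_of[child] = parent
            children_of[parent].append(child)
    return parent_of, children_of

def ancestors(skill, parent_of):
    """Walk from a specific skill (leaf) up to the generic root concepts."""
    chain = []
    while skill in parent_of:
        skill = parent_of[skill]
        chain.append(skill)
    return chain
```

With the sample file above, `ancestors("C++", parent_of)` returns `['Object Oriented Programming', 'Programming']`, i.e. the path from the leaf node up to the root.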
This module captures resumes from a web UI, performs preprocessing and generates a concept hierarchy. The preprocessing step converts each input resume into a sequence of tokens, which are then mapped onto one of four sections: Personal, Work Experience, Education and Skills. The Skills section is then mapped against the Skills Graph to add context by augmenting the skills. Augmentation extracts hidden meanings and associations between skills: each skill mentioned in the input is looked up in the graph and, if found, all nodes from that specific node up to the root of the graph are added as skills, so the resume is semantically enriched. For example, suppose a person has mentioned C++ in his resume. When the resume is passed through the skills graph, the system learns that C++ is an object-oriented language, and a new skill, "Object Oriented Programming", is automatically added to his list of skills. This is done for each skill specified in a resume. The Concept Hierarchy Generation step creates concepts and represents them in RDF format using the Resume-RDF ontology. Python code incorporating RDFLib, the Python library for working with RDF, has been implemented to:
• map the features extracted from the Segmentation step into Resume class attributes;
• consume the Resume-RDF namespace;
• convert the resume attributes into RDF triplets as per the Resume-RDF namespace;
• persist the RDF concepts in the Sesame triplet store.
Sesame, from Aduna, is a Java-based framework for capturing, processing, storing and retrieving RDF data. It is a popular triplet store and supports both in-memory and on-disk storage of data.
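A minimal sketch of the augmentation and concept-generation steps is shown below. The namespace URI, the cv: class and property names, and the IRI scheme for resumes are illustrative approximations of the Resume-RDF vocabulary rather than the authors' exact implementation.

```python
from rdflib import BNode, Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

# The namespace URI and property names below only approximate the Resume-RDF
# (cv:) vocabulary; treat the exact terms as illustrative assumptions.
CV = Namespace("http://rdfs.org/resume-rdf/")

def augment_skills(skills, parent_of):
    """Add every ancestor concept from the skills graph to the raw skill list."""
    enriched = set(skills)
    for skill in skills:
        node = skill
        while node in parent_of:          # walk up to the root of the graph
            node = parent_of[node]
            enriched.add(node)
    return enriched

def resume_to_rdf(resume_id, skills, parent_of):
    """Express the (augmented) skills of one resume as RDF triplets."""
    g = Graph()
    cv_node = URIRef(f"http://example.org/resumes/{resume_id}")  # hypothetical IRI scheme
    g.add((cv_node, RDF.type, CV.CV))
    for skill in augment_skills(skills, parent_of):
        skill_node = BNode()
        g.add((cv_node, CV.hasSkill, skill_node))
        g.add((skill_node, RDF.type, CV.Skill))
        g.add((skill_node, CV.skillName, Literal(skill)))
    return g

# Example: C++ alone yields the extra concept "Object Oriented Programming".
graph = resume_to_rdf("42", ["C++"], {"C++": "Object Oriented Programming"})
print(graph.serialize(format="turtle"))
```

The serialized triples could then be pushed to a Sesame repository for persistence, for example via the repository's HTTP statements endpoint.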
This module searches for resumes and ranks them. It accepts search criteria from a web UI, which are then expanded using the Skills Graph. The expanded search criteria are fed to an Apache Spark plugin. This plugin:
• converts the criteria into a SPARQL query;
• submits the query to the Sesame Triplet Store and fetches RDF concepts;
• iterates over the results and scores each resume.
Since skills are semantically enriched in the triplet store, matching of skills yields better results than a normal keyword search.
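A minimal, single-machine sketch of the query-construction and scoring steps is given below. The Sesame endpoint URL, the use of the SPARQLWrapper client, the cv: terms (mirroring the illustrative ones above) and the one-point-per-matched-skill scoring rule are all assumptions; the Spark distribution layer described in this work is omitted for brevity.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Assumed local Sesame repository endpoint; adjust to the actual deployment.
ENDPOINT = "http://localhost:8080/openrdf-sesame/repositories/resumes"

def build_query(expanded_skills):
    """Build a SPARQL query returning resumes that mention any requested skill."""
    values = " ".join(f'"{skill}"' for skill in expanded_skills)
    return f"""
        PREFIX cv: <http://rdfs.org/resume-rdf/>
        SELECT ?resume ?skillName WHERE {{
            ?resume cv:hasSkill  ?skill .
            ?skill  cv:skillName ?skillName .
            VALUES ?skillName {{ {values} }}
        }}
    """

def score_resumes(expanded_skills):
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setQuery(build_query(expanded_skills))
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()

    # Illustrative scoring rule: one point per matched skill per resume.
    scores = {}
    for row in results["results"]["bindings"]:
        resume = row["resume"]["value"]
        scores[resume] = scores.get(resume, 0) + 1
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```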
Initial snapshots demonstrate the working of the Resume Processing module. The working of the Resume Matching module is then demonstrated by comparing two methods of searching candidate resumes: keyword search and contextual search. A user interface is presented to capture the various details that make up a resume. The server runs the Resume Processing module and converts the input resume into Resume-RDF XML triplets. Fig. 3 shows a sample Resume-RDF XML document generated by the server for an input resume. The resume, converted into triplet format, is then persisted into the Sesame Triplet Store.
One approach to searching for candidates, given a skills criterion, is keyword search, where each skill is matched against the resume text. This method yields accurate results because the match performed is an exact match. For example, if the input search criterion is 'C++', this approach fetches all resumes containing 'C++' in the Skills section. However, this approach is not suitable for context-based searches. If the input search criterion is "Object Oriented Language", then all resumes in which one or more object-oriented programming languages are mentioned in the Skills section should be fetched. In other words, this search should fetch resumes with programming languages such as C++, Java, PHP, Python, etc. The keyword-based approach fails here because it cannot infer the relationship "C++ is an object-oriented language". Graph-based search facilitates context-based searching: skills are correlated with each other to form a graph, and this graph is used to expand the skills section of resumes. For each skill mentioned in a resume, a set of correlated skills is retrieved from the graph and added to the skills list.
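The difference between the two strategies can be seen in the small sketch below (the sample skills graph, resume text and criterion are illustrative, not taken from the actual system): a keyword match succeeds only on literal occurrences, while a match against the graph-augmented skill set also satisfies conceptual criteria.

```python
# Illustrative comparison of keyword vs. context-based matching (sample data only).
parent_of = {
    "C++": "Object Oriented Programming",
    "Java": "Object Oriented Programming",
    "Object Oriented Programming": "Programming",
}

resume_text = "Skills: C++, SQL"
raw_skills = ["C++", "SQL"]
criterion = "Object Oriented Programming"

# Keyword search fails: the phrase never appears verbatim in the resume text.
keyword_hit = criterion.lower() in resume_text.lower()

# Contextual search succeeds: graph augmentation added the ancestor concept
# "Object Oriented Programming" when "C++" was looked up.
augmented = set(raw_skills)
for skill in raw_skills:
    node = skill
    while node in parent_of:
        node = parent_of[node]
        augmented.add(node)
contextual_hit = criterion in augmented

print(keyword_hit, contextual_hit)   # False True
```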
Big Data, Semantic technologies and Advanced Analytics are key enablers of progress and innovation. The work presented in this paper leverages semantic constructs for competence management. The system gathers and processes information from resumes and enhances the skills section of each resume using a skills graph. A concept hierarchy is then built for each resume and expressed as RDF triplets. The concept hierarchies are persisted in the Sesame Triplet Store for retrieval during the resume matching step. Given selection criteria, resumes are shortlisted by building SPARQL queries from those criteria. The system performs both exact and context-based search on skills. The Apache Spark plugin used to match resumes has the capacity to process a large collection of resumes. The work may be extended to verify and validate the proposed system on datasets of Big Data scale. Future work will focus on applying advanced analytic techniques, in the form of machine learning algorithms, to realize the model in its entirety.