English Sentence Structure and Entity-Relationship Diagrams - Semantic Scholar
P. P. Chen, English sentence structures and entity-relationship diagrams, Information Sciences, () 4. P. P. Chen, The time-dimension in the. 12th International Conference on the Entity-Relationship Approach, Arlington, [ 10] P.P. Chen; English Sentence Structure and Entity Relationship Diagrams;. An Extended Example - the Structure of English Sentences notation of phrase structure grammars into the more general entity relationship notation. if you know already entity modelling then use this model as a means to explore formal.
Natural language processing techniques were applied as a first step, and a domain ontology was applied to improve the performance of the identification process. Their tool introduces a semi-automated process.
The Approach
Entities, attributes, and relationships are the basic elements of ER models. An entity is an object that exists in the real world and is distinguishable from other objects.
An entity type is a collection of similar entities with its own attributes.
A relationship is an association among two or more entities. Relationships can be derived from verbs.
A key constraint in a relationship is represented by cardinality. The information extraction system begins with sentence segmentation, a morphological analysis applied to the specifications, followed by tokenization.
The result of the tokenization process is words only. Chunking and parsing apply multiple possible analyses to the results. Parsing is the process of using a grammar to assign a syntactic analysis to a string of words, forming a parse tree. Finally, the information extracted from the parse tree is used to generate the ER diagram. Each process is described in detail in the following subsections.
Sentence Segmentation
In this step, morphological analysis is applied to the natural language text.
- Extracting Entity Relationship Diagram ( ERD ) from English Sentences
The user enters the requirement specifications in the provided workspace area. Analysis is then performed to determine sentence boundaries and split the text into sentences.
Usually, each sentence must end with a period, and this period terminates the sentence. The process also eliminates all non-word tokens such as punctuation, removes plural suffixes in nouns (such as s, es, or ies), and converts plural entity names into singular. Figure 2 shows an example of the sentence segmentation process using ER generator.
Sentence Segmentation Example
As shown in Figure 2, the text of the requirements is written in natural language in the NL Specification section. Requirements analysis is performed when the user presses the Sentence Segmentation button.
The process then determines the sentence boundaries, splits the text into separate sentences, eliminates all non-word tokens such as punctuation, removes plural suffixes, and converts plural entity names into singular.
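The segmentation step described above can be sketched roughly as follows. This is a minimal illustration rather than the tool's actual implementation: the function names `singularize` and `segment` are hypothetical, and the suffix rules are a naive assumption that will mis-handle many irregular words.

```python
import re

def singularize(word):
    """Strip a plural-looking suffix (ies, es, s) from a word; a naive heuristic."""
    lower = word.lower()
    if lower.endswith("ies") and len(lower) > 4:
        return word[:-3] + "y"            # libraries -> library
    if lower.endswith(("ses", "xes", "zes", "ches", "shes")):
        return word[:-2]                  # classes -> class
    if len(lower) > 3 and lower.endswith("s") and not lower.endswith(("ss", "us", "is")):
        return word[:-1]                  # authors -> author, contains -> contain
    return word

def segment(text):
    """Split text on periods, drop non-word tokens, and strip plural suffixes."""
    sentences = []
    for raw in text.split("."):
        words = re.findall(r"[A-Za-z0-9]+", raw)   # keeps words and numbers only
        if words:
            sentences.append([singularize(w) for w in words])
    return sentences

if __name__ == "__main__":
    print(segment("A library contains books. Authors write books."))
```

Splitting on every period is the simplest reading of the rule that a period terminates the sentence; it does not handle abbreviations.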
For example, in Sentence 1, contains is mapped to contain. Also, libraries and authors are mapped to library and author respectively.
Tokenization
In the tokenization process, words and numbers in each sentence are identified.
Basically, the proposed tokenization breaks up the given sentence into units called tokens, separated by spaces. For example, consider the sentence "I like solving interesting problems". Such an implementation is similar to string. Figure 3 shows the result of performing tokenization. The tokenization process can identify each word in the user input data. However, compound words that use commas and periods add complexity.
For example, a tokenizer may have to recognize that the period in "Mr. Ali" does not terminate the sentence.
Tokenization Example
Tokenizing the specification as shown in Figure 3 includes breaking up the given text into tokens. The tokenization process can identify each word in the specification. For example, each sentence appears without periods or commas, and each word is split from the other words in the text. Table 1 summarizes the list of symbols and abbreviations.
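A space-based tokenizer that keeps the period of a known abbreviation might look like the sketch below. The `ABBREVIATIONS` set and the function name `tokenize` are assumptions made for illustration; the source does not say how its tokenizer recognizes abbreviations.

```python
# Hypothetical abbreviation list; a real tool would need a much larger one.
ABBREVIATIONS = {"mr.", "mrs.", "dr."}

def tokenize(sentence):
    """Break a sentence into tokens on whitespace, stripping surrounding punctuation."""
    tokens = []
    for unit in sentence.split():
        if unit.lower() in ABBREVIATIONS:
            tokens.append(unit)                      # keep "Mr." intact
        else:
            stripped = unit.strip(".,;:!?\"'()")     # drop periods, commas, quotes
            if stripped:
                tokens.append(stripped)
    return tokens

if __name__ == "__main__":
    print(tokenize("I like solving interesting problems."))
    print(tokenize("Mr. Ali hit the ball."))
```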
Figure 4 shows the result of performing POS tagging on a given text.
Perform POS Tagging
3. Chunking
Chunking is the process of taking individual units of information (chunks) and grouping them into larger units. The tokens of a sentence are grouped together into larger chunks, each chunk corresponding to a syntactic unit such as a noun phrase (NP) or a verb phrase (VP).
To perform chunking, a POS-tagged set of tokens is required along with the tokens themselves. Part-of-speech tagging tells whether words are nouns, verbs, adjectives, etc. Sometimes it is useful to have more information than just the parts of speech of words.
Chunking usually selects a subset of the tokens and groups them together to indicate their type, noun phrase or verb phrase.
Perform Chunking
Chunking is a way of organizing information into familiar groups. Performing the chunking process includes tagging the token set with its POS. Chunking is an intermediate step towards full parsing.
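As a rough illustration of grouping POS-tagged tokens into NP and VP chunks, the sketch below merges adjacent tokens whose tags fall in the same family. It assumes Penn Treebank-style tags and hand-supplied tagging; the tag sets and the function name `chunk` are illustrative choices, not taken from the described tool.

```python
# Assumed Penn Treebank-style tag families for noun and verb phrases.
NP_TAGS = {"DT", "JJ", "NN", "NNS", "NNP"}
VP_TAGS = {"MD", "VB", "VBD", "VBZ", "VBP", "VBG", "VBN"}

def chunk(tagged):
    """Group consecutive (word, tag) pairs into (label, words) chunks."""
    chunks = []
    for word, tag in tagged:
        label = "NP" if tag in NP_TAGS else "VP" if tag in VP_TAGS else "O"
        if chunks and label != "O" and chunks[-1][0] == label:
            chunks[-1][1].append(word)       # extend the current chunk
        else:
            chunks.append((label, [word]))   # start a new chunk
    return chunks

if __name__ == "__main__":
    tagged = [("Ali", "NNP"), ("hit", "VBD"), ("the", "DT"), ("ball", "NN")]
    print(chunk(tagged))
```

This is shallow chunking only: it never nests phrases, which is why it is an intermediate step rather than a full parse.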
Parsing
Natural language grammar is ambiguous and admits multiple possible analyses. Each sentence may have many potential parse trees. Most of them will seem easy to a human.
However, it is difficult to decide which of them is intended in the specification. Therefore, the parsing process determines the parse tree of a given sentence. This step helps in identifying the main parts of a given sentence, such as the subject and the object. Parsing examples are shown in Figure 6 and Figure 7.
Some parsers assume the existence of a set of grammar rules in order to parse a given sentence. Recent parsers, however, are able to infer parse trees directly using complex statistical models. Parsing analysis can extract nouns that play the role of entities or attributes, and verbs that act as relationships between entities.
Cardinality and multiplicity information may also be extracted from determiners, adjectives, modal verbs, and quantifiers.
MBSP is a text analysis system that provides tools for tokenization, sentence splitting, part-of-speech tagging, chunking, and relation finding.
Parser Tree for the Sentence "Ali Hit the Ball"
The proposed methodology is based on a set of identification rules that combine different concepts from other works, as follows. A common noun may indicate an entity type [5, 9]. A proper noun may indicate an entity [5, 9]. A gerund may indicate an entity type.
Identify attributes
Attributes are nouns mentioned along with their entity; they may be preceded by the verbs has, have, or includes, which indicate that an entity is attributed with a property.
For example, in "employee has id, name, and address", employee is detected as an entity, and id, name, and address are detected as attributes. Here are some rules that identify attributes in specifications. A noun phrase in the genitive case may indicate an attribute.
The possessive case usually shows ownership; it may indicate an attribute type.
Identify relationships
The main verb that occurs between two entities is likely to be a relationship. Two entities can be separated by a main verb only, by a main verb and an auxiliary verb, or by a main verb and a modal verb. A transitive verb can indicate a relationship type. A verb followed by a preposition such as "by", "to", "on", or "in" can indicate a relationship type.
An adverb can indicate an attribute of a relationship. The adverb "uniquely" indicates the primary key (PK) of an entity. Once every word has been assigned its ER element type, the relevant information (which words are entities, relationships, cardinalities, and attributes) is stored in text files.
ERD is the first step in database design; it is also a simple graphical technique for deciding which fields, relationships, and tables will form the basis of a database.
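Two of the identification rules described earlier (nouns following has, have, or includes become attributes; a main verb between two common nouns becomes a relationship) can be sketched as below. This is an assumption-laden simplification: the input is a hand-tagged sentence with Penn Treebank-style tags, the function name `extract` is hypothetical, and only the first object noun is linked in a relationship.

```python
ATTRIBUTE_VERBS = {"has", "have", "includes"}

def extract(tagged):
    """Apply two heuristic rules to one sentence given as (word, tag) pairs."""
    # Common nouns only: the rules treat proper nouns as entity instances, not types.
    nouns = [(i, w) for i, (w, t) in enumerate(tagged) if t in {"NN", "NNS"}]
    verbs = [(i, w) for i, (w, t) in enumerate(tagged) if t.startswith("VB")]
    result = {"entities": [], "attributes": {}, "relationships": []}
    if not nouns or not verbs:
        return result
    first_verb = verbs[0][0]
    subjects = [w for i, w in nouns if i < first_verb]
    objects = [(i, w) for i, w in nouns if i > first_verb]
    if not subjects or not objects:
        return result
    subject = subjects[-1]
    # Main verb: the last verb before the first object noun, so auxiliaries are skipped.
    main_verb = [w for i, w in verbs if i < objects[0][0]][-1]
    result["entities"].append(subject)
    if main_verb.lower() in ATTRIBUTE_VERBS:
        result["attributes"][subject] = [w for _, w in objects]
    else:
        target = objects[0][1]
        result["entities"].append(target)
        result["relationships"].append((subject, main_verb, target))
    return result

if __name__ == "__main__":
    sentence = [("employee", "NN"), ("has", "VBZ"), ("id", "NN"), (",", ","),
                ("name", "NN"), (",", ","), ("and", "CC"), ("address", "NN")]
    print(extract(sentence))
```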
ERD is a good communication tool between users and those who work with the system during the process of identifying user information requirements. ERD not only provides modeling features, but is also the starting point of a safe, high-quality database design with all of its well-defined semantics.
The remainder of this paper is structured as follows: Section 3 describes a real example of how the methodology can be applied to English sentences, and Section 4 states the limitations. Related work is presented in Section 5, followed by conclusions and future work. The methodology aims to map each part of an English sentence to the matching ERD component. English sentences used in this methodology must satisfy some rules so that their main components can be found; each main component heuristically corresponds to a concept in the ERD, and each concept is then represented by a symbol and labeled with its associated name.
The main structure of the proposed methodology is shown in Figure 1 and is described in detail in the following subsections.
Sentences Checker
The sentence checker is a pre-processing step on sentence tense that checks the English sentences to be translated into the ERD. Sentences that satisfy the checker rules are accepted and move to the next step; other sentences are ignored.
For example, if someone gives the sentence "How can I find it?", this sentence will be ignored.
Sentence Component Finder
In this step the English sentence goes through steps to find the main components that are useful in ER diagram extraction. The methodology divides this step into two sub-steps.
Sentence Part of Speech Finder: Knowing which part of speech each word belongs to makes it possible to use the words correctly. English has eight parts of speech: nouns, pronouns, adjectives, verbs, adverbs, conjunctions, prepositions, and interjections.
The main ER diagram components that this step aims to find in an English sentence are lists of nouns, verbs, adjectives, and adverbs. They are saved as a table that contains each word and its part of speech; this table is saved as a text file, ready for use in the next step.
Sentence Part of Numeric Operation Finder: Many numeric values used in real life follow a format, which makes them easier to handle. For example, both the social security number and the phone number follow a fixed format in each country. The proposed methodology can therefore select the country as a first step, making it easy to identify the formats that country uses for both.
For example, in Jordan, if the format of the numeric is 9, name the attribute Social security number, and if the format of the numeric is 07, name the attribute Phone number. Add the numeric word and the name it takes to the table established in step 2.
ER-Extractor
The ER-extractor aims to extract entities, relationships, and attributes from the English sentence.
Each English sentence will have its main ERD feature.
Many rules were used to match ERD features with English sentence parts heuristically, based on works [4, 5, 6 and 7]; Table 2 summarizes these rules. The basis is the equivalent form; Table 3 summarizes these formats and how to deal with them. In the equivalent form, X is treated as a relationship between Y and Z, so both Y and Z are considered entities.
Applying cardinality constraints to the output of the ER-extractor is the main goal of the ER-translator step, which requires the use of predicates.
N one-to-many or N: In the ER-translator step, after translating the English sentence into the corresponding FOL, the following rules must be applied: Every English sentence gets a number ER number, where number represents a unique number for each sentence.
Destination entity, S for singular and P for plural. According to this step, Table 4 first translates the sentence into FOL; after that, the sentence is represented according to the ER-translator rules, as shown in Figure 2.
Chen's notation symbolizes ERD attributes with ovals; relationship types are represented by a diamond between two entities, annotated with the cardinalities; and entities are symbolized by a box containing the entity name. According to the ER-translator, the symbol P represents cardinality N and the symbol S represents cardinality 1. If two cardinalities on the same relationship are both represented by the symbol N, one of them must be named N and the other M.
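The S/P to 1/N mapping and the M:N naming rule can be expressed as a small function. The function name `cardinality` and the "source:destination" string format are assumptions for illustration; which side is renamed to M is an arbitrary choice here, since the text only requires that the two sides differ.

```python
def cardinality(source_number, destination_number):
    """Map singular/plural markers ('S'/'P') on two entities to a cardinality label."""
    to_symbol = {"S": "1", "P": "N"}
    src = to_symbol[source_number]
    dst = to_symbol[destination_number]
    if src == "N" and dst == "N":
        src = "M"   # both sides plural: name one M and the other N
    return f"{src}:{dst}"

if __name__ == "__main__":
    for pair in [("S", "S"), ("S", "P"), ("P", "S"), ("P", "P")]:
        print(pair, cardinality(*pair))
```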
Examples
To illustrate the steps of extracting an ERD from English sentences, a list of examples is introduced. Table 5 gives some examples of finding the English language concepts. Table 6 explains by example how each part of an English sentence from Table 5 is represented according to the rules of representation. The sentences in Table 6 are then represented as shown in Table 7.
A 30 year-old engineer works on a project with a project number for a percentage of his time. A 2, Project number, E2.
A 3, percentage of time, R1. A The average salary. Color of the car is green. There are 25 students in the class.
Discussion
A good ERD should satisfy some basic rules: all relationships and attributes must be connected, entity names should be unique, each entity should have at least one relationship, a relationship cannot be connected directly to another relationship, and every entity should have at least one attribute. The examples introduced here aim to show how the methodology is applied to English sentences, not to extract a good ERD that satisfies these basic rules.
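The basic rules for a good ERD listed above lend themselves to a simple automated check. The sketch below assumes a hypothetical in-memory representation (a list of entity names, a dict of attributes per entity, and relationships as name/endpoint triples); neither this representation nor the function name `check_erd` comes from the methodology itself.

```python
def check_erd(entities, attributes, relationships):
    """Return a list of rule violations; an empty list means all checks passed.

    entities: list of entity names
    attributes: dict mapping entity name -> list of attribute names
    relationships: list of (name, endpoint_a, endpoint_b) tuples
    """
    problems = []
    if len(set(entities)) != len(entities):
        problems.append("entity names are not unique")
    known = set(entities)
    for owner in attributes:
        if owner not in known:
            problems.append(f"attributes attached to unknown entity {owner!r}")
    for name, a, b in relationships:
        for end in (a, b):
            # An endpoint that is not an entity covers relationship-to-relationship links.
            if end not in known:
                problems.append(f"relationship {name!r} is connected to non-entity {end!r}")
    linked = {end for _, a, b in relationships for end in (a, b)}
    for entity in dict.fromkeys(entities):   # preserves order, skips duplicates
        if entity not in linked:
            problems.append(f"entity {entity!r} has no relationship")
        if not attributes.get(entity):
            problems.append(f"entity {entity!r} has no attribute")
    return problems

if __name__ == "__main__":
    print(check_erd(["library", "book"],
                    {"library": ["name"]},
                    [("contain", "library", "book")]))
```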
The introduced examples show the steps of extracting an ERD from English sentences by applying heuristics based on language syntax rules.