15 May, 2022

Heart Disease Prediction using Matlab

Introduction
Cardiology is one of the most important specialties in medicine, and also one of the most difficult to practice well. If cardiac disease is not detected in its early stages, it can be lethal. In the United States, according to the Centers for Disease Control and Prevention (CDC), over 610,000 people die from heart disease each year, accounting for one out of every four deaths, and this figure is increasing. The CDC also reported that more than half of the deaths due to heart disease in 2009 were in men.
Visualization
Target is our response variable, which contains binary data in the form of either 0 or 1. The number 1 indicates that the individual has heart disease, whereas the number 0 shows that the individual does not have heart disease.


Because the number of available independent variables was large, we chose to focus on four of them. These are the variables that physicians pay the most attention to when diagnosing the condition: cholesterol, heart rate, chest pain, and the resting ECG test. Based on the data, all four variables were judged to be approximately normally distributed. To visualize the relationships, the target and all other variables were plotted in a correlation matrix. The following is a representation of the correlation matrix:

Although doctors focus on those four indicators, the matrix shows that cholesterol, resting heart rate, and the resting ECG do not correlate significantly with the dependent variable.
Prediction
Logistic regression model
Logistic regression was used to model the relationship between the categorical response variable and a number of independent variables. Approximately 75% of the dataset was used for training, and the remaining 25% was used for testing.

According to the model output, gender, chest pain, the number of visible major arteries, and the maximum heart rate all have an influence on the development of heart disease. We then decided to create a new dataset that included only the statistically significant variables.

Before applying the model, the variance inflation factor (VIF) can be used to check whether there is multicollinearity in the data.

The VIF values are modest, so the data do not appear to be multicollinear. We can therefore fit the logit model to the dataset of significant variables.

As can be seen, all of the predictors are significant, which is beneficial for making predictions. Let's look at how each one relates to the response.

According to the figure above, type 2 chest pain is the type most strongly associated with heart disease. A person with chest pain may exhibit any or all of the signs of cardiac illness. Exercise-induced angina and a higher number of visible major arteries are associated with a lower risk of heart disease. Although the variance inflation factor results were good, we still used cross-validation via the trainControl() method to guard against overfitting.

The accuracy of the model is 0.8421, that is, 84.21%.
Conclusion
The prediction model developed in this study has an accuracy of 84.21%, and it can indicate whether we should take steps to lower our risk of heart disease and stroke. Logistic regression was used to model the relationship between the categorical response variable and the independent variables. Better results might be obtained by refining the selected features and by collecting more data.

Ozone Data Analysis Using Ozone Dataset

1 Data exploration
Several skewed variables can be seen in this dataset.

Given the nature of the data, we anticipate collinearity.

A general linear model of the data can also be used to detect collinearity and skewness, as discussed in more detail below. Note the collinearity visible in the Residuals vs Fitted plot and the skewness visible in the Residuals vs Leverage plot, both shown in the figure.

2 Analysis
In the ozone dataset, hourAverageMax and visibility are skewed to the right, and pressure500Height is skewed as well. Some variables are collinear, such as tempSandburg and inversionBaseTemp. This is expected, since the data were collected daily and weather patterns tend to persist for a while.




For these reasons, penalized regression and decision trees were considered when selecting the best model for predicting the response. When dealing with collinearity, penalized regression is an alternative to classical subset selection. Decision trees, in turn, are resistant to skewness. To reduce the correlation among the trees, random forests choose from a random subset of variables at each node, which may improve predictions.
The models were evaluated using ten-fold cross-validation. The optimal settings for penalized regression were determined to be:
Boosted decision trees and random forests were also tuned using ten-fold cross-validation. For the boosted models, shrinkage and interaction.depth were tuned. For the random forest model, the number of predictors tried at each split, mtry, was tuned.
Double CV was used to evaluate the following models:
1. Penalized regression using the glmnet function with alpha = 0.06 and lambda = 0.062662.
2. Boosted decision trees using the gbm function with shrinkage = 0.001 and interaction.depth = 4.
3. Random forest using the randomForest function with two predictors (mtry = 2).
Double CV selected Model 3, the random forest model, with the following results.
Double CV evaluation on random forest model (3)
The selected model explained a respectable amount of variance, accounting for around 75% of the variation in the test data.
3 Random forest model results and conclusion


For the Upland, California area, the variable importance plot shows that temperature is the most important predictor of ozone concentration, with pressure and humidity following closely behind. These results are plausible: ozone tends to condense under pressure, and when humidity is high, ozone has a difficult time dissipating from the atmosphere.

Finally, I would urge app developers who use our algorithm to convey its accuracy to their users appropriately. A forecasting program with an estimated accuracy of 75% is acceptable, but ozone levels above 0.1 parts per million (ppm) constitute a significant hazard to human health and should be avoided. If the algorithm is not updated in real time, the app should display a warning and link to more up-to-date information, in order to avoid widespread fear or a false sense of security. Additional data must be collected to obtain a more accurate forecast of the response variable. App developers would then be able to provide the general public with an accurate and precise ozone alert forecasting system.

09 June, 2021

Transitioning to NOSQL databases from Relational Database Management System

Introduction
A NoSQL database presents data in a form other than tables with relations. As the name suggests, a NoSQL database does not rely on SQL queries. A relational database management system (RDBMS), on the other hand, applies SQL queries to tables with relationships between the different entities (Poojary, Poriya, & Nayak, 2015). The adoption of NoSQL is a result of the increased volume of data brought about by the internet. The internet has given organizations access to huge amounts of data, which are crucial to better decision making. The RDBMS has been used by organizations for an extensive period of time, but it does not suit the huge influx of data resulting from internet use. This limits an organization's ability to identify patterns, and thus the opportunities available to it.
Research problems
The RDBMS has problems handling huge volumes of data, which is a major constraint for organizations. These volumes arise from data collected from different sources that are crucial to business operations. The different types of data collected by organizations cannot all be stored in a relational database system. The other problem is the high velocity of the data, which makes the relational database system unsuitable for handling it (Poojary, Poriya, & Nayak, 2015).
Research questions
Does NoSQL solve the problem facing the Relational Database Management System?
What are the benefits of using NoSQL database for business operations?
Benefits of switching to NoSQL database
According to Mukherjee (2020), a NoSQL database is beneficial compared with an RDBMS because it allows the storage of unstructured, semi-structured and structured data. NoSQL allows data to be stored in a format closer to the actual data, making it easier for the organization to interpret. This ability to store data in different formats helps business organizations gain a better understanding of a business situation from the available data. The different data formats allow patterns in the data to be identified more readily, which improves the decision-making process. NoSQL also makes it easier to update and change fields and schema. In contrast to an RDBMS, a NoSQL database is more flexible and allows the developer to keep changing the schema and fields to suit the source of the data. This flexibility is important in ensuring that the data collected from consumers is suited to the needs of the organization (Teodoro, Wei-Kleiner, Sundvall, Karlsson, & Lambrix, 2016). Consumer trends in the data keep changing, so it is important for the schema to change to suit the raw data. NoSQL fully utilizes cloud resources, reducing the downtime that organizations might experience. The RDBMS does not fully utilize cloud resources and is therefore affected by downtime (Ali, Shafique, Raza, & Majeed, 2019). Downtime negatively affects an organization's business processes: it delays business operations and thus lowers customer satisfaction.
References
Ali, W., Shafique, M. U., Raza, A., & Majeed, M. A. (2019). Comparison between SQL and NoSQL Databases and Their Relationship with Big Data Analytics. Asian Journal of Computer Science and Information Technology, 4(2), 1-10.
Mukherjee, S. (2020). The Battle between NoSQL Databases and RDBMS. SSRN Journal, 1-7. Retrieved from https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3393986
Poojary, D., Poriya, A., & Nayak, A. (2015). Type of NoSQL databases and its comparison with relational databases. International Journal of Applied Information Systems, 16-19.
Teodoro, D., Wei-Kleiner, F., Sundvall, E., Karlsson, D., & Lambrix, P. (2016). Comparing the Performance of NoSQL Approaches for Managing Archetype-Based Electronic Health Record Data. PLOS ONE, 1-16.

15 October, 2020

Write a loop that subtracts 1 from each element in lowerScores.




1. Write a loop that subtracts 1 from each element in lowerScores. If the element was already 0 or negative, assign 0 to the element. Ex: lowerScores = {5, 0, 2, -3} becomes {4, 0, 1, 0}.

What I am given:

import java.util.Scanner;
public class StudentScores {
   public static void main (String [] args) {
      Scanner scnr = new Scanner(System.in);
      final int SCORES_SIZE = 4;
      int[] lowerScores = new int[SCORES_SIZE];
      int i;

      for (i = 0; i < lowerScores.length; ++i) {
         lowerScores[i] = scnr.nextInt();
      }

      /* Your solution goes here */

      for (i = 0; i < lowerScores.length; ++i) {
         System.out.print(lowerScores[i] + " ");
      }
      System.out.println();
   }
}
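One way the requested loop could look is sketched below. The class name LowerScoresSketch and the helper lowerAll are hypothetical and only exist so the loop can be run on its own; in the given program, the body of the for loop would go where the "Your solution goes here" comment is, using the existing i and lowerScores variables.

```java
import java.util.Arrays;

// Hypothetical wrapper class so the loop can be run on its own.
class LowerScoresSketch {
    // Subtracts 1 from each element; elements that were already
    // 0 or negative are set to 0.
    static void lowerAll(int[] lowerScores) {
        for (int i = 0; i < lowerScores.length; ++i) {
            if (lowerScores[i] > 0) {
                lowerScores[i] -= 1;
            } else {
                lowerScores[i] = 0;
            }
        }
    }

    public static void main(String[] args) {
        int[] lowerScores = {5, 0, 2, -3};
        lowerAll(lowerScores);
        System.out.println(Arrays.toString(lowerScores)); // prints [4, 0, 1, 0]
    }
}
```

Checking for a positive value before subtracting handles both the zero case and the negative case with one branch.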

2. Write a loop that sets newScores to oldScores shifted once left, with element 0 copied to the end. Ex: If oldScores = {10, 20, 30, 40}, then newScores = {20, 30, 40, 10}.

Note: These activities may test code with different test values. This activity will perform two tests, both with a 4-element array (int oldScores[4]). See "How to Use zyBooks".

Also note: If the submitted code tries to access an invalid array element, such as newScores[9] for a 4-element array, the test may generate strange results. Or the test may crash and report "Program end never reached", in which case the system doesn't print the test case that caused the reported message.

What I am given:

import java.util.Scanner;
public class StudentScores {
   public static void main (String [] args) {
      Scanner scnr = new Scanner(System.in);
      final int SCORES_SIZE = 4;
      int[] oldScores = new int[SCORES_SIZE];
      int[] newScores = new int[SCORES_SIZE];
      int i;

      for (i = 0; i < oldScores.length; ++i) {
         oldScores[i] = scnr.nextInt();
      }

      /* Your solution goes here */

      for (i = 0; i < newScores.length; ++i) {
         System.out.print(newScores[i] + " ");
      }
      System.out.println();
   }
}
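A possible solution for the left shift is sketched below. The class ShiftLeftSketch and the method shiftLeft are hypothetical names used only to make the example runnable; in the given program, the single assignment inside the loop is what would replace the "Your solution goes here" comment, with oldScores.length in place of a hard-coded 4.

```java
import java.util.Arrays;

// Hypothetical wrapper class so the loop can be run on its own.
class ShiftLeftSketch {
    // Returns a copy of oldScores shifted once to the left,
    // with element 0 copied to the end.
    static int[] shiftLeft(int[] oldScores) {
        int[] newScores = new int[oldScores.length];
        for (int i = 0; i < oldScores.length; ++i) {
            // Index i + 1 wraps around to 0 for the last position,
            // so no access ever goes past the end of the array.
            newScores[i] = oldScores[(i + 1) % oldScores.length];
        }
        return newScores;
    }

    public static void main(String[] args) {
        int[] oldScores = {10, 20, 30, 40};
        System.out.println(Arrays.toString(shiftLeft(oldScores))); // prints [20, 30, 40, 10]
    }
}
```

The modulo keeps the index in range, which avoids the invalid-element access that the note above warns about. An equivalent approach is to loop to length - 1 and then copy element 0 to the last slot separately.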

3. Write a loop that sets each array element to the sum of itself and the next element, except for the last element which stays the same. Be careful not to index beyond the last element. Ex:

Initial scores: 10, 20, 30, 40
Scores after the loop: 30, 50, 70, 40

The first element is 30 or 10 + 20, the second element is 50 or 20 + 30, and the third element is 70 or 30 + 40. The last element remains the same.

What I am given:

import java.util.Scanner;
public class StudentScores {
   public static void main (String [] args) {
      Scanner scnr = new Scanner(System.in);
      final int SCORES_SIZE = 4;
      int[] bonusScores = new int[SCORES_SIZE];
      int i;

      for (i = 0; i < bonusScores.length; ++i) {
         bonusScores[i] = scnr.nextInt();
      }

      /* Your solution goes here */

      for (i = 0; i < bonusScores.length; ++i) {
         System.out.print(bonusScores[i] + " ");
      }
      System.out.println();
   }
}
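The third loop could be written as in the sketch below. BonusScoresSketch and addNext are hypothetical names chosen for this example; in the given program, only the for loop is needed at the "Your solution goes here" comment.

```java
import java.util.Arrays;

// Hypothetical wrapper class so the loop can be run on its own.
class BonusScoresSketch {
    // Sets each element to the sum of itself and the next element.
    // The last element is left unchanged.
    static void addNext(int[] bonusScores) {
        // Stopping at length - 1 guarantees bonusScores[i + 1] is valid.
        for (int i = 0; i < bonusScores.length - 1; ++i) {
            bonusScores[i] += bonusScores[i + 1];
        }
    }

    public static void main(String[] args) {
        int[] bonusScores = {10, 20, 30, 40};
        addNext(bonusScores);
        System.out.println(Arrays.toString(bonusScores)); // prints [30, 50, 70, 40]
    }
}
```

Iterating from left to right matters here: when element i is updated, element i + 1 still holds its original value, so each sum uses the initial scores as the example requires.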

27 May, 2020

University of South Australia 
School of ITMS 
INFS 1021: Systems Analysis (SA) A3 Assignment – Group
Length: 2000 words, Weighting: 25%, Due Date: 11 pm Sunday 31st May


INSTRUCTIONS TO STUDENTS
This task is to be done in groups which must have been approved by your tutor. Assignments will be returned to you within two to three weeks of submission. Feedback on this assignment will be provided via a rubric which will be available on the course website. Please read the assessment summary and assessment details sections of your course outline booklet carefully for further information relating to assessment in this course.
SPECIFICATIONS 
This assignment will enable you to improve your skills as a systems analyst and carry out various activities in the systems analysis phase of the systems development life cycle. It requires you to investigate and document system requirements, identify and document use cases, carry out domain modelling and use case modelling.
The assignment should be created using the template provided on the course website.
Each group member is also required to fill out the “Declaration of Contribution” form which is available in the assignment template document. As stated in the form, if any contribution does not meet the assessment requirements, the course coordinator may adjust individual marks up or down, depending on the level of contribution made.
Marking Criteria 
The rubric that will be used when assessing your work will be available on the course website. The word limit will not be checked for this assessment. This task will assess completeness of the objectives listed below.
• Completeness, suitability and thoroughness of responses to the questions;
• Technical correctness of the various models created;
• Adequate presentation and format;
• Use of the template provided;
• Correct spelling and grammar;
• Clarity of expression;
• Clearly labelled questions and answers.
Submission Instructions
The assignment MUST be submitted via Learnonline through the course website or via MyUniSA. Please submit ONLY ONE assignment per group – nominate someone to submit the assignment on the group’s behalf. Include as part of your submission ONE .pdf document containing the responses to the questions relating to the given scenario. Individual submissions will NOT be considered or marked. Refer to your course outline for further information regarding extensions. Late submissions will not be accepted for this course unless an extension has been approved by the course coordinator (see section on extensions in your course outline for further details). Late submissions that have not been approved will receive a mark of zero.
***IMPORTANT INFORMATION***
It is up to each group to make sure that the submitted work does not contain any parts copied from another group in this or any previous year, from this or any similar course; or from a common source such as a textbook or website. The assignment must be your own collective work, and not contracted to a substitute person. If we are suspicious, we reserve the right to call you in and to test your understanding of what you have submitted in an oral examination. If plagiarism is detected it will be investigated and appropriate consequences will follow.
SCENARIO
A Medical Practice has tasked your team to develop the requirements for their medical practice management system. The system is to allow the practice to record patient appointments and allow doctors to capture the medical history of their patients. A patient’s primary doctor can access the details of their patients at any time. However, if a patient temporarily sees a different doctor, that doctor can only access the patient’s medical records during the consultation. The system must record the times when doctors are available during business hours from Monday to Friday, 8 am to 9 pm, and Saturday and Sunday from 9 am to 4 pm. The system must be available with 98% certainty during these business hours. Response times for accessing a patient’s record, which can be a maximum of 100 megabytes, must not exceed 7 seconds. At times doctors will also be able to be rostered as a locum service in shifts from 6 am to 2 pm, 2 pm to 10 pm or 10 pm to 6 am. A doctor may not be rostered on for more than 40 hours per week and, if rostered on a locum shift, may not take another shift without a 14-hour break. The medical practice has ten doctors. Only one doctor will be rostered on a locum shift at a time. During business hours two nurses must also be rostered on and again may not work more than 40 hours per week. A doctor consults with patients in their office and must be able to access the system from their office. The history of all updates is logged and the system is backed up daily. The system logs a person out after ten minutes without use. Doctors on the locum roster are to be provided with a tablet device with mobile phone and global positioning system connections. The application on the device is to contain an emergency call button that contacts police in the event of an emergency and transmits the address that the doctor is consulting at.
The tablet device application is to be able to connect with the patient records stored at the main medical practice but only show the record of the patient that the doctor is consulting with at that time. Doctors consult with patients in either 15-minute or 30-minute blocks. In addition to a patient’s medical history, a patient’s record must include their name, address, date of birth, emergency contacts, any current medications, a photograph of their face and key details such as hair colour, eye colour, height, weight and gender. Associated with a patient but stored separately are their financial records – bills incurred and outstanding, and medical insurance details. The reception staff must be able to manage appointments in person or on the phone for new and existing patients. After an appointment, the reception staff must be able to call up the billing information for a patient based on what services a doctor has provided, accept cash or electronic payment and print out invoices for the patient or their parent, guardian or carer. Note that a patient without private medical insurance receives a Medicare rebate on their services and a patient with private medical insurance receives an additional rebate, but there typically is some gap fee that remains to be paid. The business manager for the Practice must also be able to access the system for basic accounting activities, transferring the data into a standard accounting package for analysis. The business manager must be able to identify the total Medicare rebate as well as the Private Insurance company rebates and invoice accordingly. The business manager must also be able to identify the costs of the business associated with the time sheets that doctors complete each week. Doctors enter their time sheets via their office computer. Finally, doctors must be able to access the Internet from their terminals as well as use the specialist Pharmacy application provided by AAA Pharmacy Pty Ltd.
Each doctor must be able to print scripts for patients from their office or the surgery, the receptionists print invoices for patients and the business manager prints report information. It is imperative that patient privacy is strictly respected so there must be suitable security measures in place to ensure this. The Practice has one IT administrator that can perform general IT duties for the terminals and servers but must not be able to access patient records without an additional password entered by the most senior doctor of the practice. The system is expected to handle approximately 300 patients.
QUESTIONS
1. SYSTEM REQUIREMENTS
a. Who are the stakeholders for the system? Provide your response using the grid format and/or template given in the previous assignment. For each stakeholder identified, list what aspects of the system are of particular interest to them. Express stakeholders’ interests using the following template: “As a , I want to so that .”
b. To collect information on the functional requirements for the system, what are some techniques that might be used? Identify what information would need to be obtained through interviews. Who would you interview? Include ten sample interview questions you would ask to obtain the required information. Ensure you obtain sufficient information to define use cases and create models.
c. What are the primary functional requirements for the system? Provide your response using the format specified in lectures and tutorials.
d. Identify and describe non-functional requirements for the system. Provide your response using the format specified in lectures and tutorials.

2. USE CASES 
a. Identify all the actors who will be using the system. 
b. Prepare a table containing all use cases and a brief use case description (1-2 sentences) for each use case. 
c. Draw a use case diagram for the system representing the actors and use cases identified. 

3. DOMAIN MODELLING 
a. List the domain classes for the system and their attributes. 
b. Based on the domain classes identified, develop a domain model class diagram showing domain classes with attributes, primary keys, relationships, and multiplicity constraints. 
c. Associations are the naturally occurring relationships between classes. They apply in two directions and can be read separately each way. For example two classes called Customer and Order could have the following associations: 
• A Customer can place zero or more Orders. 
• An Order must be placed by exactly one Customer.
List all of the associations (in both directions) between the classes for the domain model in part b using the format specified above.
4. USE CASE MODELLING 
a. Write a fully developed use case description for the use case Book Appointment. Assume the patient is new and making the appointment in person at the Practice. 
b. Develop an activity diagram for the use case in part a. 
c. Prepare a system sequence diagram for the use case in part a. 
d. Based on the domain model and the list of use cases developed, do a CRUD analysis for each of the identified classes. Provide your response in the CRUD table format given in Figure 5-13 of the textbook, pg 147, which shows use cases down the left and domain classes across the top of the table. 
Important Notes:
• Justify and document any decisions and assumptions made. 
• You must use the templates/layouts provided in the course notes and/or textbook for any answers, tables, models, diagrams or use case descriptions created. Diagrams must conform to the UML notation used in this course. 
• Diagrams may be produced using a UML case tool or hand-drawn provided that they are legible and well laid out. It is the group’s responsibility to ensure these are appropriately presented and incorporated into the document template. 
• DO NOT include the scenario or question text in the document template. Use the template document as is. Do not modify any headings and include your responses in the area indicated. 
• Post any questions in relation to the scenario or assignment on the SA A3 Assignment – Group Q&A Forum on the course website.

22 May, 2020

Test of Association Assignment


Name________________________________
This assignment is worth 100 points. Each problem/part is worth 5 points.
Due Date: Thursday, May 21, 2020 by 11:59 p.m. EST.
You do not have to type your responses. If you wish to provide hand-written work and scan your document, that is fine. Upload your final document with complete responses to the ‘Test of Association Assignment’ in Canvas.

1. Some students in the master's of analytics program at Harrisburg University claim they are really good at soccer. To see whether there is an association among students and faculty in terms of the goals they can score in a soccer game, the analytics faculty invited students and teaching assistants (Ph.D. students) to play in soccer matches. Every player was allowed to play in only one match. Over many matches, we counted the number of players who scored goals.

a. The soccer.txt file is attached to the ‘Test of Association Assignment’ in Canvas. Import this file into R. Copy and paste your code below that shows where you imported the data. Name the data frame that you create ‘soccer.’
library(readr)
soccer <- read_csv("kochu/soccer.txt")        
b. When you examine the data frame you imported into R, you will notice that it is not in the appropriate format for performing a chi-square test. Using some of the R categorical functions that you have learned in the course, convert the soccer data frame that you imported in part a into a table such that the Job variable is on the rows and the Score variable is on the columns. Name the table you create ‘soccer_t’. Copy and paste your code below that created the table.
soccer_t <- read.delim("soccer.txt")
The table created is shown below.

c. Develop the appropriate null and alternative hypothesis for testing an association among these variables. State the null and alternative hypothesis in the context of this problem.

H0: There is no relationship between occupation and the frequency of scoring in a soccer match.
H1: There is a relationship between occupation and the frequency of scoring in a soccer match.


d. Run the chi-square test in R using the assocstats() function. Copy and paste your code and the output below. What is the value of the chi-square test statistic? What is the p-value related to the chi-square test statistic?
tab <- table(job, freq)
summary(assocstats(tab))



The chi-square test statistic is 12.
The p-value is 0.2851.
e. Compute the χ2 test statistic that you obtained from the R output in part d. You should show your computations. Show the computations of the expected counts. Provide a final table that includes the observed counts with the expected counts in parentheses for each cell. The final table should have row and column margin totals and a grand total.
# Expected counts under independence, for comparison with the hand computation
chisq.test(tab)$expected
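The hand computation that part e asks for follows a fixed recipe: each expected count is (row total × column total) / grand total, and the statistic is the sum of (observed − expected)² / expected over all cells. The Java sketch below illustrates that recipe. The soccer counts are not reproduced in this document, so the 2 × 2 table in main is a hypothetical example, and the class and method names are likewise assumptions made for illustration.

```java
// Hypothetical helper class illustrating the chi-square hand computation.
class ChiSquareByHandSketch {
    // expected[i][j] = rowTotal[i] * colTotal[j] / grandTotal
    static double[][] expectedCounts(long[][] observed) {
        int rows = observed.length;
        int cols = observed[0].length;
        long[] rowTotals = new long[rows];
        long[] colTotals = new long[cols];
        long grandTotal = 0;
        for (int i = 0; i < rows; ++i) {
            for (int j = 0; j < cols; ++j) {
                rowTotals[i] += observed[i][j];
                colTotals[j] += observed[i][j];
                grandTotal += observed[i][j];
            }
        }
        double[][] expected = new double[rows][cols];
        for (int i = 0; i < rows; ++i) {
            for (int j = 0; j < cols; ++j) {
                expected[i][j] = (double) rowTotals[i] * colTotals[j] / grandTotal;
            }
        }
        return expected;
    }

    // Sum over all cells of (observed - expected)^2 / expected.
    static double chiSquare(long[][] observed) {
        double[][] expected = expectedCounts(observed);
        double statistic = 0.0;
        for (int i = 0; i < observed.length; ++i) {
            for (int j = 0; j < observed[i].length; ++j) {
                double diff = observed[i][j] - expected[i][j];
                statistic += diff * diff / expected[i][j];
            }
        }
        return statistic;
    }

    public static void main(String[] args) {
        // Hypothetical counts, NOT the soccer data.
        long[][] observed = {{10, 20}, {30, 40}};
        double[][] expected = expectedCounts(observed);
        for (int i = 0; i < observed.length; ++i) {
            for (int j = 0; j < observed[i].length; ++j) {
                // Observed count with the expected count in parentheses.
                System.out.print(observed[i][j] + " (" + expected[i][j] + ")  ");
            }
            System.out.println();
        }
        System.out.println("chi-square = " + chiSquare(observed));
    }
}
```

For the hypothetical table, the row totals are 30 and 70, the column totals are 40 and 60, and the grand total is 100, so the expected counts are 12, 18, 28 and 42. The degrees of freedom for such a table are (rows − 1) × (columns − 1).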


f. Does the test statistic you found by hand in part e match the χ2 test statistic from the assocstats() output in part d? State Yes or No.

No


g. Use the pchisq() function in R to find the p-value associated with the χ2 test statistic. Copy and paste your code below. Does the p-value you found with the pchisq() function match the p-value from the assocstats() output in part d? State Yes or No.
Yes
1 - pchisq(12, 10, ncp = 0, lower.tail = TRUE, log.p = FALSE)
The p-value is 0.2850565, which rounds to 0.2851 to four decimal places,
so it matches the p-value from assocstats().
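As a cross-check on the pchisq() result, the upper-tail probability of a chi-square distribution has a simple closed form when the degrees of freedom are even: for df = 2m, P(X > x) = exp(−x/2) × Σ (x/2)^k / k! for k = 0 to m − 1. The Java sketch below applies that formula to the statistic 12 with 10 degrees of freedom used above. The class and method names are hypothetical, and the formula only covers even df, so it is not a general replacement for pchisq().

```java
import java.util.Locale;

// Hypothetical helper class; handles even degrees of freedom only.
class ChiSquareTailSketch {
    // Upper-tail probability P(X > x) for a chi-square variable
    // with an even number of degrees of freedom.
    static double upperTailEvenDf(double x, int df) {
        if (df <= 0 || df % 2 != 0) {
            throw new IllegalArgumentException("df must be a positive even integer");
        }
        double half = x / 2.0;
        double term = 1.0; // (x/2)^k / k! for k = 0
        double sum = 1.0;
        for (int k = 1; k < df / 2; ++k) {
            term *= half / k; // builds (x/2)^k / k! incrementally
            sum += term;
        }
        return Math.exp(-half) * sum;
    }

    public static void main(String[] args) {
        double p = upperTailEvenDf(12.0, 10);
        System.out.println(String.format(Locale.ROOT, "%.4f", p)); // prints 0.2851
    }
}
```

With x = 12 and df = 10 the sum is 1 + 6 + 18 + 36 + 54 = 115, and 115 × exp(−6) ≈ 0.2850565, which agrees with the value reported above.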


h. Using the p-value for this test, do you reject or not reject the null hypothesis?

Do not reject the null hypothesis.


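i. State a conclusion back in terms of the context of the problem.

Because the p-value (0.2851) is greater than 0.05, we do not reject the null hypothesis. The data do not provide evidence of a relationship between occupation and the frequency of scoring in a soccer match.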









12 March, 2020

Final Research

Write a brief research report (up to about 7 pages, not including title page, abstract, and references), based on an analysis of the data file. Choose a hypothesis, cite at least three references to justify your hypothesis, test your hypothesis with an analysis of the DATA540.SAV file, and then report and discuss the results. Your results should include both descriptive and inferential statistics. 
 
Use APA format, including an introduction, abstract, method, results, and discussion section. Please refer to the APA manual: American Psychological Association (2010). Publication manual of the American Psychological Association (6th ed.). Washington, D.C.