Search Results
- Classification: Basic Concepts, Decision Trees, and Model Evaluation.
Classification, which is the task of assigning objects to one of several predefined categories, is a pervasive problem that encompasses many diverse applications. Examples include detecting spam email messages based upon the message header and content, categorizing cells as malignant or benign based upon the results of MRI scans, and classifying galaxies based upon their shapes. This chapter introduces the basic concepts of classification, describes some of the key issues such as model overfitting, and presents methods for evaluating and comparing the performance of a classification technique. While it focuses mainly on a technique known as decision tree induction, most of the discussion in this chapter is also applicable to other classification techniques, many of which are covered in Chapter 5. To download the full research paper, click on the link below. If you need an implementation of this research paper or any of its variants, feel free to contact us at contact@codersarts.com.
- Research Paper Implementation: Towards Effective Recommender Systems.
Abstract: User modeling and recommender systems are often seen as key success factors for companies such as Google, Amazon, and Netflix. However, while user-modeling and recommender systems successfully utilize items like emails, news, social tags, and movies, they widely neglect mind-maps as a source for user modeling. We consider this a serious shortcoming since we assume user modeling based on mind maps to be equally effective as user modeling based on other items. Hence, millions of mind-mapping users could benefit from user-modeling applications such as recommender systems. The objective of this doctoral thesis is to develop an effective user-modeling approach based on mind maps. To achieve this objective, we integrate a research-paper recommender system in our mind-mapping and reference-management software Docear. The recommender system builds user models based on the users' mind maps, and recommends research papers based on the user models. As part of our research, we identify several variables relating to mind-map-based user modeling, and evaluate the variables' impact on user-modeling effectiveness with an offline evaluation, a user study, and an online evaluation based on 430,893 recommendations displayed to 4,700 users. We find, among others, that the number of analyzed nodes, the time when nodes were modified, the visibility of nodes, the relations between nodes, and the number of children and siblings of a node affect the effectiveness of user modeling. When all variables are combined in a favorable way, this novel user-modeling approach achieves click-through rates of 7.20%, which is nearly twice as effective as the best baseline. In addition, we show that user modeling based on mind maps performs about as well as user modeling based on other items, namely the research articles users downloaded or cited. Our findings lead us to conclude that user modeling based on mind maps is a promising research field, and that developers of mind-mapping applications should integrate recommender systems into their applications. Such systems could create additional value for millions of mind-mapping users. As part of our research, we also address the question of how to evaluate recommender systems adequately. This question is widely discussed in the recommender-system community, and we provide some new results and arguments. Among others, we show that offline evaluations often cannot predict results of online evaluations and user studies in the field of research-paper recommender systems. We also show that click-through rate and user rating correlate well (r=0.78). We discuss these findings, including some inherent problems of offline evaluations, and conclude that offline evaluations are probably unsuitable for evaluating research-paper recommender systems, while both user studies and online evaluations are adequate evaluation methods. We also introduce a new weighting scheme, TF-IDuF, which could be relevant for recommender systems in general. In addition, we are the first to compare the weighting scheme CC-IDF against CC only, and we research concept drift in the context of research-paper recommender systems, with the result that interests of researchers seem to shift after about four months.
Last, but not least, we publish the architecture of Docear's recommender system, as well as four datasets relating to the users, recommendations, and document corpus of Docear and its recommender system. To download the full research paper, click on the link below. If you need an implementation of this research paper or any of its variants, feel free to contact us at contact@codersarts.com.
- Analyzing, Parsing, Pre-Processing, and Extracting Semi-Structured Textual Data
Task 1: Parsing Text Files This assessment touches the very first step of analyzing textual data, i.e., extracting data from semi-structured text files. Each student is provided with a data-set that contains information about COVID-19 related tweets (please find your own directory "part1" from here). Each text file contains information about the tweets, i.e., "id", "text", and "created_at" attributes. Your task is to extract the data and transform the data into the XML format with the following elements: id: is a 19-digit number. text: is the actual tweet. created_at: is the date and time that the tweet was created. The XML file must be in the same structure as the sample folder. Please note that, as we are dealing with large datasets, the manual checking of outputs is impossible and output files will be processed and marked automatically; therefore, any deviation from the XML structure (i.e. sample.xml) and any deviation from this structure (e.g. wrong key names, which can be caused by different spelling, different upper/lower case, etc., wrong hierarchy, not handling the XML special characters, ...) will result in receiving zero for the output mark, as the marking script would fail to load your file. (Hint: run your code on the provided example and make sure that your code results in the exact same output as the sample output. You can also use the "xmltodict" package to make sure that your XML is loadable.) Besides the XML structure, the following constraints must also be satisfied: The "id"s must be unique, so if there are multiple instances of the same tweets, you must only keep one of them in your final XML file. The non-English tweets should be filtered out from the dataset and the final XML should only contain the tweets in the English language. For the sake of consistency, you must use the langid package to classify the language of a tweet. The re, os, and langid packages in Python are the only packages that you are allowed to use for task 1 of this assessment (e.g., "pandas" is not allowed!). Any other package that you need to "import" before usage is not allowed. The output and the documentation will be marked separately in this task, and each carries its own mark. Output: See sample.xml for detailed information about the output structure. The following must be performed to complete the assessment: Designing efficient regular expressions in order to extract the data from your dataset and submitting the extracted data into an XML file, .xml, following the format of sample.xml. Explaining your code and your methodology in task1_.ipynb. A pdf file, "task1_.pdf". You can first clean all the output in the jupyter notebook task1_.ipynb and then export it as a pdf file. This pdf will be passed to Turnitin for plagiarism check. Methodology The report should demonstrate the methodology (including all steps) to achieve the correct results. Documentation The solution to get the output must be explained in a well-formatted report (with appropriate sections and subsections). Please remember that the report must explain both the obtained results and the approach to produce those results. You need to explain both the designed regular expression and the approach that you have taken in order to design such an expression. (A rough code sketch for this task is given at the end of this post.) Task 2: Text Pre-Processing This assessment touches on the next step of analyzing textual data, i.e., converting the extracted data into a proper format.
In this assessment, you are required to write Python code to preprocess a set of tweets and convert them into numerical representations (which are suitable for input into recommender-systems/information-retrieval algorithms). The data-set that we provide contains 80+ days of COVID-19 related tweets (from late March to mid July 2020). Please find your .xlsx file in the folder "part2" from this link. The Excel file contains 80+ sheets where each sheet contains 2000 tweets. Your task is to extract and transform the information of the Excel file by performing the following tasks: Generate the corpus vocabulary with the same structure as sample_vocab.txt. Please note that the vocabulary must be sorted alphabetically. For each day (i.e., sheet in your Excel file), calculate the top-100 frequent unigrams and top-100 frequent bigrams according to the structure of sample_100uni.txt and sample_100bi.txt. If you have less than 100 bigrams for a particular day, just include the top-n bigrams for that day (n<100). Generate the sparse representation (i.e., doc-term matrix) of the Excel file according to the structure of sample_countVec.txt. Please note that the following steps must be performed (not necessarily in the same order) to complete the assessment: Using the "langid" package, only keep the tweets that are in the English language. The word tokenization must use the following regular expression, "[a-zA-Z]+(?:[-'][a-zA-Z]+)?". The context-independent and context-dependent (with the threshold set to more than 60 days) stop words must be removed from the vocab. The provided context-independent stop words list (i.e., stopwords_en.txt) must be used. Tokens should be stemmed using the Porter stemmer. Rare tokens (with the threshold set to less than 5 days) must be removed from the vocab. Create the sparse matrix using CountVectorizer. Tokens with length less than 3 should be removed from the vocab. The first 200 meaningful bigrams (i.e., collocations) must be included in the vocab using the PMI measure. Please note that you are allowed to use any Python packages as you see fit to complete task 2 of this assessment. The output and the documentation will be marked separately in this task, and each carries its own mark. Output: The output of this task must contain the following files: task2_.ipynb, which contains your report explaining the code and the methodology. A pdf file, "task2_.pdf". You can first clean all the output in the jupyter notebook task2_.ipynb and then export it as a pdf file. This pdf will be passed to Turnitin for plagiarism check. _vocab.txt: It contains the bigram and unigram tokens in the format of sample_vocab.txt. Words in the vocabulary must be sorted in alphabetical order. _countVec.txt: Each line in the txt file contains the sparse representation of one day of the tweet data in the format of sample_countVec.txt. _100uni.txt and _100bi.txt: Each line in the txt file contains the top 100 most frequent uni/bigrams of one day of the tweet data in the format of sample_100uni.txt and sample_100bi.txt. Similar to task 1, in task 2, any deviation from the sample output structures may result in receiving zero for the output. So please be careful. Methodology The report should demonstrate the methodology (including all steps) to achieve the correct results. Documentation The solution to get the output must be explained in a well-formatted report (with appropriate sections and subsections).
Please remember that the report must explain both the obtained results and the approach to produce those results. Note: all submissions will be put through plagiarism detection software which automatically checks for their similarity with respect to other submissions. Any plagiarism found will trigger the Faculty's relevant procedures and may result in severe penalties, up to and including exclusion from the university. Send your assignment or project details to the contact below if you need any help in machine learning or Python: contact@codersarts.com
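As a rough starting point for Task 1, here is a minimal sketch, assuming the raw files live in a folder named part1 and that the id, text and created_at fields can be matched by the placeholder pattern below (the exact regular expression, the output file name output.xml, and the XML layout all depend on your own files and on sample.xml, so treat them as assumptions):

import re
import os
import langid

records = {}
for name in os.listdir("part1"):  # assumed input folder
    with open(os.path.join("part1", name), encoding="utf-8") as f:
        raw = f.read()
    # Placeholder pattern: a 19-digit id, the tweet text, and the creation date.
    for tid, text, created in re.findall(r'"id":"(\d{19})","text":"(.*?)","created_at":"(.*?)"', raw):
        if tid in records:  # keep ids unique
            continue
        if langid.classify(text)[0] != "en":  # keep English tweets only
            continue
        records[tid] = (text, created)

def escape(s):
    # Handle the XML special characters mentioned in the spec.
    return (s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
             .replace('"', "&quot;").replace("'", "&apos;"))

with open("output.xml", "w", encoding="utf-8") as out:
    out.write('<?xml version="1.0" encoding="UTF-8"?>\n<data>\n')
    for tid, (text, created) in records.items():
        out.write(f'<tweet id="{tid}" created_at="{escape(created)}">{escape(text)}</tweet>\n')
    out.write('</data>\n')

Only the re, os and langid packages are used, as the task requires; make sure your own element names and hierarchy match sample.xml exactly.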
- DNA Outbreak Investigation Using Machine Learning
You are given a data set consisting of DNA sequences (the file is available here) of the same length. Each DNA sequence is a string of characters from the alphabet 'A','C','T','G', and it represents a particular viral strain sampled from an infected individual. Your goal is to write code that helps to identify transmission clusters corresponding to outbreaks. The sequences should be considered as feature vectors and the characters as features. The data set is stored as a fasta file, which is essentially a text file that has the following form: >Name of Sequence1 AAGCACAGGATGTAATGGTGGGGCCGACCGCCTATTATTCTGATGATTACTTGAGGCCCTCGGAGAGGAAGGGG >Name of Sequence2 AAGCACAGGATGTAATGGTGGGGCCGACCGCCTATTATTCTGATGATTACTTGAGGCCCTCGGAGAGGAAGGGG >Name of Sequence3 AAGCACAGGATGTAATGGTGGGGCCGACCGCCTATTATTCTGATGATTACTTGAGGCCCTCGGAGAGGAAGGGG ….. Here each line starting with the '>' symbol contains the name of a sequence, followed by the sequence itself on the next line. You may proceed as follows: 1) Read sequences from the file. 2) Calculate pairwise distances between sequences. Use the Hamming distance: it is the number of positions at which the sequences are different (see https://en.wikipedia.org/wiki/Hamming_distance). 3) Project the sequences into 2-D space using Multidimensional Scaling (MDS) based on the Hamming distance matrix. 4) Plot the obtained 2-D data points. Estimate the number of clusters K by visual inspection. 5) Use the k-means algorithm to cluster the 2-D data points. You may use library functions to read data from the file and perform MDS. For multidimensional scaling in python, see e.g. https://scikit-learn.org/stable/modules/generated/sklearn.manifold.MDS.html K-means clustering should be implemented from scratch. Your submission should contain: the code of your script, and visualization plots for MDS with the different clusters highlighted in different colors. Please do not hesitate to ask questions. Contact us to get instant help: contact@codersarts.com
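A rough sketch of the pipeline described above, with k-means written from scratch as required (the input file name sequences.fasta and the final K = 3 are assumptions; you would pick K after inspecting the MDS plot):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import MDS

def read_fasta(path):
    # Minimal fasta reader: name lines start with '>', the sequence follows on the next line(s).
    names, seqs, current = [], [], []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line.startswith(">"):
                if current:
                    seqs.append("".join(current))
                    current = []
                names.append(line[1:])
            elif line:
                current.append(line)
    if current:
        seqs.append("".join(current))
    return names, seqs

def hamming(a, b):
    # Number of positions at which two equal-length sequences differ.
    return sum(x != y for x, y in zip(a, b))

def kmeans(points, k, iters=100, seed=0):
    # k-means from scratch: random initial centroids, then repeat assign/update until stable.
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new = np.array([points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels

names, seqs = read_fasta("sequences.fasta")  # assumed file name
n = len(seqs)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = hamming(seqs[i], seqs[j])

coords = MDS(n_components=2, dissimilarity="precomputed", random_state=0).fit_transform(D)
plt.scatter(coords[:, 0], coords[:, 1])
plt.show()  # inspect this plot to estimate the number of clusters K

K = 3  # assumed after visual inspection
labels = kmeans(coords, K)
plt.scatter(coords[:, 0], coords[:, 1], c=labels)
plt.show()

The transmission clusters then correspond to the groups of points sharing a colour in the final plot.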
- How To Connect With Anydesk
AnyDesk is lightweight client-server software which is used to share your screen with another machine and edit it remotely. First you need to download it from the official website: https://anydesk.com/en Click on "Free Download" and an AnyDesk.exe file is downloaded. Now click on this .exe file. You will see a 9-digit ID ("This Desk"), which is your system ID. To connect with any other machine (the Remote Desk), you will need to enter the other machine's ID in the Remote Desk field and click on the Connect button. A request is then sent to the Remote Desk, and the user of the Remote Desk needs to accept the request; when they accept it, you can see the remote machine on your screen and you can also edit anything on the remote machine. If you need help with any tools or software, you can contact us at the contact ID given below: contact@codersarts.com
- Chatbot
What is a Chatbot? A chatbot is a computer program that allows a machine to imitate human conversation through text, speech, touch or gesture. Different chatbots have varying degrees of intelligence. A basic chatbot can be a solution for answering FAQs, while chatbots built using some current bot frameworks might offer more services like placing orders, time slotting, making simple transactions etc. But AI chatbots steal the limelight as they are the ones that have the intelligence and capability to deliver the trailblazing services that various industries are looking for. NOTE: In this blog the words bots and chatbots are used interchangeably. Historic Outline Before we move further with chatbots let us take a look at their history. Though chatbots have gained popularity recently, the idea of chatbots is as old as computing itself. Chatbot is a short form of the term "chatterbot," which was coined by inventor Michael Loren Mauldin in 1994. He developed the prototype of the chatterbot "Julia" in 1994. This prototype version was refined and developed in 1997, and a stand-alone virtual person called Sylvie was beta-tested with the public. It was well received by the public and, after several versions, the Verbally Enhanced Software Robot, or Verbot, was deployed in the year 2000. You would be surprised to know that the first ever chatbot was created when the term chatterbot was not even coined. Yes, the very first chatbot was ELIZA, created by Joseph Weizenbaum at MIT in 1966. ELIZA was designed to imitate a therapist. To Weizenbaum's surprise, many people who got to interact with ELIZA (including his own assistant) developed feelings for it, so much so that they refused to believe it was just a machine. ELIZA is considered one of the first programs to attempt the Turing test (a test of a machine's ability to display intelligent behaviour similar to a human's). It laid the foundation for modern day chatbots. After ELIZA a number of chatbots have been created; some of the more prominent ones are: PARRY (1972), created by Kenneth Colby. Jabberwacky (1988), created by Rollo Carpenter; the chatbot was designed to "simulate natural human chat in an interesting, entertaining and humorous manner", or simply to act like the Jabberwock from Lewis Carroll's Through the Looking-Glass. A.L.I.C.E. (Artificial Linguistic Internet Computer Entity) (1995). Elbot (2000), created by Fred Roberts and Artificial Solutions; it uses sarcasm, witty remarks and irony to entertain humans. Smarterchild (2001), created by Robert Hoffer, Timothy Kay and Peter Levitan; it was considered a precursor to Apple's Siri and Samsung's S Voice. Mitsuku (2005), created by Steve Worswick; it impersonates a teenage female from England and can play games and do magic as well. More recent and developed chatbots which we are familiar with are IBM Watson (2006), Siri (2010), Google Now (2012), Alexa (2015), Cortana (2015) etc. Why are chatbots important? In recent years, with the advent of technology, a large number of services are available to everyone globally. This makes businesses data heavy. Keeping track of such a large database and providing the relevant solution in time is not humanly possible. Walking through complicated menus isn't the fast and effortless user experience that businesses need to deliver today. Also, consumers don't want to be restricted in availing services due to the limitations of an organisation. They want to have an interface with technology across a wide number of channels. This is where chatbots come in.
They can perform several tasks quickly and more efficiently than their human counterparts. For example, checking the weather, ordering a pizza or hiring a cab can be done more efficiently with chatbots. Chatbots let customers simply ask for whatever they need, across multiple channels, wherever they are, night or day. Moreover, businesses can use chatbots to automate tasks such as inventory ordering and management. They can be used to provide enhanced customer services. How do they work? Chatbots work by analysing the intent of a user's request to provide relevant solutions. If voice is used instead of text to communicate with a chatbot, it first converts the voice input into text form using Automatic Speech Recognition (ASR) technology and then, after processing the text, it delivers the solution. The solution can be in any form: text, voice (by using Text To Speech (TTS) tools), gesture, or it can be indicated by completion of a task. We all know that machines can only understand binary language, i.e. combinations of 1 and 0. Therefore, to interpret the text in the right sense, chatbots make use of several methods of classification, which are as follows: Natural Language Processing (NLP) It is a branch of artificial intelligence that helps a chatbot to understand, interpret and manipulate human language. It converts the user input into sentences and words. It also processes the text through a series of techniques, for example, converting it all to lowercase or correcting spelling mistakes, before determining if the word is an adjective or verb. Natural Language Processing (NLP) comprises the steps below: Tokenization – the NLP layer splits the text into a set of words in the form of tokens. Sentiment Analysis – the bot interprets the user responses to align with their emotions. Normalization – it checks for the typos that can alter the meaning of the user query. Entity Recognition – the bot looks for the different categories of information required. Dependency Parsing – the chatbot searches for common phrases that users want to convey. Natural Language Understanding (NLU) It is a subtopic of natural-language processing in artificial intelligence that deals with machine reading comprehension. It helps the chatbot to understand what the user is trying to say using language objects such as lexicons, synonyms and themes. These objects are used in algorithms to produce dialogue flows that tell the chatbot how to respond. NLU is the process of converting the input text into structured data that can be worked upon by machines to produce results. It follows three main concepts: entities, context, and expectations. Entities – represent the sub-units of a request which may contain key information. Common examples of entities include locations, names of organisations, and prices. Context – when a natural language understanding algorithm identifies a request and it has no historical backdrop of the conversation, it will not be able to recall the request to give the response. Therefore, an algorithm component designed to learn from sequences is used to provide context. Sentences are sequences in the sense that order matters and that each word is used in the context of the other words. Thus, understanding a sentence properly involves understanding how each word relates to the others. Expectations – the chatbot must be able to fulfil the customer's expectations when they make a request, ask a query or send an inquiry. Natural Language Generation (NLG) It is a sub-unit of artificial intelligence.
It is a software process that automatically transforms data into plain-English content. It enables the chatbot to analyse data repositories, including integrated back-end systems and third-party databases, and to use that information in creating a meaningful response. Types of Chatbots There are a lot of chatbots around these days. They come in different shapes and sizes and can serve a variety of purposes. They can be broadly classified into two categories: Linguistic Based (Rule-Based) Chatbots These types of chatbots are created based on a certain prewritten set of rules. These sets of rules follow a basic 'if' and 'then' logic. These are the most common types of bots, and they are widely used. These types of chatbots are used in cases where the questions and their answers are known in advance and can be automated, for example to check the quality of a system based on several tests. They can be fine-tuned to serve a specific purpose. They can be programmed (using NLP) to analyse the order of words and synonyms in a question and to respond with the same answers to questions that carry similar meanings. The drawback of these types of chatbots is that they can be very rigid and work well only if the input is specific and corresponds to their set of rules. They are slow to develop and are highly labour-intensive. Also, they are not able to mimic human conversations well enough. One example of this type of chatbot can be seen on this website as well, in the bottom right corner of the screen where an icon says "we are here!" Machine Learning (AI) Chatbots These types of chatbots make use of artificial intelligence and are more complex. They tend to produce responses by analysing data and making predictions. They are more conversational, personalised, interactive and spontaneous. They are closer to mimicking human conversation, and with enough time and data they grow more aware and are able to understand the context of input sentences. They can make predictions to give a customized experience to users. They learn from patterns and past experiences. The drawback of this type of chatbot is that they require a tremendous amount of data and hours of training to perform even a simple task. They require highly skilled people to work on such bots. In case something goes wrong with the model, it can be very troublesome to rectify it. They are not cost-effective and thus are not relevant to many industries. There is also a rise in hybrid chatbots that use both the linguistic and the machine learning approach to overcome the drawbacks of both. Apart from the above-mentioned categories, we can also classify chatbots based on functionality and usage; let us discuss them: Menu/Button based chatbot: It is one of the simplest forms of chatbots. It has some predefined options available. Users just have to choose from the available options. It is quite straightforward to use such chatbots. If your query is not present in the predefined options then the chatbot won't be able to help you. It is constrained to certain question-answers only. Keyword recognition based chatbot: It is more advanced than the type mentioned above as it utilises NLP to give a better service. When a user asks a question, the question is analysed using NLP, it is matched against keywords and a suitable response is delivered. They don't work well when a lot of similar questions are asked, which causes keyword redundancy.
Contextual Chatbot: These types of chatbots overcome the drawback of the previously mentioned types as they utilise artificial intelligence to find the context behind the questions instead of jumping to predetermined answers. They store up unique searches from various users and will refer to this information to provide an apt response in future. In simple words, these chatbots remember previous conversations and provide answers keeping them in mind to offer a better service. They are smart and have the ability to self-improve. Voice-based chatbot: The name is self-explanatory. These types of chatbots take the user's voice as input rather than typed input. Service chatbot: These types of chatbots are service oriented. They ask questions regarding the user's needs and provide the necessary information. These are popular in service based industries, for example airlines, customer support etc. Social messaging chatbot: These types of chatbots can be integrated with social media platforms like Facebook Messenger, Whatsapp, Telegram etc. They enable users to clarify their doubts specific to a social media platform. They help in reducing the effort required by the users. The above-mentioned types are just a few examples. Chatbots can also be classified into a lot of other categories. Moreover, when we are discussing the types of chatbots we should also take a look at one more set of categorization which is based on the ethical use of chatbots: the GOOD and the BAD chatbots. While there are so many positive aspects of having chatbots around, there are also a few negative ones. The bots which are used for the benefit of mankind are termed good bots. A few examples are: chatbots, crawlers, transactional bots, informational bots, and entertainment bots such as art bots, game bots etc. Whereas the bots used for causing harm are termed bad bots. A few examples are: hackers, spammers, scrapers, impersonators. As a matter of fact, we can't really say that the bots themselves are bad, because it is us humans that program the bots to act in a certain way. One of the main drawbacks that comes with the introduction of bots in industries is that more people are losing their jobs. With the advancement of technology the day isn't far when we will have a companion like Jarvis from Iron Man. For any guidance on the above mentioned topics, feel free to contact us on contact@codersarts.com.
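To make the NLP steps described above (tokenization, entity recognition, dependency parsing) a little more concrete, here is a small illustrative sketch using the spaCy library as one possible toolkit (the en_core_web_sm model is an assumption and must be downloaded separately, e.g. with python -m spacy download en_core_web_sm):

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Book a table for two at an Italian restaurant in Sydney tomorrow evening.")

# Tokenization and dependency parsing: one row per token.
for token in doc:
    print(token.text, token.pos_, token.dep_)

# Entity recognition: the pieces of key information a bot would act on.
for ent in doc.ents:
    print(ent.text, ent.label_)

A chatbot would feed the recognised entities (here, likely the location and the date expression) and the detected intent into its dialogue flow to decide how to respond.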
- FINANCE APP
INTRODUCTION: A personal finance app is an app that you can download on your smartphone or tablet that helps you manage all aspects of your personal finances. These apps can help track your spending, saving, and investing. They can also track bill payments and keep you up to date on credit score changes. You can connect personal finance apps to your financial institution so you can see where the money from your bank account is being spent. A personal finance app makes it easy to keep track of your finances on the go. Typically, a personal finance app will have different features such as a shared wallet, bill reminders, auto bill pay, and even managing subscriptions. Goals Help people to have more interaction with their money. Provide a solution for users to set financial goals and achieve them by creating a platform that helps them keep track of their goals. Monthly Contribution and Target Balance The purpose of these is to make goals easier to achieve. By setting a goal with a monthly contribution you will have an automatic payment to that "bucket" every single month. When you set the payment to Target Balance it is free; however, you can set a deadline for the date by which you want to achieve it. User Interview Here are the main pain points and insights which we gathered through those interviews. #1: The home page doesn't provide much information When conducting the user interviews, we found out that most users use the app in order to monitor their monthly expenses with the goal of saving more money. All of the users we interviewed wished that the home page would provide more insights and information about their spending patterns. Even the budget card only shows up to 4 budget categories, and users have to click to another screen to see an overview of their budget status. #2: Unable to customize spending categories Although the app has a large range of default categories (35 in total), some users still prefer to personalize their transactions to their liking. #3: Unable to key in transfers from one account to another When the user creates a transfer transaction in an account, it only appears in that particular account. And the user would need to manually indicate whether it's a debit or credit transaction. This can cause confusion, as users might find missing transactions afterwards if they forget to do so. #4: Unable to create recurring transactions Some of the users suggested that the app could allow for recurring transactions. This applies to users who are manually adding in each transaction themselves, and will help them save a lot of time. Ideation Design Principles The overall approach for the redesign was to improve the user experience without radically changing an already popular and familiar platform. Some key principles were: improving the user experience by removing known pain points in the user journey; helping users to reach their goal of saving more money; allowing users to better understand their spending habits and patterns. RESEARCH User research — Collaboratively created a survey and conducted interviews. The purpose of these questions is to gain insight into key data and see if there is available information that would answer the questions. There was no relevant information around the topics of interest, so I went about creating a survey to gather some real-world data. Survey — I created a Google Form and sent it out to some people on my email list. The survey was filled in by 197 people. User story mapping — Collaboratively created a user story map.
Wireframing — My partner worked on the on-boarding process, dashboard, and past goals page, while I worked on the goal creation process, current goals, and track my spending page. Low-Fidelity Prototype with Invision — Created a prototype using Invision for our user testing. User testing — Created a user test scenario. Individually found candidates to test. High-Fidelity design — My partner worked on the on-boarding pages. We both worked on the re-design of the dashboard. I worked on the goal creation process and the color palette of the design. Key changes and rationale: As most users want more information on their financial status, we thought it would be good to include their income and balance on the home page as well, to give them a fast update on their financial health. The insight card aims to give smart pieces of advice and updates on the user's spending patterns. It will also alert you if you are going over your budget. We removed the "latest transaction" card and replaced it with the same pie chart component that is shown in the reports section. This allows users to have an overview of their spending breakdown. We designed a recurring transaction option based on the users' feedback. When creating recurring transactions, the transaction will only be created on the selected date (e.g. when the transaction is created on 12 Feb, the recurring transaction will then be automatically created on 12 Mar). When editing a recurring transaction, users will have the option to edit the specific transaction or all following transactions. We added a sub-category option where users are able to create their own personalized sub-categories. The sub-category is tagged to the main category so that it doesn't affect the reporting. The sub-category will also be used as the description if the user leaves that field empty. This helps the user save time when creating repeated transactions. Conclusion There is no doubt that our team spent countless hours on the current app design. This project was made for all the people who want to have control of their financial life, keeping in mind that there are different types of people and different ways in which each manages their own money. After all this process I could observe some flow improvements, and overall the users gave me good feedback regarding the usability and desirability of the app. Hire Figma experts for any kind of project: urgent bug fixes, minor enhancements, full-time and part-time projects. If you need any type of project help, our experts will help you start designing immediately. THANK YOU!
- Display Form Content In Angular - Example 1
In this example we will discuss and learn about AngularJS form content, and how to display form content without a page refresh. First, you need to import "angular.min.js" in a script tag. Example:
<div ng-app="">
Enter Text: <input type="text" ng-model="text1">
<p>Your Text Which You Entered: {{ text1 }}</p>
</div>
Output: as you type into the text box, the entered value is displayed immediately after "Your Text Which You Entered:", without a page refresh. Explanation: ng-app: this directive defines an AngularJS application. ng-model: the ng-model directive binds the value of HTML controls (input, select, textarea) to application data.
- Hello World in Angular
Angular is a TypeScript-based open-source web application framework led by the Angular Team at Google and by a community of individuals and corporations. Angular is a complete rewrite from the same team that built AngularJS. Ensure that you are not already in an Angular workspace folder. For example, if you have previously created the Getting Started workspace, change to the parent of that folder. Run the CLI command ng new and provide the name hello-world, as shown here.
Create the project in Angular:
ng new hello-world
app.component.html is the component template, written in HTML:
Hello World in Angular
app.component.ts is the component class code, written in TypeScript:
import { Component } from '@angular/core';

@Component({
  selector: 'app-root',
  templateUrl: './app.component.html',
  styleUrls: ['./app.component.css']
})
export class AppComponent {
  title = 'hello-world';
}
app.module.ts is the main application module:
import { BrowserModule } from '@angular/platform-browser';
import { NgModule } from '@angular/core';

import { AppRoutingModule } from './app-routing.module';
import { AppComponent } from './app.component';

@NgModule({
  declarations: [
    AppComponent
  ],
  imports: [
    BrowserModule,
    AppRoutingModule
  ],
  providers: [],
  bootstrap: [AppComponent]
})
export class AppModule { }
To run the app: the ng serve command launches the server, watches your files, and rebuilds the app as you make changes to those files. The --open (or just -o) option automatically opens your browser to http://localhost:4200/.
ng serve -o
Output: the browser opens at http://localhost:4200/ and displays "Hello World in Angular".
Contact Us Now for more
- Important Topics of Machine Learning
Answer the below topics 1: Understand the measures that are used to evaluate the results of classification, and describe what these are: Confusion matrix; Precision, Recall, Accuracy rate; Precision-Recall Curve; ROC curve. Explain in simple terms the concept of n-fold cross validation. Ans: Confusion Matrix It is a performance measurement for a machine learning classification problem. It is represented by an N*N matrix, where N is the number of target classes. TP: true positive; FP: false positive; FN: false negative; TN: true negative. Precision, Recall, Accuracy rate These are further metrics for measuring performance. Using the confusion matrix above we can easily compute these metrics with the help of the formulas below:
Precision = True positive/(True positive + False positive)
Recall = True positive/(True positive + False negative)
Accuracy Rate = (True positive + True negative)/(True positive + True negative + False positive + False negative)
(The related F1 score combines precision and recall: F1 = 2*((Precision * Recall)/(Precision + Recall)).)
Precision-Recall Curve & ROC curve Precision-Recall Curve These curves are recommended for highly skewed domains where ROC curves may provide an excessively optimistic view of the performance. This curve can be calculated in scikit-learn using the precision_recall_curve() function that takes the class labels and predicted probabilities for the minority class and returns the precision, recall, and thresholds. ROC curve An ROC curve (receiver operating characteristic curve) is a graph showing the performance of a classification model at all classification thresholds. This curve plots two parameters: True Positive Rate and False Positive Rate. True Positive Rate (TPR) is a synonym for recall and is therefore defined as follows: TPR = TP/(TP + FN). False Positive Rate (FPR) is defined as follows: FPR = FP/(FP + TN). Explain in simple terms the concept of n-fold cross validation Cross-validation is a technique to evaluate predictive models by partitioning the original sample into a training set, used to train the model, and a test set, used to evaluate it. In k-fold cross-validation, the original sample is randomly partitioned into k equal-sized subsamples. The steps are: split your entire dataset into k "folds"; for each fold, build your model on the other k – 1 folds of the dataset; record the error you see on each of the predictions; repeat this until each of the k folds has served as the test set. The average of your k recorded errors is called the cross-validation error and will serve as your performance metric for the model. Answer the below topics 2: Linear Regression What is the cost function for Linear Regression? Polynomial Regression Describe how polynomial regression works based on linear regression. Ans: Linear Regression: Cost Function of Linear Regression Linear Regression is a machine learning algorithm based on supervised learning. It is used to predict a real-valued output based on an input value. The cost function (F) of Linear Regression is the Root Mean Squared Error (RMSE) between the predicted y value (pred) and the true y value (y): Cost(F) = sqrt((1/n) * Σ (pred_i - y_i)^2), where pred_i is the predicted value and y_i is the actual value. Polynomial Regression Polynomial regression is a special case of linear regression where we fit a polynomial equation on the data with a curvilinear relationship between the target variable and the independent variables.
Equation for Linear Regression: Y = 𝜃0 + 𝜃1*x, where Y is the target, x is the predictor, 𝜃0 is the bias, and 𝜃1 is the weight in the regression equation. This linear equation can be used to represent a linear relationship. But in polynomial regression, we have a polynomial equation of degree n, represented as: Equation for Polynomial Regression: Y = 𝜃0 + 𝜃1*x + 𝜃2*x^2 + ... + 𝜃n*x^n. Answer the below topics 3: Logistic Regression The formula that updates the weights of attributes for each iteration. Softmax Regression What is the purpose of Softmax Regression? Given a softmax Regression model, please calculate the probability that the input attribute belongs to each class. Support Vector Machine Compared with logistic regression, what is the advantage of a Support Vector Machine? Ans: Logistic Regression The formula that updates the weights of attributes for each iteration Logistic regression uses an equation as the representation, very much like linear regression. Input values (X) are combined linearly using weights or coefficient values to predict an output value (y), y_pred = 1/(1 + e^(-(w·x))); at each iteration the weights are updated with the gradient-descent rule w := w + α*(y - y_pred)*x, where α is the learning rate. Softmax Regression Softmax Regression (synonyms: Multinomial Logistic, Maximum Entropy Classifier, or just Multi-class Logistic Regression) is a generalization of logistic regression that we can use for multi-class classification. In softmax regression (SMR), we replace the sigmoid logistic function by the so-called softmax function φ: φ(z_j) = e^(z_j) / Σ_k e^(z_k), where we define the net input z for each class j as z_j = w_j·x + b_j. Support Vector Machine Logistic regression and support vector machines are supervised machine learning algorithms. They are both used to solve classification problems. SVM tries to find the "best" margin that separates the classes, and this reduces the risk of error on the data, while logistic regression does not; instead it can have different decision boundaries with different weights that are near the optimal point. Advantages of Support Vector Machine (SVM) 1. Regularization capabilities: SVM has an L2 regularization feature, so it has good generalization capabilities which prevent it from over-fitting. 2. Handles non-linear data efficiently: SVM can efficiently handle non-linear data using the kernel trick. 3. Solves both Classification and Regression problems: SVM can be used to solve both classification and regression problems. SVM is used for classification problems while SVR (Support Vector Regression) is used for regression problems. Answer the below topics 4: Decision Tree How is a Decision Tree model trained? How to make a prediction on a new instance and how to calculate the prediction probability? What is the Gini Impurity Measure? How to calculate the Gini Impurity Measure? What is Regularization? What are the typical ways to regularize a tree model? Random Forest How is a Random Forest trained? Ans: Decision Tree How is a Decision Tree model trained? Below are some basic steps (Step 1 - Step 3) used before training the decision tree: Step 1: Loading the Libraries and Dataset Example:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
# Importing dataset
df = pd.read_csv('dataset.csv')
df.head()
Step 2: Data Preprocessing The most important part of Data Science is data preprocessing and feature engineering. In this step we deal with the categorical variables in the data and also impute the missing values.
Step 3: Creating Train and Test Sets In this step we split the data set into a train set and a test set, after selecting the target variable, so that we can evaluate the predictions. Step 4: Building and Evaluating the Model (Train Model) Using both the training and testing sets, it's time to train our models and classify the data. First, we will train a decision tree on this dataset: Example:
from sklearn.tree import DecisionTreeClassifier
dt = DecisionTreeClassifier(criterion = 'entropy', random_state = 42)
dt.fit(X_train, Y_train)
dt_pred_train = dt.predict(X_train)
What is the Gini Impurity Measure? Gini Impurity measures the disorder of a set of elements. It is calculated as the probability of mislabeling an element, assuming that the element is randomly labeled according to the distribution of all the classes in the set. Formula: GI = 1 - (p1^2 + p2^2 + ... + pk^2), where p1, p2, ... are the class probabilities. How to calculate the Gini Impurity Measure? Suppose 3 apples, 3 bananas and 6 cherries are given; then we find the GI as in the example below:
counts: apples = 3, bananas = 3, cherries = 6 (12 in total)
p: 3/12, 3/12, 6/12 = 1/4, 1/4, 1/2
GI = 1 - [ (1/4)^2 + (1/4)^2 + (1/2)^2 ] = 1 - [ 1/16 + 1/16 + 1/4 ] = 1 - 6/16 = 10/16 = 0.625
What is Regularization? What are the typical ways to regularize a tree model? Regularization It is used to reduce the complexity of the regression function without actually reducing the degree of the underlying polynomial function. Or, we can say, it is an attempt to solve the overfitting problem in statistical models. What are the typical ways to regularize a tree model? There are several simple regularization methods: minimum number of points per cell: require that each cell (i.e., each leaf node) covers a given minimum number of training points; maximum number of cells: limit the maximum number of cells of the partition (i.e., leaf nodes); maximum depth: limit the maximum depth of the tree. How is a Random Forest trained? Below is an example which is used to train a random forest in machine learning: Example:
from sklearn.ensemble import RandomForestClassifier
rfc = RandomForestClassifier(criterion = 'entropy', random_state = 42)
rfc.fit(X_train, Y_train)
# Evaluating on Training set
rfc_pred_train = rfc.predict(X_train)
print('Training Set Evaluation F1-Score=>', f1_score(Y_train, rfc_pred_train))
Contact us to get machine learning project help, machine learning assignment help, or other help related to Python. Contact Us NOW
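As a quick complement to the evaluation measures and cross-validation discussion above, here is a minimal sketch using scikit-learn; the built-in breast cancer dataset and the decision tree model are only illustrative choices, not part of the original question:

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, precision_score, recall_score, accuracy_score, roc_auc_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

clf = DecisionTreeClassifier(criterion='entropy', random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

print(confusion_matrix(y_test, y_pred))    # counts of TP, FP, FN, TN
print(precision_score(y_test, y_pred))     # TP / (TP + FP)
print(recall_score(y_test, y_pred))        # TP / (TP + FN)
print(accuracy_score(y_test, y_pred))      # (TP + TN) / all predictions
print(roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))  # area under the ROC curve

# n-fold (here 5-fold) cross-validation: average score over the k held-out folds
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())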
- Socket Programming In Python
Goal and learning objectives COVIDSafe is a digital contact tracing app announced by the Australian Government to help combat the ongoing COVID-19 pandemic, which has infected more than 6 million people and killed more than 373,000. The pandemic has also caused major social and economic disruptions around the world, and effective contact tracing is one of the conditions under which we can go back to a "normal" life before the development of a COVID-19 vaccine. The app is based on the BlueTrace protocol developed by the Singaporean Government. In this assignment, you will have the opportunity to implement a BlueTrace protocol simulator and investigate its behaviour. Your simulator is based on a client-server model consisting of one server and multiple smartphone clients. The clients communicate with the server using TCP and among themselves using UDP. The server is mainly used to authenticate the clients and retrieve the contact logs in the event of a positive COVID-19 detection, since the logs are stored locally on the smartphones for 21 days only, to protect the privacy of the users. The clients broadcast and receive beacons (UDP packets) when they are in communication range. Learning Objectives On completing this assignment, you will gain sufficient expertise in the following skills: Detailed understanding of how client-server and client-client interactions work. Expertise in socket programming. Insights into implementing an application layer protocol. Assignment Specification The base specification of the assignment is worth 20 marks. The specification is structured in two parts. The first part covers the basic interactions between the clients and server and includes functionality for clients to communicate with the server. The second part asks you to implement additional functionality whereby two clients can exchange messages with each other directly in a peer-to-peer fashion. The first part is self-contained (Sections 3.2 – 3.4) and is worth 15 marks. Implementing peer-to-peer messaging (beaconing) (Section 3.5) is worth 5 marks. The assignment includes 2 major modules, the server program and the client program. The server program will be run first, followed by multiple instances of the client program (each instance supports one client). They will be run from terminals on the same and/or different hosts. BlueTrace Overview One of the core considerations of BlueTrace is to preserve user privacy. To this end, personal information is collected just once at the point of registration and is only used to contact potentially infected patients. Contact tracing is done entirely locally on a client device using Bluetooth Low Energy (BLE), or UDP for this assignment, storing all encounters in a contact history log chronicling encounters for the past 21 days. Users in the contact log are identified using anonymous "temporary IDs" (TempIDs) issued by the health authority server. This means a user's identity cannot be ascertained by anyone except the health authority with which they are registered. Furthermore, since TempIDs change randomly on a regular basis (e.g., every 15 minutes), malicious third parties cannot track users by observing log entries over time. The protocol is focused on two areas: locally logging registered users in the vicinity of a device (i.e., peer-to-peer communication or P2P), and the transmission of the log to the operating health authority (i.e., client-to-server communication or C2S).
In the context of this assignment, the P2P component operates on top of the unreliable UDP communication, defining how two devices acknowledge each other's presence. The C2S component uses reliable TCP to communicate a timeline of visits to a centralised server owned by a health authority once a user has tested positive for COVID-19. The health authority can then, using the log, notify the users who came in contact with the infected patient. Server The reporting server is responsible for handling initial registration, provisioning unique user identifiers, and collecting contact logs created by the P2P part of the protocol. When the user first launches a BlueTrace app they will be asked to create a UserID (internationally formatted mobile phone number, e.g., +61-410-888-888) and password. This phone number is later used if the user has registered an encounter in an infected patient's contact log. Once registered, users are provisioned a TempID uniquely identifying them to other devices. Each TempID has a lifetime of a defined period (e.g., 15 minutes) to prevent malicious parties from performing replay attacks or tracking users over time with static unique identifiers. Therefore, the server has the following responsibilities. User Authentication - When a client requests a connection to the server, e.g., for obtaining a TempID or uploading contact logs after being tested as COVID-19 positive, the server should prompt the user to input the username and password and authenticate the user. The valid username and password combinations will be stored in a file called credentials.txt which will be in the same directory as the server program. An example credentials.txt file is provided on the assignment page. Usernames and passwords are case-sensitive. We may use a different file for testing so DO NOT hardcode this information in your program. You may assume that each username and password will be on a separate line and that there will be one white space between the two. If the credentials are correct, the client is considered to be logged in and a welcome message is displayed. When all tasks are done (e.g., the TempID has been obtained or the contact log has been uploaded), a user should be able to log out from the server. On entering invalid credentials, the user is prompted to retry. After 3 consecutive failed attempts, the user is blocked for a duration of block_duration seconds (block_duration is a command line argument supplied to the server) and cannot log in during this duration (even from another IP address). TempID Generation - TempIDs are generated as a 20-byte random number, and the server uses a file (tempIDs.txt, which will be in the same directory as the server program) to record the association between TempIDs and the static UserIDs. An example tempIDs.txt file is provided on the assignment page. Contact log checking - Once a user has tested COVID-19 positive, he/she will upload his/her contact log to the reporting server. The contact log is in the following format: TempID (20 bytes), start time (19 bytes) and expiry time (19 bytes). Then, the server will map the TempID to reveal the UserID and retrieve the start time and expiry time (with the help of the tempIDs.txt file). The health authority can then contact the UserID (phone number) to inform a user of potential contact with an infected patient. Therefore, your program will print out a list of phone numbers and encounter timestamps.
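A minimal sketch of the server-side login and TempID pieces described above, assuming the credentials.txt format of one "username password" pair per line; threading for multiple clients, blocking after three failed attempts, the command loop, and contact-log checking are deliberately left out, and how exactly the 20-byte TempID is encoded is up to your own design:

import socket
import secrets
from datetime import datetime, timedelta

def load_credentials(path="credentials.txt"):
    # One "username password" pair per line, separated by a single white space.
    creds = {}
    with open(path) as f:
        for line in f:
            user, pwd = line.strip().split(" ", 1)
            creds[user] = pwd
    return creds

def issue_temp_id(user, lifetime_minutes=15, path="tempIDs.txt"):
    # Record the TempID-to-UserID association so the health authority can map it back later.
    temp_id = secrets.token_hex(10)  # 20-character random string; one possible encoding of the TempID
    start = datetime.now()
    expiry = start + timedelta(minutes=lifetime_minutes)
    with open(path, "a") as f:
        f.write(f"{user} {temp_id} {start:%Y-%m-%d %H:%M:%S} {expiry:%Y-%m-%d %H:%M:%S}\n")
    return temp_id, start, expiry

def serve(server_port, block_duration):
    creds = load_credentials()
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("", server_port))
    srv.listen(5)
    while True:
        conn, addr = srv.accept()
        conn.sendall(b"Username: ")
        user = conn.recv(1024).decode().strip()
        conn.sendall(b"Password: ")
        pwd = conn.recv(1024).decode().strip()
        if creds.get(user) == pwd:
            temp_id, start, expiry = issue_temp_id(user)
            conn.sendall(f"Welcome to the BlueTrace simulator\nTempID: {temp_id}\n".encode())
        else:
            conn.sendall(b"Invalid credentials\n")
        conn.close()

In the full assignment the TCP connection would stay open after login, loop over the Download_tempID, Upload_contact_log and logout commands, count failed attempts per user, and block a user for block_duration seconds after three consecutive failures.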
Client The client has the following responsibilities. Authentication - The client should provide a login prompt to enable the user to authenticate with the server. Download TempID - After authentication, the client should be able to download a TempID from the server and display it. Upload contact log - The client should be able to upload the contact logs to the server after authentication. For NON-CSE students, you may read the contact logs from a static file (_contactlog.txt). For CSE students, you should generate the content of the _contactlog.txt file dynamically as discussed in Section 3.5 below. Commands supported by the client After a user is logged in, the client should support all the commands shown in the table below. For the following, assume that the commands were run by user A.
Command / Description:
Download_tempID - Download a TempID from the server.
Upload_contact_log - Upload contact logs to the server.
logout - Log out user A.
Any command that is not listed above should result in an error message being displayed to the user. The interaction with the user should be via the terminal (i.e. console). We do not mandate the exact text that should be displayed by the client to the user for the various commands. However, you must make sure that the displayed text is easy to comprehend. Please make sure that you DO NOT print any debugging information on the client terminal. Some examples illustrating client-server interaction using the above commands are provided in Section 8. Peer to Peer Communication Protocol (beaconing) The P2P part of the protocol defines how two devices communicate and log their contact. We will simulate BLE encounters with UDP in this section. Each device is in one of two states, Central or Peripheral. The peripheral device sends a packet with the following information to the central device: TempID (20 bytes), start time (19 bytes), expiry time (19 bytes) and BlueTrace protocol version (1 byte). After receiving the packet/beacon, the central device will compare the current timestamp with the start time and expiry time information in the beacon. If the timing information is valid (i.e., the current timestamp is between the start time and the expiry time), the central device will store the beacon information in a local file (_contactlog.txt) for 3 minutes. Note that a client can behave in either the Central or the Peripheral state. To implement this functionality your client should support the following command (in addition to those listed in Section 3.4) and remove outdated (i.e., older than 3 minutes) contact log entries automatically. File Names & Execution The main code for the server and client should be contained in the following files: server.c, or Server.java or server.py, and client.c or Client.java or client.py. You are free to create additional files such as header files or other class files and name them as you wish. The server should accept the following two arguments: • server_port: this is the port number which the server will use to communicate with the clients. Recall that a TCP socket is NOT uniquely identified by the server port number. So it is possible for multiple TCP connections to use the same server-side port number. • block_duration: this is the duration in seconds for which a user should be blocked after three unsuccessful authentication attempts. The server should be executed before any of the clients.
File Names & Execution
The main code for the server and client should be contained in the following files: server.c, Server.java or server.py, and client.c, Client.java or client.py. You are free to create additional files such as header files or other class files and name them as you wish. The server should accept the following two arguments:
• server_port: this is the port number which the server will use to communicate with the clients. Recall that a TCP socket is NOT uniquely identified by the server port number, so it is possible for multiple TCP connections to use the same server-side port number.
• block_duration: this is the duration in seconds for which a user should be blocked after three unsuccessful authentication attempts.
The server should be executed before any of the clients. It should be initiated as follows:
If you use Java: java Server server_port block_duration
If you use C: ./server server_port block_duration
If you use Python: python server.py server_port block_duration
Note that you do not have to specify the TCP port to be used by the client. You should allow the OS to pick a random available port. Similarly, you should allow the OS to pick a random available UDP source port for the UDP client. Each client should be initiated in a separate terminal as follows:
If you use Java: java Client server_IP server_port client_udp_port
If you use C: ./client server_IP server_port client_udp_port
If you use Python: python client.py server_IP server_port client_udp_port
Note: When you are testing your assignment, you can run the server and multiple clients on the same machine in separate terminals. In this case, use 127.0.0.1 (localhost) as the server IP address.

Additional Notes
Tips on getting started: The best way to tackle a complex implementation task is to do it in stages. A good place to start would be to implement the functionality that allows a single user to log in with the server. Next, add the blocking functionality for 3 unsuccessful attempts. Then extend this to handle multiple clients. Once your server can support multiple clients, implement the functions to download TempIDs and upload contact logs. Note that this may require changing the implementation of some of the functionality that you have already implemented. Once the communication with the server is working perfectly, you can move on to peer-to-peer communication. It is imperative that you rigorously test your code to ensure that all possible (and logical) interactions can be correctly executed. Test, test and test.

Application Layer Protocol: Remember that you are implementing an application layer protocol for realising a contact-tracing service to counter the COVID-19 pandemic. We are only concerned with the end result, i.e., the functionality outlined above. You may wish to revisit some of the application layer protocols that we have studied (HTTP, SMTP, etc.) to see examples of message formats, actions taken, etc.

Transport Layer Protocol: You should use TCP for the communication between each client and the server, and UDP for P2P communication. The TCP connection should be set up by the client during the login phase and should remain active until the user logs out, while there is no such requirement for UDP. The TCP server port is specified as a command-line argument to the server. Similarly, the UDP server port is specified as a command-line parameter of the client. The client ports for both TCP and UDP do not need to be specified: your client program should let the OS pick random available TCP or UDP ports. A minimal sketch of this setup follows.
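As a rough illustration of the transport setup (not a complete solution), the sketch below shows the server binding a TCP welcoming socket to server_port, and the client connecting over TCP with an OS-chosen source port while binding its UDP socket to the client_udp_port argument. The function names and the SO_REUSEADDR option are assumptions for illustration.

import socket
import sys

def start_server(server_port, block_duration):
    # Welcoming TCP socket; accepted connections serve the logged-in clients.
    welcome = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    welcome.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    welcome.bind(("", server_port))
    welcome.listen(5)
    return welcome

def start_client(server_ip, server_port, client_udp_port):
    # TCP control connection: the OS picks the client-side source port.
    tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    tcp.connect((server_ip, server_port))
    # UDP socket for P2P beaconing, bound to the port given on the command line.
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    udp.bind(("", client_udp_port))
    return tcp, udp

# e.g. in server.py: start_server(int(sys.argv[1]), int(sys.argv[2]))
# e.g. in client.py: start_client(sys.argv[1], int(sys.argv[2]), int(sys.argv[3]))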
Backup and Versioning: We strongly recommend backing up your programs frequently. CSE backs up all user accounts nightly. If you are developing code on your personal machine, it is strongly recommended that you undertake daily backups. We also recommend using a good versioning system such as GitHub or Bitbucket so that you can roll back and recover from any inadvertent changes. There are many services available for both which are easy to use. We will NOT entertain any requests for special consideration due to issues related to computer failure, lost files, etc.

Language and Platform: You are free to use C, Java or Python to implement this assignment. Please choose a language that you are comfortable with. The programs will be tested on CSE Linux machines, so please make sure that your entire application runs correctly on these machines (i.e., your lab computers) or using VLAB. This is especially important if you plan to develop and test the programs on your personal computer (which may use a different OS, version or IDE). Note that CSE machines support the following: gcc version 8.2, Java 11, Python 2.7 and 3.7. If you are using Python, please clearly mention in your report which version of Python we should use to test your code. You may only use the basic socket programming APIs provided by your programming language of choice. You may not use any special ready-to-use libraries or APIs that implement certain functions of the spec for you. There is no requirement that you must use the same text for the various messages displayed to the user on the terminal as illustrated in the examples in Section 9. However, please make sure that the text is clear and unambiguous. You are encouraged to use the forums on WebCMS to ask questions and to discuss different approaches to solve the problem. However, you should not post your solution or any code fragments on the forums. We will arrange additional consultation hours in Weeks 7, 8 and 9 to assist you with assignment-related questions if needed.

Submission
Please ensure that you use the mandated file names. You may of course have additional header files and/or helper files. If you are using C, then you MUST submit a makefile/script along with your code (not necessary with Java or Python). This is because we need to know how to resolve the dependencies among all the files that you have provided. After running your makefile we should have the following executable files: server and client. In addition, you should submit a small report, report.pdf (no more than 3 pages), describing the program design, the application layer message format and a brief description of how your system works. Also discuss any design trade-offs considered and made. Describe possible improvements and extensions to your program and indicate how you could realise them. If your program does not work under any particular circumstances, please report this here. Also indicate any segments of code that you have borrowed from the Web or other books. You are required to submit your source code and report.pdf. You can submit your assignment using the give command in a terminal from any CSE machine (or using VLAB or connecting via SSH to the CSE login servers). Make sure you are in the same directory as your code and report, and then do the following:
1. Type tar -cvf assign.tar filenames, e.g. tar -cvf assign.tar *.java report.pdf
2. When you are ready to submit, at the bash prompt type 3331
3. Next, type: give cs3331 assign assign.tar (you should receive a message stating the result of your submission)
Note that COMP9331 students should also use this command. Alternately, you can also submit the tar file via the WebCMS3 interface on the assignment page.

Important notes
The system will only accept submissions named assign.tar. All other names will be rejected. Ensure that your programs are tested on a CSE Linux machine (or VLAB) before submission. In the past, there were cases where tutors were unable to compile and run students' programs while marking. To avoid any disruption, please ensure that you test your program on a CSE Linux-based machine (or VLAB) before submitting the assignment. Note that we will be unable to award any significant marks if the submitted code does not run during marking.
You may submit as many times as you like before the deadline. A later submission will override the earlier submission, so make sure you submit the correct file. Do not leave submission until the last moment, as there may be technical or network errors and you will not have time to rectify them.

Late Submission Penalty: The late penalty will be applied as follows:
1 day after deadline: 10% reduction
2 days after deadline: 20% reduction
3 days after deadline: 30% reduction
4 days after deadline: 40% reduction
5 or more days late: NOT accepted
NOTE: The above penalty is applied to your final total. For example, if you submit your assignment 1 day late and your score on the assignment is 10, then your final mark will be 10 - 1 (10% penalty) = 9.

Sample Interaction
Note that the following list is not exhaustive but should be useful to get a sense of what is expected. We are assuming Java as the implementation language.

Case 1: Successful Login
Terminal 1
>java Server 4000 60
Terminal 2 (assume that the server is executing on 10.11.0.3)
>java Client 10.11.0.3 4000 8000
>Username: +61410888888
>Password: comp3331
>Welcome to the BlueTrace Simulator!

Case 2: Unsuccessful Login (assume the server is running in Terminal 1 as in Case 1)
Terminal 2 (assume that the server is executing on 10.11.0.3)
>java Client 10.11.0.3 4000 8000
>Username: +61410888888
>Password: comp9331
>Invalid Password. Please try again
>Password: comp8331
>Invalid Password. Please try again
>Password: comp7331
>Invalid Password. Your account has been blocked. Please try again later
The user should now be blocked for 60 seconds (since block_duration is 60). The terminal should shut down at this point.
Terminal 2 (reopened before 60 seconds are over)
>java Client 10.11.0.3 4000 8000
>Username: +61410888888
>Password: comp3331
>Your account is blocked due to multiple login failures. Please try again later
Terminal 2 (reopened after 60 seconds are over)
>java Client 10.11.0.3 4000 8000
>Username: +61410888888
>Password: comp3331
>Welcome to the BlueTrace Simulator!

Contact us to get Python or Java socket programming assignment help at:
- Breast Cancer Analysis Using Machine Learning
Breast cancer analysis and prediction is one of the most popular problems in machine learning, and one of the finest for ML practitioners. In this project, all the columns present in the dataset are discussed in detail.

DATA DESCRIPTION:
Radius: the distance from the centre to the perimeter.
Perimeter: the total distance between the points on the boundary of the core tumour.
Area: the area of the cancer cells.
Smoothness: the local variation in the radius lengths, given by the difference between a radial length and the mean length of the lines around it.
Compactness: an estimate combining perimeter and area, given by perimeter^2 / area - 1.0.
Concavity: the severity of the concave portions of the contour. Smaller chords capture small concavities better, so this feature is affected by the chord length.
Concave points: while concavity measures the magnitude of contour concavities, concave points measures their number.
Symmetry: the longest chord is taken as the major axis, and the length differences between the lines perpendicular to the major axis on either side are measured. This is known as the symmetry.
Fractal dimension: a measure of nonlinear growth. As the ruler used to measure the perimeter increases, the precision decreases and hence the measured perimeter decreases. These data are plotted on a log scale and the downward slope gives an approximation of the fractal dimension.
Texture: the standard deviation of the grayscale values. This is helpful for finding the amount of variation.

Now let's go through the steps to build a model on the breast cancer dataset.

Step 1: Importing the libraries
The very first step is to import the libraries. We import analytical libraries such as NumPy and pandas, and visualization libraries such as matplotlib, seaborn and plotly.

# Python libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
import itertools
from itertools import chain
from sklearn.preprocessing import StandardScaler
import warnings
import plotly.offline as py
py.init_notebook_mode(connected=True)
import plotly.graph_objs as go
import plotly.tools as tls
import plotly.figure_factory as ff
warnings.filterwarnings('ignore')  # ignore warning messages

Step 2: Load the data
In this step, the data is loaded. This dataset is already available in the sklearn library, so we can import it directly from there.

# Loading the dataset and converting it into a dataframe, as dataframes are easier to manipulate and analyse
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
a = np.c_[data.data, data.target]
columns = np.append(data.feature_names, ["target"])
df_cancer = pd.DataFrame(a, columns=columns)

Step 3: Analyzing the dataset
In this step let's analyze the first 5 rows, any NaN values (if present), and some important features and labels.

# head() shows the top 5 rows of the dataframe; we use it to check that the data was correctly
# converted into a dataframe, to see the column names and to get a sense of the data
df_cancer.head()
df_cancer['target'].value_counts()
O/p:
1.0    357
0.0    212

In this dataset, our label is the target column and the other columns are the features. In the target column there are 357 ones and 212 zeros in total; 0 indicates malignant and 1 indicates benign.
# Dividing the data into two classes; according to our dataset, benign has target value 1 and malignant has target value 0
Malignant = df_cancer[df_cancer['target'] == 0]
Benign = df_cancer[df_cancer['target'] == 1]

In the lines of code above, we have created two DataFrames: Malignant (the rows whose target value equals 0) and Benign (the rows whose target value equals 1).

Step 4: Visualization
Let's visualize the target column.

Count chart:
# ------------COUNT-----------------------
trace = go.Bar(x=(len(Malignant), len(Benign)), y=['Malignant', 'Benign'],
               orientation='h', opacity=0.8,
               marker=dict(color=['gold', 'lightskyblue'], line=dict(color='#000000', width=1.5)))
layout = dict(title='Count of diagnosis variable')
fig = dict(data=[trace], layout=layout)
py.iplot(fig)

Pie plot:
The code below plots the percentages of Malignant and Benign in the target variable as a pie chart.

# ------------PERCENTAGE-------------------
trace = go.Pie(labels=['benign', 'malignant'], values=df_cancer['target'].value_counts(),
               textfont=dict(size=15), opacity=0.8,
               marker=dict(colors=['lightskyblue', 'gold'], line=dict(color='#000000', width=1.5)))
layout = dict(title='Distribution of diagnosis variable')
fig = dict(data=[trace], layout=layout)
py.iplot(fig)

# Creating lists of feature names, divided into three categories
mean_features = ['mean radius', 'mean texture', 'mean perimeter', 'mean area', 'mean smoothness',
                 'mean compactness', 'mean concavity', 'mean concave points', 'mean symmetry',
                 'mean fractal dimension']
error_features = ['radius error', 'texture error', 'perimeter error', 'area error', 'smoothness error',
                  'compactness error', 'concavity error', 'concave points error', 'symmetry error',
                  'fractal dimension error']
worst_features = ['worst radius', 'worst texture', 'worst perimeter', 'worst area', 'worst smoothness',
                  'worst compactness', 'worst concavity', 'worst concave points', 'worst symmetry',
                  'worst fractal dimension']

In the lines above, three lists of feature names are created: mean_features, error_features and worst_features. After this, a function is created for the histogram plots.

# A function to plot histograms with 10 subplots; wrapping the task in a function avoids repeating code
bins = 20  # the number of bins is set to 20; bins divide the range of values into intervals
def histogram(features):
    plt.figure(figsize=(10, 15))
    for i, feature in enumerate(features):
        # The subplot grid has 5 rows and 2 columns; i+1 gives the subplot number (numbering starts at 1)
        plt.subplot(5, 2, i + 1)
        sns.distplot(Malignant[feature], bins=bins, color='red', label='Malignant')
        sns.distplot(Benign[feature], bins=bins, color='green', label='Benign')
        plt.title('Density Plot of: ' + str(feature))
        plt.xlabel('X variable')
        plt.ylabel('Density Function')
        plt.legend(loc='upper right')
    plt.tight_layout()
    plt.show()

After the function has been created, the individual feature groups are plotted by calling this histogram function.
1. Mean_features:
# Calling the function with the mean features
histogram(mean_features)
2. Error_features:
histogram(error_features)
3.
Worst_features:
histogram(worst_features)

Then we can plot the ROC curves.

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc

def ROC_curve(X, Y, string):
    # Splitting the data for training and testing in a 60/40 ratio
    X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.4)
    model = LogisticRegression(solver='liblinear')  # using a logistic regression model
    model.fit(X_train, y_train)
    probability = model.predict_proba(X_test)  # predicting class probabilities
    # roc_curve returns the false positive rate, true positive rate and thresholds
    fpr, tpr, thresholds = roc_curve(y_test, probability[:, 1])
    roc_auc = auc(fpr, tpr)  # the area under the curve
    plt.figure()
    plt.plot(fpr, tpr, lw=1, color='green', label=f'AUC = {roc_auc:.3f}')
    plt.plot([0, 1], [0, 1], linestyle='--', label='Baseline')  # plotting the baseline
    plt.title(string)
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.legend()
    plt.show()

According to the ROC curves, the features with the largest area under the curve give the highest accuracy; here the worst and mean features show very high accuracy. When studying the histograms, we look for the features with the least overlapping area between the two classes, since classification is easier when there is a clear distinction between the X-variable values. A large density difference between the values of the two classes also helps, as it shows that most of the objects lie in a particular class for a particular X variable. The top 5 features by these criteria are:
Worst area
Worst perimeter
Worst radius
Mean Concave Points
Mean Concavity

Then we print the mean of all the instances of every feature for both the Benign and the Malignant classes, for example:
mean radius          17.462830
mean texture         21.604906
mean perimeter      115.365377
mean area           978.376415
mean smoothness       0.102898
dtype: float64

Then we can create the features and the target.
# Creating X and Y, where X has all the features and Y contains the target
X = df_cancer.drop(['target'], axis=1)
Y = df_cancer['target']

As a heuristic, the max_depth of a decision tree is kept at or below the square root of the number of instances; with 569 instances, sqrt(569) is roughly 23.9, hence the chosen range of 1 to 24. If the depth is too large we see overfitting, and if it is too low we see underfitting. The min_samples_leaf parameter gives the minimum number of samples required to form a leaf node: too low a value encourages overfitting, while too large a value oversimplifies the tree, hence the range of 1 to 20.
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier

max_depth = list(range(1, 24))
min_leaf = list(range(1, 20))
# Defining the parameters for the grid search
params = [{'classifier__max_depth': max_depth, 'classifier__min_samples_leaf': min_leaf}]
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.4)
# Creating a pipeline: scaling, oversampling with SMOTE, then a decision tree
pipe = Pipeline([('sc', StandardScaler()),
                 ('smt', SMOTE()),
                 ('classifier', DecisionTreeClassifier(random_state=0, min_samples_split=6, max_features=10))])
# Grid search tries the different combinations of the parameters
grid_search_cv = GridSearchCV(pipe, params, scoring='accuracy', refit=True, verbose=1, cv=5)
grid_search_cv.fit(X_train, y_train)
O/p (truncated): GridSearchCV(cv=5, error_score=nan, estimator=Pipeline(memory=None, steps=[('sc', StandardScaler(copy=True, with_mean=True, with_std=True)), ...

model = grid_search_cv.best_estimator_  # the best model found by the grid search
from sklearn.metrics import accuracy_score
model.fit(X_train, y_train)  # fitting the model
test_pred = model.predict(X_test)
print(accuracy_score(y_test, test_pred))  # printing the accuracy of the model
y_test.value_counts()
O/p: 0.9429824561403509

Then let's look at the confusion matrix.
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
matrix = np.array(confusion_matrix(y_test, test_pred, labels=[0, 1]))  # creating the confusion matrix
pd.DataFrame(matrix, index=['Cancer', 'No Cancer'],
             columns=['Predicted_Cancer', 'Predicted_No_Cancer'])  # labelling the matrix

The decision tree is a flowchart: each observation is split according to some feature. There are two ways to go from each node: if the condition is true it goes one way, and if false it goes the other way. The first line of a node (here a feature such as X[7]) gives the feature and compares it to some value. The second row gives the value of the Gini index at that node; the Gini index is computed as 1 - Σ pᵢ², where pᵢ is the fraction of samples of class i at the node, so a Gini index of 0 means the node is pure and we get a definite class. The samples row gives the number of samples being considered, and the value row gives the number of samples in each class. At every node all features are considered, but the split that gives the best Gini index is chosen.

from sklearn import tree
plt.figure(figsize=(40, 40))
tree.plot_tree(model['classifier'])  # plotting the decision tree

To plot the most important features we extract them from the classifier.
# Saving the feature importances
feat_importances = pd.Series(model['classifier'].feature_importances_, index=X.columns)
feat_importances = feat_importances.nlargest(5)  # as we only need 5 features, nlargest() is used
feat_importances.plot(kind='barh', figsize=(12, 8), title='Most Important Features')  # plotting a bar graph
imp_features = list(feat_importances.index)
print(feat_importances)

The support vectors define the hyperplane that maximizes the margin between the two classes; in the plots below, the support vectors mark the margin of the hyperplane.
# 'svc' is assumed to be a pipeline with a scaler and an SVC classifier defined earlier (not shown in the original post)
from sklearn.svm import SVC
svc = Pipeline([('sc', StandardScaler()), ('classifier', SVC(kernel='linear'))])

k = 1
plt.figure(figsize=(20, 40))
for i in range(0, 4):
    for j in range(1, 5):
        # Taking a pair of important features and fitting the SVC on them
        inp = pd.concat([X[imp_features[i]], X[imp_features[j]]], axis=1)
        s = svc['classifier'].fit(inp, Y)
        decision_function = svc['classifier'].decision_function(inp)
        plt.subplot(4, 4, k)
        k = k + 1
        plt.scatter(X[imp_features[i]], X[imp_features[j]], c=Y, s=30, cmap=plt.cm.Paired)
        ax = plt.gca()
        xlim = ax.get_xlim()
        ylim = ax.get_ylim()
        # Building a grid over the plot and evaluating the decision function on it
        xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], 50), np.linspace(ylim[0], ylim[1], 50))
        xy = np.vstack([xx.ravel(), yy.ravel()]).T
        Z = svc['classifier'].decision_function(xy).reshape(xx.shape)
        plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, levels=[-1, 0, 1], alpha=0.5)
        # Showing the support vectors
        ax.scatter(s.support_vectors_[:, 0], s.support_vectors_[:, 1], s=10,
                   linewidth=1, facecolors='none', edgecolors='k')
        plt.title(str(imp_features[i]) + ' & ' + str(imp_features[j]))

So, in this way, we can build a breast cancer classification model.
For code: https://github.com/CodersArts2017/Jupyter-Notebooks/blob/master/Breast_Cancer.ipynb
Thank You! Happy Coding ;)