Search Results


Networkx Stock Market Example | Networkx Assignment Help In Machine Learning | Codersarts
First, import all the libraries:

    # import all libraries
    import warnings
    import networkx as nx
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    %matplotlib inline
    warnings.filterwarnings('ignore')

Reading Data

    # Reading data
    # DataFrame.from_csv has been removed from pandas; read_csv with
    # index_col=0 and parse_dates=True reproduces its defaults.
    market = pd.read_csv("nasdaq.csv", index_col=0, parse_dates=True)
    market.shape

Introduction

Negative shocks spread through stock markets during the financial crisis. To model the stock market using network analysis, different stocks are represented as different nodes. However, it is not straightforward to define the connections between nodes. A traditional way to create edges is to look at the correlation of some defined attributes over a selected time frame. If the correlation is above a positive threshold or below a negative threshold, the edge exists, as discussed in the section on the importance of nodes.

The DataFrame "market" includes daily returns of 2000 Nasdaq stocks from 2013 to 2018. A pair of stocks has a connection if the absolute value of their correlation is high enough. We first calculate the correlation matrix of the 2000 stocks and plot the histogram of the correlations.

Problem 1: Use the Louvain method to find clusters of newmarket. Take the 4 largest clusters and check whether they are correlated with financial measures, for example overall return, mean of daily return, volatility, or yearly Sharpe ratio (suppose the risk-free rate is 0%). In other words, are there clusters that take very high or very low values in one of these measures? If so, this will help us select stocks.

Problem 2: Take the third largest cluster you obtained in Problem 1 and name it G2. Compute the (unweighted) distances between the nodes of G2. Apply multidimensional scaling (MDS) to embed this cluster in 2-dimensional space, and use the k-means method to cluster G2 into 4 communities. Plot G2 using the coordinates obtained from MDS, with nodes from different communities in different colors.
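The edge rule described in the introduction (connect two stocks when the absolute value of their correlation is high enough) can be sketched in plain Python. This is only an illustration under stated assumptions: the tickers AAA, BBB and CCC, their return values, and the threshold 0.9 are all made up, and the helper names pearson and correlation_edges are hypothetical rather than part of the original assignment.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length return series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_edges(returns, threshold):
    """Return (u, v, corr) for every pair of stocks with |corr| >= threshold."""
    edges = []
    for u, v in combinations(sorted(returns), 2):
        corr = pearson(returns[u], returns[v])
        if abs(corr) >= threshold:
            edges.append((u, v, corr))
    return edges


# Toy daily returns for three hypothetical tickers (not real market data).
toy_returns = {
    "AAA": [0.01, -0.02, 0.03, 0.00, -0.01],
    "BBB": [0.02, -0.04, 0.06, 0.00, -0.02],  # exactly twice AAA
    "CCC": [0.01, 0.01, -0.02, 0.03, -0.01],
}

edges = correlation_edges(toy_returns, threshold=0.9)
print(edges)  # only the AAA-BBB pair passes the threshold
```

The resulting triples have the shape that NetworkX's G.add_weighted_edges_from(edges) accepts, so the same list could be used to build the graph that the Louvain and MDS steps operate on.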
Problem 3: How well does the network model perform in identifying good stocks for investment? How edges are defined is extremely important for finding patterns between clusters and stock performance. Could you provide one definition of links in a network of stocks and explain why you think a network with such links can help identify good stocks or good patterns for investment in the stock market?

Answer: With higher-dimensional coordinates, this method is more likely to group nodes connected by a shortest path. If edge betweenness is higher, the link is weak. In the graph above (cluster 1), the edges are close to each other and belong to the same group, so these are good stock patterns.

Get any type of advanced machine learning or mathematics assignment help from our experts. You can contact us at contact@codersarts.com or codersarts@gmail.com
Networkx Expert Help | Networkx Assignment Help
INTRODUCTION

NetworkX is a Python package that offers graph functionality and basic operations such as creating graphs, adding nodes, adding edges between two nodes, adding weights to edges, finding degree, in-degree and out-degree, searching, calculations and graph algorithms. In short, NetworkX provides data structures for graphs (or networks) along with graph algorithms.

The NetworkX package provides classes for graph objects, generators to create standard graphs, IO routines for reading in existing datasets, algorithms to analyze the resulting networks and some basic drawing tools. Most of the NetworkX API is provided by functions which take a graph object as an argument. Methods of the graph object are limited to basic manipulation and reporting.

NetworkX Basics

After starting Python, import the networkx module with (the recommended way)

    import networkx as nx

To save repetition, in the documentation we assume that NetworkX has been imported this way. If importing networkx fails, it means that Python cannot find the installed module. Check your installation and your PYTHONPATH.

Install

Install the latest version of NetworkX:

    $ pip install networkx

The following basic graph types are provided as Python classes:

Graph: This class implements an undirected graph. It ignores multiple edges between two nodes. It does allow self-loop edges between a node and itself.
DiGraph: Directed graphs, that is, graphs with directed edges. Provides operations common to directed graphs (a subclass of Graph).
MultiGraph: A flexible graph class that allows multiple undirected edges between pairs of nodes. The additional flexibility leads to some degradation in performance, though usually not significant.
MultiDiGraph: A directed version of a MultiGraph.

Empty graph-like objects are created with

    G = nx.Graph()
    G = nx.DiGraph()
    G = nx.MultiGraph()
    G = nx.MultiDiGraph()

All graph classes allow any hashable object as a node. Hashable objects include strings, tuples, integers, and more. Arbitrary edge attributes such as weights and labels can be associated with an edge.

The graph internal data structures are based on an adjacency list representation and implemented using Python dictionary data structures. The graph adjacency structure is implemented as a Python dictionary of dictionaries; the outer dictionary is keyed by nodes to values that are themselves dictionaries keyed by neighboring node to the edge attributes associated with that edge. This "dict-of-dicts" structure allows fast addition, deletion, and lookup of nodes and neighbors in large graphs. The underlying data structure is accessed directly by methods (the programming interface, or "API") in the class definitions. All functions, on the other hand, manipulate graph-like objects solely via those API methods and not by acting directly on the data structure. This design allows for possible replacement of the "dict-of-dicts"-based data structure with an alternative data structure that implements the same methods.

Graphs

The basic graph classes are named Graph, DiGraph, MultiGraph, and MultiDiGraph. When creating a NetworkX graph object, the first choice to be made is what type of graph object to use. A graph (network) is a collection of nodes together with a collection of edges that are pairs of nodes. Attributes are often associated with nodes and/or edges. NetworkX graph objects come in different flavors depending on two main properties of the network:

Directed: Are the edges directed? Does the order of the edge pairs (u, v) matter? A directed graph is specified by the "Di" prefix in the class name, e.g. DiGraph(). We make this distinction because many classical graph properties are defined differently for directed graphs.
Multi-edges: Are multiple edges allowed between each pair of nodes? As you might imagine, multiple edges require a different data structure, though clever users could design edge data attributes to support this functionality. We provide a standard data structure and interface for this type of graph using the prefix "Multi", e.g. MultiGraph().
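To make the "dict-of-dicts" description above concrete, here is a minimal hand-rolled sketch of such an adjacency structure for an undirected graph. It is not NetworkX's actual implementation; the helper names add_edge and remove_node and the example nodes are assumptions made for illustration only.

```python
# Outer dict: node -> inner dict; inner dict: neighbor -> edge-attribute dict.
adj = {}


def add_edge(adj, u, v, **attrs):
    """Add an undirected edge; both directions share one attribute dict."""
    data = dict(attrs)
    adj.setdefault(u, {})[v] = data
    adj.setdefault(v, {})[u] = data


def remove_node(adj, n):
    """Delete node n together with every edge that touches it."""
    for neighbor in adj.pop(n, {}):
        if neighbor != n:  # a self-loop entry disappeared with the pop above
            del adj[neighbor][n]


add_edge(adj, "a", "b", weight=0.5)
add_edge(adj, "b", "c", weight=2.0)

print(adj["a"]["b"]["weight"])  # edge-attribute lookup via two dict accesses
print(sorted(adj["b"]))         # neighbors of "b"

remove_node(adj, "c")
print(sorted(adj["b"]))         # "c" is gone from the neighbor list
```

Every operation here is a dictionary insertion, deletion or lookup, which is the reason the text gives for the structure scaling well to large graphs.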
Networkx Analysis In Machine Learning | Python Machine Learning Assignment Help | Codersarts
Before starting with networkx, we should know what a graph is and why we use graphs. In mathematics, a graph is described by its vertices and edges:

    V = {A, B, C, D, F}
    E = {(A, B), (B, C), ...}

Now we can say: "Graphs are mathematical structures used to study pairwise relationships between objects and entities." In data science, graphs are often created using a package called "networkx", which also makes it easy to draw them.

Graphs in Python

We will be using the networkx package in Python. It can be installed using the pip command. Now we will create a simple graph:

Step 1: Import the networkx library

    import networkx as nx

Step 2: Create a graph

    G = nx.Graph()

Step 3: Add nodes

    # Add a node
    G.add_node(1)
    # Add multiple nodes
    G.add_nodes_from([2, 3])

Step 4: Add edges

    # Add a single edge
    G.add_edge(1, 2)
    # Add the remaining edges so the graph matches the output shown below
    G.add_edges_from([(1, 3), (2, 3)])

Other useful functions for creating graphs:

    subgraph(G, nbunch) - induced subgraph view of G on nodes in nbunch
    union(G1, G2) - graph union
    disjoint_union(G1, G2) - graph union assuming all nodes are different
    cartesian_product(G1, G2) - return Cartesian product graph
    compose(G1, G2) - combine graphs identifying nodes common to both
    complement(G) - graph complement
    create_empty_copy(G) - return an empty copy of the same graph class
    to_undirected(G) - return an undirected representation of G (convert_to_undirected in old releases)
    to_directed(G) - return a directed representation of G (convert_to_directed in old releases)

Accessing edges and nodes

Nodes and edges can be accessed using G.nodes() and G.edges():

    G.nodes()
    Output: NodeView((1, 2, 3))

    G.edges()
    Output: EdgeView([(1, 2), (1, 3), (2, 3)])

Graph Visualization

Networkx provides basic functionality for visualizing graphs, and matplotlib offers some convenience functions. "Graphviz" is probably the best tool for us, as it offers a Python interface in the form of "PyGraphviz".

    %matplotlib inline
    import matplotlib.pyplot as plt
    nx.draw(G)

To work with Graphviz, first install it from the Graphviz website, then:

    import pygraphviz as pgv
    d = {'1': {'2': None}, '2': {'1': None, '3': None}, '3': {'1': None}}
    A = pgv.AGraph(data=d)
    print(A)  # This is the 'string' or simple representation of the graph

Output:

    strict graph "" {
    1 -- 2;
    2 -- 3;
    3 -- 1;
    }

If you need help using a real-time dataset, you can contact us at: contact@codersarts.com or codersarts@gmail.com
Top Python Assignment Topics For Beginners | Python Programming Help | Codersarts
In this blog, we list the most important Python topics to learn on the way to becoming a Python expert. Python is currently one of the most in-demand programming languages and is associated with some of the best-paid jobs. It is also widely used in AI and machine learning. For beginners, the main problem is knowing which topics are important to study, so we have collected the important Python topics below:

Strings In Python
    Access characters in a string by index
    Check if a string contains a substring
    Iterate over the characters in a string
    Find occurrences of a substring in a string
    Compare strings in Python
    Replace characters in a string
    And more

Dictionary In Python
    Introduction to dictionaries
    Creating dictionaries in Python
    Iterating over dictionaries
    Check if a key exists in a dictionary
    Get all the keys in a dictionary
    Get all the values in a dictionary
    And more

Tuple In Python
    Create a tuple
    Find an element in a tuple
    Add, update and delete in a tuple
    And more

Python: Functions
    Global variables
    Variable number of arguments
    Unpack tuple / dictionary
    Lambda functions
    And more

Date & Time In Python
    Get the current date and timestamp
    Convert a string to a datetime
    And more

Python: Directories
    Create a directory in Python
    Check if a file or directory exists
    Check if a directory is empty
    Get the list of files in a directory
    Delete a directory recursively
    And more

Iterators & Generators In Python
    Iterator vs iterable vs iteration
    Make a class iterable
    Yield keyword and generators
    And more

Numpy Using Python
    Create a numpy array from a list, tuple or list of lists
    Find the index of a value in a numpy array
    Select an element or sub-array by index from a numpy array
    Select rows / columns by index from a 2D numpy array
    Select elements by conditions from a numpy array
    Create a numpy array of evenly spaced numbers
    And more

Pandas DataFrames Using Python
    Create a DataFrame from a dictionary in pandas
    Get the list of column and row names in a DataFrame
    Change column and row names in a DataFrame
    Select rows and columns using loc and iloc in a DataFrame
    Select rows based on conditions
    And more

Multi-threading in Python
    Create a thread using a function
    Create a thread using a class
    And more

If you want to learn other important topics related to advanced Python, you can send your request directly to: codersarts@gmail.com
Top Machine Learning Topics For Beginners | Codersarts
Here are the top 10 machine learning projects that help you learn the basic concepts of machine learning. After completing them, we will move on to some advanced machine learning topics.

1: Understanding Pandas

Before you start learning machine learning, get to know pandas. Here we learn pandas with the help of the questions below.

Consider the following Python dictionary data and Python list labels, and create a DataFrame birds from this dictionary data which has the index labels.

Sol.

    # import libraries
    import pandas as pd
    import numpy as np

Read data:

    data = {'birds': ['Cranes', 'Cranes', 'plovers', 'spoonbills', 'spoonbills', 'Cranes', 'plovers', 'Cranes', 'spoonbills', 'spoonbills'],
            'age': [3.5, 4, 1.5, np.nan, 6, 3, 5.5, np.nan, 8, 4],
            'visits': [2, 4, 3, 4, 3, 4, 2, 2, 3, 2],
            'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
    labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

Creating a data frame:

    birds = pd.DataFrame(data, index=labels)
    birds

Do it yourself:

    Display a summary of the basic information about the birds DataFrame and its data.
    Print all the rows with only the 'birds' and 'age' columns from the DataFrame.
    Select rows [2, 3, 7] and columns ['birds', 'age', 'visits'].
    Select the rows where the number of visits is less than 4.
    Select the rows with columns ['birds', 'visits'] where the age is missing, i.e. NaN.
    Select the rows where the bird is a Crane and the age is less than 4.
    Select the rows where the age is between 2 and 4 (inclusive).
    Find the total number of visits of the bird Cranes.
    Calculate the mean age for each different bird in the DataFrame.
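As a starting point for the selection exercises above, the following sketch shows one possible way to express a few of them with boolean masks and groupby. It is only one of several valid approaches; the result variable names (few_visits, missing_age, and so on) are chosen here for illustration.

```python
import numpy as np
import pandas as pd

data = {
    "birds": ["Cranes", "Cranes", "plovers", "spoonbills", "spoonbills",
              "Cranes", "plovers", "Cranes", "spoonbills", "spoonbills"],
    "age": [3.5, 4, 1.5, np.nan, 6, 3, 5.5, np.nan, 8, 4],
    "visits": [2, 4, 3, 4, 3, 4, 2, 2, 3, 2],
    "priority": ["yes", "yes", "no", "yes", "no", "no", "no", "yes", "no", "no"],
}
labels = list("abcdefghij")
birds = pd.DataFrame(data, index=labels)

# Rows where the number of visits is less than 4.
few_visits = birds[birds["visits"] < 4]

# 'birds' and 'visits' columns for rows whose age is missing.
missing_age = birds.loc[birds["age"].isna(), ["birds", "visits"]]

# Cranes younger than 4 (two masks combined with &, each in parentheses).
young_cranes = birds[(birds["birds"] == "Cranes") & (birds["age"] < 4)]

# Age between 2 and 4, both ends included.
age_2_to_4 = birds[birds["age"].between(2, 4)]

# Total visits of Cranes, and mean age per bird (NaN ages are skipped).
crane_visits = birds.loc[birds["birds"] == "Cranes", "visits"].sum()
mean_age = birds.groupby("birds")["age"].mean()

print(crane_visits)
print(mean_age)
```

Note that comparisons against NaN evaluate to False, so the rows with a missing age drop out of the age-based filters automatically.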
2: Practice writing functions in Python

After learning pandas, the next step is to practice writing code with functions, because functions are an important part of machine learning code in Python. Here are some questions to practice on; if you face any issue, you can ask the codersarts experts.

Do it yourself:

1. Write a function that inputs a number and prints the multiplication table of that number.
2. Write a program to print twin primes less than 1000. If two consecutive odd numbers are both prime, they are known as twin primes.
3. Write a program to find the prime factors of a number. Example: prime factors of 56 - 2, 2, 2, 7
4. Write a program to implement these formulae of permutations and combinations. Number of permutations of n objects taken r at a time: p(n, r) = n! / (n-r)!. Number of combinations of n objects taken r at a time: c(n, r) = n! / (r!*(n-r)!) = p(n, r) / r!
5. Write a function that converts a decimal number to a binary number.
6. Write a function cubesum() that accepts an integer and returns the sum of the cubes of the individual digits of that number. Use this function to make functions PrintArmstrong() and isArmstrong() to print Armstrong numbers and to check whether a number is an Armstrong number.
7. Write a function prodDigits() that inputs a number and returns the product of the digits of that number.
8. If all digits of a number n are multiplied by each other, and the process is repeated with the product, the one-digit number obtained at last is called the multiplicative digital root of n. The number of times digits need to be multiplied to reach one digit is called the multiplicative persistence of n. Example: 86 -> 48 -> 32 -> 6 (MDR 6, MPersistence 3); 341 -> 12 -> 2 (MDR 2, MPersistence 2). Using the function prodDigits() of the previous exercise, write functions MDR() and MPersistence() that input a number and return its multiplicative digital root and multiplicative persistence respectively.
9. Write a function sumPdivisors() that finds the sum of proper divisors of a number. Proper divisors of a number are those numbers by which the number is divisible, except the number itself. For example, the proper divisors of 36 are 1, 2, 3, 4, 6, 9, 12, 18.
10. A number is called perfect if the sum of proper divisors of that number is equal to the number. For example, 28 is a perfect number, since 1+2+4+7+14=28. Write a program to print all the perfect numbers in a given range.
11. Two different numbers are called amicable numbers if the sum of the proper divisors of each is equal to the other number. For example, 220 and 284 are amicable numbers. Sum of proper divisors of 220 = 1+2+4+5+10+11+20+22+44+55+110 = 284. Sum of proper divisors of 284 = 1+2+4+71+142 = 220.
12. Write a function to print pairs of amicable numbers in a range.

3: Practice writing code in Python without numpy or sklearn

Now we will practice writing code without any libraries like NumPy or sklearn.

Do it yourself:

    Print the product of two matrices without any libraries.
    Select a number randomly with probability proportional to its magnitude from a given array of n elements.
    Replace the digits in a string with #.
    Find the closest points: you are given n data points in the form of a list of tuples, S=[(x1,y1),(x2,y2),(x3,y3),(x4,y4),(x5,y5),..,(xn,yn)], and a point P=(p,q). Your task is to find the 5 closest points (based on cosine distance) in S from P.

If you need any help with any of the above topics, which are among the most important in machine learning, you can contact us here: contact@codersarts.com
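For exercises 7 to 11 above, the worked examples in the text (86 -> 48 -> 32 -> 6, and the amicable pair 220 and 284) can be checked with a short sketch like the one below. The function names prodDigits, MDR, MPersistence and sumPdivisors come from the exercises; are_amicable is a hypothetical helper name, and this is only one straightforward way to write the functions, not a reference solution.

```python
def prodDigits(n):
    """Product of the decimal digits of a non-negative integer."""
    product = 1
    for digit in str(n):
        product *= int(digit)
    return product


def MDR(n):
    """Multiplicative digital root: multiply digits until one digit is left."""
    while n >= 10:
        n = prodDigits(n)
    return n


def MPersistence(n):
    """Number of digit-product steps needed to reach a single digit."""
    steps = 0
    while n >= 10:
        n = prodDigits(n)
        steps += 1
    return steps


def sumPdivisors(n):
    """Sum of the proper divisors of n (all divisors except n itself)."""
    return sum(d for d in range(1, n // 2 + 1) if n % d == 0)


def are_amicable(a, b):
    """True if a and b are different and each equals the other's divisor sum."""
    return a != b and sumPdivisors(a) == b and sumPdivisors(b) == a


print(MDR(86), MPersistence(86))      # example from exercise 8
print(MDR(341), MPersistence(341))    # example from exercise 8
print(sumPdivisors(36))               # 1+2+3+4+6+9+12+18
print(sumPdivisors(28) == 28)         # 28 is perfect
print(are_amicable(220, 284))         # example from exercise 11
```

The trial division in sumPdivisors is fine for small ranges; for large inputs, iterating only up to the square root and adding divisor pairs would be faster.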
"User Identification In Tor Network" In Machine Learning | Machine Learning Assignment Help
INTRODUCTION USER IDENTIFICATION IN TOR NETWORKS Unlike conventional World Wide Web technologies, the Tor Darknet onion routing technologies give users a real chance to remain anonymous. Many users have jumped at this chance – some did so to protect themselves or out of curiosity, while others developed a false sense of impunity, and saw an opportunity to do clandestine business anonymously: selling banned goods, distributing illegal content, etc. However, further developments, such as the detention of the maker of the Silk Road site, have conclusively demonstrated that these businesses were less anonymous than most assumed. Intelligence services have not disclosed any technical details of how they detained cybercriminals who created Tor sites to distribute illegal goods; in particular, they are not giving any clues how they identify cybercriminals who act anonymously. This may mean that the implementation of the Tor Darknet contains some vulnerabilities and/or configuration defects that make it possible to unmask any Tor user. TOR NETWORK Tor is software that allows users to browse the Web anonymously. Developed by the Tor Project, a nonprofit organization that advocates for anonymity on the internet, Tor was originally called The Onion Router because it uses a technique called onion routing to conceal information about user activity. Tor Browser offers the best anonymous web browsing available today, and researchers are hard at work improving Tor's anonymity properties. Tor is an Internet networking protocol designed to anonymize the data relayed across it. Using Tors software will make it difficult, if not impossible, for any snoops to see your webmail, search history, social media posts or other online activity. How does Tor work? The Tor network runs through the computer servers of thousands of volunteers spread throughout the world. The data is bundled into an encrypted packet when it enters the Tor network. 
Then, unlike the case with normal Internet connections, Tor strips away part of the packet's header, which is a part of the addressing information that could be used to learn things about the sender, such as the operating system from which the message was sent. Finally, Tor encrypts the rest of the addressing information, called the packet wrapper. Regular Internet connections don't do this. The modified and encrypted data packet is then routed through many of these servers, called relays, on the way to its final destination. The roundabout way packets travel through the Tor network is akin to a person taking a roundabout path through a city to shake a pursuer. DEEP NEURAL NETWORK-BASED USER IDENTIFICATION SYSTEM Deep Neural Networks (DNNs), of which convolutional networks are one well-known family, are composed of multiple levels of nonlinear operations, such as neural nets with many hidden layers. Deep learning methods aim at learning feature hierarchies, where features at higher levels of the hierarchy are formed from the features at lower levels. A DNN is a neural network with a certain level of complexity: a neural network with more than two layers. Deep neural networks use sophisticated mathematical modeling to process data in complex ways. SOME ADVANTAGES OF USING THIS APPROACH OVER OTHERS Has best-in-class performance that significantly outperforms other solutions on problems in multiple domains. Reduces the need for feature engineering, one of the most time-consuming parts of machine learning practice. It is an architecture that can be adapted to new problems relatively easily. SOME DISADVANTAGES OF THIS SYSTEM Requires a large amount of data. It is extremely computationally expensive to train. The most complex models take weeks to train using hundreds of machines equipped with expensive GPUs. Does not have much in the way of a strong theoretical foundation. This leads to the next disadvantage. 
Determining the topology/flavor/training method/hyperparameters for deep learning is a black art with no theory to guide you. What is learned is not easy to comprehend. Other classifiers (e.g. decision trees, logistic regression, etc) make it much easier to understand what’s going on. Requirement specification 3.1.1 KERAS Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research. KERAS is an Open Source Neural Network library. It is designed to be modular, fast, and easy to use. It was developed by François Chollet, a Google engineer. Keras doesn't handle low-level computation. Instead, it uses another library to do it, called the Backend. So Keras is a high-level API wrapper for the low-level API, capable of running on top of TensorFlow, CNTK, or Theano. It wraps the efficient numerical computation libraries Theano and TensorFlow and allows you to define and train neural network models in just a few lines of code. 3.1.2 PANDAS pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with structured (tabular, multidimensional, potentially heterogeneous) and time-series data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python.DataFrames in Python are very similar, they come with the Pandas library, and they are defined as two-dimensional labeled data structures with columns of potentially different types. Pandas DataFrame consists of three main components: the data, the index, and the columns. Firstly, the DataFrame can contain data Besides data, you can also specify the index and column names for your DataFrame. 
Steps To Do This Data Collection This step could be executed in multiple ways, one of these ways is by using Wireshark to track the live data. A brief description of Wireshark is given below: Data Preprocessing The next stage in our analysis is the preprocessing phase. Data pre-processing is a data mining technique that involves transforming raw data into an understandable format. Real-world data is often incomplete, inconsistent, and/or lacking in certain behaviors or trends, and is likely to contain many errors. Data pre-processing is a proven method of resolving such issues. Data pre-processing prepare raw data for further processing. Data pre-processing is used database-driven applications such as customer relationship management and rule-based applications (like neural networks). Data goes through a series of steps during pre-processing: 1. Data Cleaning: Data is cleansed through processes such as filling in missing values, smoothing the noisy data, or resolving the inconsistencies in the data. 2. Data Integration: Data with different representations are put together and conflicts within the data are resolved. 3. Data Transformation: Data is normalized, aggregated, and generalized. 4. Data Reduction: This step aims to present a reduced representation of the data in a data warehouse. 5. Data Discretization: Involves the reduction of a number of values of a continuous attribute by dividing the range of attribute intervals. Once we transformed the tor data in accordance with these standards, there were still some inconsistencies which could be defined as follows: Inaccurate data (missing data) —There are many reasons for missing data such as data is not continuously collected, a mistake in data entry, technical problems with bio-metrics, and much more. The presence of noisy data (erroneous data and outliers) — the reasons for the existence of noisy data could be a technological problem of gadget that gathers data, a human mistake during data entry and much more. 
Inconsistent data — the presence of inconsistencies are due to the reasons such that existence of duplication within data, human data entry, containing mistakes in codes or names, i.e., violation of data constraints and much more. Therefore, to handle raw data, Data Pre-processing is performed. Feature Selection Feature Selection is one of the core concepts in machine learning which hugely impacts the performance of your model. The data features that you use to train your machine learning models have a huge influence on the performance you can achieve. Feature Selection Algorithms There are three general classes of feature selection algorithms: filter methods, wrapper methods, and embedded methods. 1. Filter Methods: Filter feature selection methods apply a statistical measure to assign a scoring to each feature. The features are ranked by the score and either selected to be kept or removed from the dataset. The methods are often univariate and consider the feature independently, or with regard to the dependent variable. 2. Wrapper Methods: Wrapper methods consider the selection of a set of features as a search problem, where different combinations are prepared, evaluated and compared to other combinations. A predictive model is used to evaluate a combination of features and assign a score based on model accuracy. 3. Embedded Methods: Embedded methods learn which features best contribute to the accuracy of the model while the model is being created. The most common type of embedded feature selection method is regularization methods. How to select features and what are the Benefits of performing feature selection before modeling your data? Reduces Overfitting: Less redundant data means less opportunity to make decisions based on noise. Improves Accuracy: Less misleading data means modeling accuracy improves. Reduces Training Time: fewer data points reduce algorithm complexity and algorithms train faster. 
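The filter approach described above (score each feature with a statistical measure, rank, then keep or drop) can be sketched without any libraries. In this sketch the statistical measure is the absolute Pearson correlation with the target, which is an assumption chosen for illustration; the feature names packet_len, inter_arrival and noise and their values are made up and are not the feature set used in the project.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_k_best(features, target, k):
    """Univariate filter: rank features by |correlation| and keep the top k."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: three candidate features and a binary label.
features = {
    "packet_len":    [1, 2, 8, 9, 2, 7],
    "inter_arrival": [9, 8, 2, 1, 9, 3],
    "noise":         [5, 1, 4, 2, 3, 3],
}
target = [0, 0, 1, 1, 0, 1]

selected, scores = select_k_best(features, target, k=2)
print(selected)
print(scores)
```

Each feature is scored independently of the others, which is what makes this a filter method; a wrapper method would instead retrain a model for each candidate subset.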
A short description of the feature set we have used is given in the snippet below: Model Selection Model selection is a process that can be applied both across different types of models (e.g. logistic regression, SVM, KNN, etc.) and across models of the same type configured with different model hyperparameters (e.g. different kernels in an SVM). There are many approaches to model selection, such as: Model Selection Using Structural Risk Minimization (SRM): In any ML problem, we specify a hypothesis class H, which we believe includes a good predictor for the learning task at hand. In the SRM paradigm, we specify a weight function which assigns a weight to each hypothesis class, such that a higher weight reflects a stronger preference for the hypothesis class. The bottom line is that model selection is the process of selecting one final machine learning model from among a collection of candidate machine learning models for a training dataset. For this project, we built a deep learning model from neural networks. The main architecture of this model is shown in the snippet below: Model Evaluation The model evaluation shows how well the model has performed, based on the accuracy and loss achieved by the model. The following shows the loss and accuracy of the model for just four epochs. If you need any coding help related to a real-time machine learning project or assignment, you can send mail to codersarts@gmail.com
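The two evaluation quantities mentioned above, accuracy and loss, can be illustrated with a small sketch. The source does not say which loss function the project used, so binary cross-entropy is assumed here purely as an example, and the labels and predicted probabilities are invented.

```python
from math import log


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean negative log-likelihood of the true labels (assumed loss)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += y * log(p) + (1 - y) * log(1 - p)
    return -total / len(y_true)


# Invented labels and predicted probabilities for four samples.
y_true = [1, 0, 1, 1]
y_prob = [0.9, 0.2, 0.6, 0.4]

acc = accuracy(y_true, y_prob)
loss = binary_cross_entropy(y_true, y_prob)
print(acc, loss)
```

Accuracy only counts whether each thresholded prediction is right, while the loss also penalizes low confidence in correct answers, which is why the two are usually reported together per epoch.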
What is Decision Tree Algorithm In Machine Learning | Codersarts
What is a decision tree? A decision tree is a map of the possible outcomes of a series of related choices. It allows an individual or organization to weigh possible actions against one another based on their costs, probabilities, and benefits. Decision trees can be used either to drive informal discussion or to map out an algorithm that predicts the best choice mathematically. A decision tree typically starts with a single node, which branches into possible outcomes. Each of those outcomes leads to additional nodes, which branch off into other possibilities. This gives it a treelike shape. There are three different types of nodes: chance nodes, decision nodes, and end nodes. A chance node, represented by a circle, shows the probabilities of certain results. A decision node, represented by a square, shows a decision to be made, and an end node shows the final outcome of a decision path. Decision trees can also be drawn with flowchart symbols, which some people find easier to read and understand. Decision tree symbols Here is a list of some important symbols used to create a decision tree. There are other symbols as well, which you can find with a web search. Next, we look at the terminology: Important Terminology related to Decision Trees Root Node: It represents the entire population or sample, and it is further divided into two or more homogeneous sets. Splitting: The process of dividing a node into two or more sub-nodes. Decision Node: When a sub-node splits into further sub-nodes, it is called a decision node. Leaf / Terminal Node: Nodes that do not split are called leaf or terminal nodes. Pruning: When we remove sub-nodes of a decision node, the process is called pruning. It is the opposite of splitting. Branch / Sub-Tree: A subsection of the entire tree is called a branch or sub-tree. Parent and Child Node: A node which is divided into sub-nodes is called the parent node of those sub-nodes, whereas the sub-nodes are the children of the parent node. Types of Decision Trees The type of a decision tree is based on the type of target variable we have. It can be of two types: 1. Categorical Variable Decision Tree: A decision tree with a categorical target variable is called a categorical variable decision tree. For example, in a student problem where the target variable is "Student will play cricket or not", the values are YES or NO. 2. Continuous Variable Decision Tree: A decision tree with a continuous target variable is called a continuous variable decision tree. Assumptions while creating Decision Tree Some of the assumptions we make while using a decision tree: In the beginning, the whole training set is considered as the root. Feature values are preferred to be categorical. If the values are continuous, they are discretized prior to building the model. Records are distributed recursively on the basis of attribute values. The order in which attributes are placed as root or internal nodes of the tree is decided using a statistical approach. Advantages of Decision Tree: Easy to Understand: Decision tree output is very easy to understand, even for people from a non-analytical background. It does not require any statistical knowledge to read and interpret. Its graphical representation is very intuitive, and users can easily relate it to their hypotheses. Useful in Data exploration: A decision tree is one of the fastest ways to identify the most significant variables and the relation between two or more variables. With the help of decision trees, we can create new variables/features that have better power to predict the target variable. It can also be used in the data exploration stage. For example, if we are working on a problem where we have information available in hundreds of variables, a decision tree will help to identify the most significant variables. 
Decision trees implicitly perform variable screening or feature selection. Decision trees require relatively little effort from users for data preparation. Less data cleaning required: It requires less data cleaning compared to some other modeling techniques. It is not influenced by outliers and missing values to a fair degree. The data type is not a constraint: It can handle both numerical and categorical variables. It can also handle multi-output problems. Non-Parametric Method: Decision tree is considered to be a non-parametric method. This means that decision trees have no assumptions about space distribution and the classifier structure. Disadvantages of Decision Tree: Overfitting: Decision-tree learners can create over-complex trees that do not generalize the data well. This is called overfitting. Overfitting is one of the most practical difficulties for decision tree models. This problem gets solved by setting constraints on model parameters and pruning. Not fit for continuous variables: While working with continuous numerical variables, the decision tree loses information, when it categorizes variables in different categories. Decision trees can be unstable because small variations in the data might result in a completely different tree being generated. This is called variance, which needs to be lowered by methods like bagging and boosting. Greedy algorithms cannot guarantee to return the globally optimal decision tree. This can be mitigated by training multiple trees, where the features and samples are randomly sampled with replacement. Decision tree learners create biased trees if some classes dominate. It is therefore recommended to balance the data set prior to fitting with the decision tree. Information gain in a decision tree with categorical variables gives a biased response for attributes with greater no. of categories. How to draw a decision tree To draw a decision tree, first pick a medium. 
You can draw it by hand on paper or a whiteboard, or you can use special decision tree software. In either case, here are the steps to follow: 1. Start with the main decision. Draw a small box to represent this point, then draw a line from the box to the right for each possible solution or action. Label them accordingly. 2. Add chance and decision nodes to expand the tree as follows: If another decision is necessary, draw another box. If the outcome is uncertain, draw a circle (circles represent chance nodes). If the problem is solved, leave it blank (for now). From each decision node, draw possible solutions. From each chance node, draw lines representing possible outcomes. If you intend to analyze your options numerically, include the probability of each outcome and the cost of each action. 3. Continue to expand until every line reaches an endpoint, meaning that there are no more choices to be made or chance outcomes to consider. Then, assign a value to each possible outcome. It could be an abstract score or a financial value. Add triangles to signify endpoints. With a complete decision tree, you’re now ready to begin analyzing the decision you face. If you need a real-life coding example, implemented using sklearn or from scratch, you can contact us at codersarts@gmail.com
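Once every branch carries a probability, a cost, and an end value, the tree drawn in the steps above can be analyzed numerically by "rolling back" expected values from the endpoints to the root. The following is a minimal sketch of that idea in plain Python, not a definitive implementation. The helper names (end, chance, decision, rollback, best_option) and the launch/hold figures are hypothetical and chosen only for illustration.

```python
def end(value):
    """End node (triangle): the final value of one decision path."""
    return {"kind": "end", "value": value}


def chance(*outcomes):
    """Chance node (circle): outcomes are (probability, child) pairs."""
    return {"kind": "chance", "outcomes": outcomes}


def decision(**options):
    """Decision node (square): options map a label to (cost, child)."""
    return {"kind": "decision", "options": options}


def rollback(node):
    """Return the expected value of a node, working back from the endpoints."""
    if node["kind"] == "end":
        return node["value"]
    if node["kind"] == "chance":
        # probability-weighted average of the outcomes
        return sum(prob * rollback(child) for prob, child in node["outcomes"])
    # decision node: take the option with the best value after its cost
    return best_option(node)[1]


def best_option(node):
    """Return (label, expected value) of the best option at a decision node."""
    scored = [(label, rollback(child) - cost)
              for label, (cost, child) in node["options"].items()]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical example: launching costs 50 and pays 200 with probability 0.6
# or 20 with probability 0.4; holding costs nothing and pays 60.
tree = decision(
    launch=(50, chance((0.6, end(200)), (0.4, end(20)))),
    hold=(0, end(60)),
)

label, value = best_option(tree)
print("Best option:", label, "with expected value", round(value, 2))
```

With these made-up numbers, launching has an expected value of 0.6 * 200 + 0.4 * 20 - 50 = 78, which beats holding at 60.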
Python Coding Standards - Codersarts
The purpose of these coding standards is to make programs readable and maintainable. In the “real world” you may need to update your own code more than 6 months after having written the original – or worse, you might need to update someone else’s. For this reason, every programming department has a set of standards or conventions that programmers are expected to follow. Neatness counts! Part of every project and homework grade is how well these standards are followed. It is your responsibility to understand these standards. If you have questions, ask any of the TAs or the instructors. Naming Conventions Use meaningful variable names!! For example, if your program needs a variable to represent the radius of a circle, call it radius, not r and not rad. The use of obvious, common, meaningful abbreviations is permitted. For example, ‘number’ can be abbreviated as num as in num_students. The use of single letter variables is forbidden except in loops. Begin variable and function names with lowercase letters. Names of constants should be in all caps with underscores between words, e.g., EURO_TO_USD = 1.20 or MAX_NUM_STUDENTS = 100 Separate “words” within identifiers (function names and variable names) with underscores, e.g., grand_total (Fun fact: this is called snake case because it makes the variables look kinda like snakes!) Do not use global variables! Use of global variables is forbidden. Use of Whitespace The prudent use of whitespace goes a long way to making your program readable. Horizontal whitespace (spaces between characters) will make it easier to read your code. Vertical whitespace (blank lines between lines of code) will help you to organize it. Use a single blank line to separate major parts of a function. Use two blank lines to separate functions. Indentation should be 4 spaces long. Using Tab in emacs will accomplish this. Use spaces around all operators. For example, write x = y + 5, NOT x=y+5. 
Lines of code should be no longer than 79 characters (the default size of an emacs window). Code that “wraps” around a line is difficult to read. Line Length Avoid lines of code longer than 79 characters, since they’re not handled well by many terminals, and often make your code more difficult to read. If a line of your code is longer than 79 characters, you may be doing too much in one line of code, or you may have nested too deep with loops and conditionals. If you have a line of code that is unavoidably longer than 79 characters, you can continue the code on the next line by putting a “\” (backslash) after a breakpoint in the code (e.g., after a “+”, after a comma, etc.). If you’re using emacs, it will automatically indent the rest of the line of code following the backslash. For example: choice = int(input("Please enter a number between " + str(minn) + " and " + str(maxx) + ", inclusive: ")) Can become: choice = int(input("Please enter a number between " + \ str(minn) + " and " + str(maxx) + ", inclusive: ")) def where_is_the_cat(cat, location_tried): """ A function to look for the cat :param cat: the name of the cat to look for :param location_tried: the integer IDs I'll look in :return: the integer ids for places I looked """ print("I don't know,", TOTALLY_NORMAL_HUMAN_NAME) return location_tried if __name__ == '__main__': # introduces the programmer print("Hello, my name is perfectly reasonable. \ Say, do you know where the cat is?") where_is_the_cat("Jules", [1, 2]) Use of Constants To improve readability, you should use constants whenever you are dealing with hard-coded values. Your code shouldn't have any “magic numbers,” or numbers whose meaning is unknown. Your code should also avoid “magic strings,” or strings that have a specific use within the program (e.g., choices a user could make such as “yes,” “STOP”, etc.). For example: total = subtotal + subtotal * .06 In the code above, .06 is a magic number. What is it? 
The number itself tells us nothing; at the very least, this code would require a comment. However, if we use a constant, the number's meaning becomes obvious, the code becomes more readable, and no comment is required. Constants are typically declared near the top of the program so that if their value ever changes they are easy to locate and modify. Constants may be placed before the if __name__ == '__main__': statement – this makes them global constants, which means everything in the file has access to them. (Global variables are only allowed for constants!) Here’s the updated code: TAX_RATE = .06 if __name__ == '__main__': # lots of code goes here total = subtotal + subtotal * TAX_RATE # other code goes here print("Maryland has a sales tax rate of", TAX_RATE, "percent") Comments Programmers rely on comments to help document the project and parts of the project. Generally, we categorize comments as one of three types: (1) File Header Comments (2) Function Header Comments (3) In-Line Comments (1) and (2) will use triple quotes ("""), A.K.A. docstrings. (3) will use pound signs (#). 1. File Header Comments Every file should contain a comment at the top describing the contents of the file and other pertinent information. This "file header comment" MUST include the following information. The file name Your name The date the file was created Your section number Your school e-mail address A brief description of the contents of the file For example: """ File: lab2.py Author: YOUR NAME Date: THE DATE Lab Section: YOUR LAB Section Email: YOUREMAIL@school.edu Description: This program shows the layout of code in a Python file, and greets the user with the name of the programmer """ 2. 
Function Header Comments Every single function must have a header comment that includes the following: A description of what the function does :param parameter_name (name, type and short description) :return: description of what is returned For example: def where_is_the_cat(cat, location_tried): """ A function to look for the cat :param cat: the name of the cat to look for :param location_tried: the integer IDs I'll look in :return: the integer ids for places I looked """ print("I don't know,", TOTALLY_NORMAL_HUMAN_NAME) return location_tried 3. In-Line Comments In-line comments are comments within the code itself. They are normally comments for the line(s) of code directly below them. Well-structured code will be broken into logical sections that perform a simple task. Each of these sections of code (often starting with an 'if' statement, or a loop) should be documented. Any “confusing looking” code should also be commented. Do not comment every line of code. Trivial comments (e.g., # increment x ) clutter up your code and are worse than no comments at all. In-line comments are used to clarify what your code does, not how it does it. An in-line comment appears above the code to which it applies. It is also indented to the same level as the code it is a comment for; comments that are not correctly indented make the code less readable. For example: # go over the list of numbers given by the user for num in userNumList: # if it's odd, print it, if it's even, do nothing if num % 2 == 1: print(num) Built-In Functions and Functionality Python has many useful language features, built-in modules, and built-in functions that easily let a programmer perform a variety of tasks. However, due to the introductory nature of this course, you are not permitted to use any Python construct, built-in module, or third-party library that is not explicitly covered in the lecture slides. You are also not permitted to use anything that has not yet been covered in lecture. 
Using a built-in function or functionality to solve a problem by having Python do the work for you does not show that you have mastered the concepts behind it, and hence does not fulfill the assignment. If we do not show you how to use it in class, you can assume that it is off limits. If you find yourself unsure if you are allowed to use something, please consult with a member of the CMSC 201 course staff for clarification. Break and Continue Using break, pass, or continue is not allowed in any of your code for this class. Using these statements damages the readability of your code. Readability is a quality necessary for easy code maintenance. Using any of these will lead to an immediate deduction of points.
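To illustrate the rule on break and continue, here is a hedged sketch of how a search loop can stop early by carrying its stopping condition in the loop header instead. The function name find_first_negative and the constant NOT_FOUND are hypothetical, and the sketch assumes that while loops and len() have already been covered in lecture.

```python
"""
File: first_negative.py
Description: Sketch of an early-exit search written without break,
             following the naming, constant, and comment standards above
"""

NOT_FOUND = -1


def find_first_negative(numbers):
    """
    Finds the position of the first negative value in a list
    :param numbers: the list of numbers to search
    :return: the index of the first negative value, or NOT_FOUND
    """
    index = 0
    found_at = NOT_FOUND

    # keep looking until a negative is found or the list runs out
    while index < len(numbers) and found_at == NOT_FOUND:
        if numbers[index] < 0:
            found_at = index
        index += 1

    return found_at


if __name__ == '__main__':
    print("First negative is at index", find_first_negative([3, 0, -2, -7]))
```

The condition found_at == NOT_FOUND does the job a break would do, and it is visible in the loop header, which is where a reader looks first.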
Creating "Neighborhood" Database and performing queries using MySql
This schema consists of a database named "Neighborhood", in which various tables are created. These tables help in managing all sorts of data in a systematic way. By performing different kinds of queries we can easily extract the required data. But before performing any queries, we need a database and tables to perform queries on. So first, we have to create the database "Neighborhood": mysql> CREATE DATABASE `Neighborhood`; After creating the database, we have to select it for use: USE `Neighborhood`; In the database we have to create different tables: Creating table Cafe CREATE TABLE `cafe` ( `census_tract` bigint(11) default NULL, `county` varchar(9) default NULL, `dunkin_donuts` int(1) default NULL, `starbucks` int(1) default NULL ); Creating table Houseprice CREATE TABLE `houseprice` ( `census_tract` bigint(11) NOT NULL default '0', `house_price_index` decimal(5,2) default NULL, `median_income` int(6) default NULL, `population` int(4) default NULL, PRIMARY KEY (`census_tract`) ); After creating all the tables, we have to insert records into these tables. After inserting records the tables will look like this: The Cafe table: The Houseprice table: Performing Queries: After creating the database, creating the tables, and inserting records into them, we have to perform queries on these tables: Problem#1: Write a SQL query to select all columns of the Cafe table to inspect the table. Problem#2: Write a SQL query to calculate the total and average number of Starbucks and Dunkin Donuts in Boston. Problem#3: Write a SQL query to calculate the total and average number of Starbucks and Dunkin Donuts in each county. Problem#4: Write a SQL query to select the neighborhood (census tract) with the most Starbucks stores in each county. Problem#5: Write a SQL query to select all columns of the HousePrice table to inspect the table. 
Problem#6: After joining Cafe and HousePrice tables, write a SQL query to calculate average house price index, average income, and average population of the neighborhoods where there is at least one Starbucks store. How about those of the neighborhoods without Starbucks? (You can write another query) Problem#7: After joining Cafe and HousePrice tables, write a SQL query to calculate average house price index, average income, and average population of the neighborhoods with median income higher than the average of Boston where there is at least one Dunkin Donuts store. How about those of the neighborhoods without Dunkin Donuts? (You can write another query) Problem#8: One might argue that neighborhoods where Starbucks are located are relatively rich with higher house prices. If Starbucks stores are simply located in higher-income neighborhoods, rather than increasing house prices, we would expect to observe no significant relationship between Starbucks and house prices in higher-income neighborhoods. After joining Cafe and HousePrice tables, write a SQL query to calculate average house price index, average income, and average population of the neighborhoods with median income higher than the average of Boston where there is at least one Starbucks store. How about those of the neighborhoods without Starbucks? (You can write another query) Problem#9: After joining Cafe and HousePrice tables, write a SQL query to calculate average house price index, average income, and average population of the neighborhoods with median income higher than the average of Boston where there is at least one Dunkin Donuts store. How about those of the neighborhoods without Dunkin Donuts? (You can write another query) Problem#10: Based on your analysis with SQL, do you agree or disagree that Starbucks is the bellwether of rise in house prices of the neighborhood? How about Dunkin Donuts? To get solution for the above queries you can contact us on contact@codersarts.com
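The join-and-aggregate pattern that several of the problems above rely on can be tried out without a MySQL server. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, which is an assumption of this example, and the three rows per table are made-up placeholder values, not data from the real tables. It compares tracts with and without a Starbucks in a single grouped query.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL "Neighborhood" database
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cafe (
    census_tract INTEGER, county TEXT,
    dunkin_donuts INTEGER, starbucks INTEGER
);
CREATE TABLE houseprice (
    census_tract INTEGER PRIMARY KEY, house_price_index REAL,
    median_income INTEGER, population INTEGER
);
""")

# Hypothetical sample rows, for illustration only
conn.executemany("INSERT INTO cafe VALUES (?, ?, ?, ?)", [
    (1, "Suffolk", 1, 2),
    (2, "Suffolk", 0, 0),
    (3, "Norfolk", 2, 1),
])
conn.executemany("INSERT INTO houseprice VALUES (?, ?, ?, ?)", [
    (1, 150.0, 80000, 4000),
    (2, 100.0, 50000, 3000),
    (3, 130.0, 60000, 2000),
])

# Join the tables on census_tract and average per group:
# has_starbucks is 1 for tracts with at least one store, else 0
query = """
SELECT c.starbucks > 0 AS has_starbucks,
       AVG(h.house_price_index),
       AVG(h.median_income),
       AVG(h.population)
FROM cafe AS c
JOIN houseprice AS h ON h.census_tract = c.census_tract
GROUP BY has_starbucks
ORDER BY has_starbucks
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The same SELECT should carry over to MySQL largely unchanged; adding a WHERE clause on median_income with a subquery for the overall average is one way to approach the higher-income variants of the problems.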
Creating "2016 Presidential Election" Database and performing queries using MySql
This schema consists of a database named "2016 Presidential Election", in which various tables are created. These tables help in managing all sorts of data in a systematic way. Election data is usually very large, so we cannot find any particular piece of information just by looking at it. Hence, to get the required data we need to perform queries. By performing different kinds of queries we can easily extract the required data. But before performing any queries, we need a database and tables to perform queries on. So first, we have to create the database "2016 Presidential Election": mysql> CREATE DATABASE `2016 Presidential Election`; After creating the database, we have to select it for use: USE `2016 Presidential Election`; In the database we have to create different tables: Creating table Demographics CREATE TABLE IF NOT EXISTS `Demographics` ( `CountyID` int(5) DEFAULT NULL, `Name` varchar(20) DEFAULT NULL, `State` varchar(20) DEFAULT NULL, `Total_Population` int(7) DEFAULT NULL, `Percent_White` int(3) DEFAULT NULL, `Percent_Black` int(2) DEFAULT NULL, `Percent_Asian` int(2) DEFAULT NULL, `Percent_Hispanic` int(2) DEFAULT NULL, `Per_Capita_Income` int(5) DEFAULT NULL, `Median_Rent` int(4) DEFAULT NULL, `Median_Age` decimal(3,1) DEFAULT NULL ); Creating table GoogleTrends CREATE TABLE IF NOT EXISTS `GoogleTrends` ( `State` varchar(20) DEFAULT NULL, `Google_Donald_Trump` decimal(4,2) DEFAULT NULL, `Google_Hillary_Clinton` decimal(4,2) DEFAULT NULL ); Creating table Votes CREATE TABLE IF NOT EXISTS `Votes` ( `CountyID` int(5) DEFAULT NULL, `Democrats` int(7) DEFAULT NULL, `Republican` int(6) DEFAULT NULL, `Others` int(6) DEFAULT NULL ); After creating all the tables, we have to insert records into these tables. 
After inserting records the table will look like this: The Demographics table: The GoogleTrends table: The Votes table: Performing Queries: After creating the database as well as creating tables and inserting records into them, we have to perform queries on these tables: Problem#1: Write a SQL query to calculate total population, average percentage of white, black, and Asian population, average income per capita, and average median rent, by states. Sort by total population in descending order. Problem#2: Write a SQL query to calculate total votes for Democrats, Republican, and Others by states in 2016 Presidential Election. Problem#3: Write a SQL query to calculate total population, average percentage of white, black, and Asian population, average income per capita, and average median rent in counties where the votes for Republican were more than those for Democrats. Problem#4: Write a SQL query to calculate total population, average percentage of white, black, and Asian population, average income per capita, and average median rent in counties where the votes for Democrats were more than those for Republican. Problem#5: Write a SQL query to count the number of counties by states where the votes for Republican were more than those for Democrats. Sort by the number of counties in descending order. Problem#6: Write a SQL query to select the county with the most votes for Republican in each state. Problem#7: Write a SQL query to calculate total votes for Democrats, Republican, and Others, along with average Google search volume for the candidates, by states. Problem#8: Write a SQL query to calculate total votes for Democrats, Republican, and Others in counties where people searched Donald Trump more than the average search volume on Google. Problem#9: Write a SQL query to calculate total votes for Democrats, Republican, and Others in counties where people searched Donald Trump less than the average search volume on Google. 
Problem#10: Based on your analyses with SQL, write your opinion on the role of data in politics. To get solution for the above queries you can contact us on contact@codersarts.com
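Most of the problems above combine the Votes table with Demographics through CountyID and then group by State. As a rough sketch of that pattern, the example below again uses Python's sqlite3 module in place of MySQL (an assumption of this example), keeps only the columns it needs, and fills the tables with invented placeholder counties and vote counts rather than real election data.

```python
import sqlite3

# In-memory SQLite stand-in with a reduced version of the two tables
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Demographics (CountyID INTEGER, Name TEXT, State TEXT);
CREATE TABLE Votes (
    CountyID INTEGER, Democrats INTEGER,
    Republican INTEGER, Others INTEGER
);
""")

# Hypothetical rows, for illustration only
conn.executemany("INSERT INTO Demographics VALUES (?, ?, ?)", [
    (1, "County 1", "StateA"),
    (2, "County 2", "StateA"),
    (3, "County 3", "StateB"),
])
conn.executemany("INSERT INTO Votes VALUES (?, ?, ?, ?)", [
    (1, 100, 150, 10),
    (2, 200, 120, 5),
    (3, 80, 300, 20),
])

# Total votes per party by state
totals = conn.execute("""
SELECT d.State, SUM(v.Democrats), SUM(v.Republican), SUM(v.Others)
FROM Demographics AS d
JOIN Votes AS v ON v.CountyID = d.CountyID
GROUP BY d.State
ORDER BY d.State
""").fetchall()

# Number of counties per state where Republican votes exceeded Democrat votes
republican_counties = conn.execute("""
SELECT d.State, COUNT(*) AS num_counties
FROM Demographics AS d
JOIN Votes AS v ON v.CountyID = d.CountyID
WHERE v.Republican > v.Democrats
GROUP BY d.State
ORDER BY num_counties DESC, d.State
""").fetchall()

print(totals)
print(republican_counties)
```

Filtering in WHERE before grouping is what restricts the count to the counties of interest; swapping the comparison gives the Democrat-leaning counterpart.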
Adaptive threshold, Threshold, Canny Image Filter In Machine learning OpenCV | Codersarts
Adaptive Threshold Filter Adaptive thresholding solves a problem of simple thresholding: a single global threshold works poorly when different regions of the image have different lighting. Adaptive thresholding calculates a threshold value for each small region, so different regions get different threshold values. This method is used with OpenCV in machine learning work to bring out the visual detail of images. In OpenCV, the adaptive threshold method is adaptiveThreshold() of the Imgproc class. Syntax: adaptiveThreshold(src, dst, maxValue, adaptiveMethod, thresholdType, blockSize, C) Parameters: The parameters of the “adaptiveThreshold” method are: src: Input image array, source 8-bit single-channel image dst: destination image of the same size and the same type as src. maxValue: Maximum value that can be assigned to a pixel. adaptiveMethod: A variable of integer type representing the adaptive method to be used. It takes one of two values: ADAPTIVE_THRESH_MEAN_C ADAPTIVE_THRESH_GAUSSIAN_C thresholdType: A variable of integer type representing the type of threshold to be used blockSize: A variable of integer type representing the size of the pixel neighborhood used to calculate the threshold value. C: A variable of double type representing the constant used in both methods (subtracted from the mean or weighted mean). Threshold Image Filter This method is simple and straightforward: if a pixel value is greater than a threshold value, it is assigned one value (maybe white), else it is assigned another value (maybe black). For better understanding, suppose the threshold value is 125 (out of 255). Then a value of 125 or under is converted to black (0), and a value above 125 is converted to white (the maximum value, maxval). Syntax: threshold (src, dst, thresh, maxval, type) Parameters: src: Input image array, source 8-bit single-channel image dst: destination image of the same size and the same type as src. 
thresh: threshold value maxval: Maximum value that can be assigned to a pixel. type: thresholding type There are different threshold types: THRESH_BINARY THRESH_BINARY_INV THRESH_TRUNC THRESH_TOZERO THRESH_OTSU THRESH_TRIANGLE Canny Image Filter Canny Edge Detection is a popular edge detection algorithm. It was developed by John F. Canny in 1986. Canny Edge Detection is used to detect the edges in an image. It accepts a grayscale image as input and it uses a multi-stage algorithm. Canny edge detection process: It uses four steps to detect the edges: Noise Reduction - 5x5 Gaussian filter Calculating gradients - Finding the intensity gradient of the image Non-maximum suppression - keeping only pixels that are local maxima along the gradient direction Thresholding with hysteresis - upper/lower threshold Syntax: Canny(image, edges, threshold1, threshold2) Parameters: image: An object representing the source (input image) for this operation. edges: An object representing the destination (edges) for this operation. threshold1: A variable of the type double representing the first threshold for the hysteresis procedure. threshold2: A variable of the type double representing the second threshold for the hysteresis procedure. Get instant help with any type of programming or project assignment at "codersarts" at affordable prices. Contact us or send your requirement to "codersarts@gmail.com", or submit your requirement details here
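To make the difference between the simple and the adaptive threshold concrete, here is a small from-scratch sketch in plain Python that works on a list of pixel rows. It does not call OpenCV: the function names threshold_binary and adaptive_threshold_mean are hypothetical, the behaviour mimics the THRESH_BINARY type with the mean method, and the neighbourhood is simply clipped at the image border, which is an assumption of this sketch rather than OpenCV's own border handling.

```python
def threshold_binary(image, thresh, maxval=255):
    """Simple threshold: pixels above thresh become maxval, the rest 0."""
    return [[maxval if pixel > thresh else 0 for pixel in row]
            for row in image]


def adaptive_threshold_mean(image, block_size, c, maxval=255):
    """Adaptive threshold: each pixel is compared with the mean of its
    block_size x block_size neighbourhood minus the constant c."""
    half = block_size // 2
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        out_row = []
        for x in range(width):
            # collect the neighbourhood, clipped at the image border
            window = [image[ny][nx]
                      for ny in range(max(0, y - half),
                                      min(height, y + half + 1))
                      for nx in range(max(0, x - half),
                                      min(width, x + half + 1))]
            local_thresh = sum(window) / len(window) - c
            out_row.append(maxval if image[y][x] > local_thresh else 0)
        result.append(out_row)
    return result


# One-row test image: a dim spot (40) in a dark region on the left and
# a bright spot (140) in a better-lit region on the right
image = [[10, 40, 10, 110, 140, 110]]

global_result = threshold_binary(image, 125)
adaptive_result = adaptive_threshold_mean(image, block_size=3, c=5)

print("global  :", global_result)
print("adaptive:", adaptive_result)
```

With the global threshold of 125 only the 140 pixel turns white, so the dim spot in the dark region is lost. The adaptive version also marks the 40 pixel, because 40 is well above the mean of its own neighbourhood.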
Machine Learning Assignment Help | Machine Learning Mathematical Concept Topics | Codersarts
Nowadays machine learning draws on many mathematical concepts, so it is necessary to have a strong mathematical foundation. In this blog, we list the important terms and concepts of mathematics that are related to machine learning. Start with probability (conditional, basic, marginal, etc.) Formula => P(Event) = Favourable Outcomes / Total Possible Outcomes. Let's look at an example: Problem: Throwing a die (1 time) means the outcomes are [1,2,3,4,5,6], i.e. total possible outcomes = 6. What is the probability of getting 5 on throwing a die? Ans: 1 / 6. Mathematical Series and Convergence, Numerical methods for Analysis Convergence is mostly defined using the limit. Example: Imagine a sequence as such: X0 = 1 X1 = 0.1 X2 = 0.01 X3 = 0.001 X4 = 0.0001 ... Xn = 1/(10^n) This means that Xn = 1/(10^n) converges to 0, as in "it can get as close to zero as we want" when n grows. Bayesian Statistics Typically, one draws on Bayesian models for one or more of a variety of reasons, such as: Having relatively few data points Having strong prior intuitions Having high levels of uncertainty Calculus Calculus is an important field in mathematics and it plays an integral role in many machine learning algorithms. Markov Process and Chains Markov chains are a fairly common, and relatively simple, way to statistically model random processes. They have been used in many different domains, ranging from text generation to financial modeling. Other topics: Stochastic Models Here is a list of stochastic models: Poisson processes Random Walk and Brownian motion processes Gaussian Processes etc. Differential Equations Differential Equations are very relevant for a number of machine learning methods. Dynamic Programming and Optimization Techniques Dynamic programming works along the same lines as machine learning. It explores each possibility and selects the one which looks best at every step of the computation. Many reinforcement learning algorithms use dynamic programming. 
Examples are bots that need to decide, at each step of exploring, which action to take next. Other examples are genetic algorithms, game theory, software agents, or even algorithms for compressing and communicating data. Fourier and Wavelets The Fourier transform (FT) decomposes a signal into the frequencies that make it up. The Fourier transform is defined as an improper integral over the whole real line. Contact us here: codersarts@gmail.com Get any type of programming assignment help from codersarts experts at affordable prices.
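The statement that the Fourier transform decomposes a signal into its frequencies can be checked on sampled data with the discrete Fourier transform (DFT). The sketch below is a direct, unoptimized implementation in plain Python; the function name dft and the 16-sample test signal are assumptions made for illustration. In practice a fast Fourier transform from a numerical library would normally be used instead.

```python
import cmath
import math


def dft(signal):
    """Discrete Fourier transform of a list of samples (direct O(n^2) form)."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


# Test signal: 16 samples of a sine wave that completes 3 cycles
NUM_SAMPLES = 16
CYCLES = 3
samples = [math.sin(2 * math.pi * CYCLES * t / NUM_SAMPLES)
           for t in range(NUM_SAMPLES)]

# The magnitude of each coefficient shows how strongly that frequency is present
magnitudes = [abs(coefficient) for coefficient in dft(samples)]

# Only the first half of the spectrum is independent for a real signal
dominant = max(range(1, NUM_SAMPLES // 2), key=lambda k: magnitudes[k])
print("Dominant frequency bin:", dominant)
```

The largest magnitude sits at bin 3, matching the three cycles in the test signal, while the other bins in the first half of the spectrum are close to zero.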