- Ethics in Generative AI: Navigating Ethical Frontiers in the Age of Generative AI
Introduction
The rapid ascent of Generative AI technology has captivated the attention of diverse stakeholders, including corporate leaders, academics, and policymakers. This technology's potential to reshape the way we learn, work, and interact has ignited considerable interest. Within the business realm, Generative AI is hailed as a tool capable of revolutionizing customer interactions and fueling growth. However, as organizations eagerly embrace the promises of Generative AI, a host of ethical considerations loom large, underscoring the necessity of a measured and responsible approach.
Generative AI's Impact and Ethical Considerations
Generative AI's emergence has fundamentally transformed machines' capabilities, enabling them to create novel content across various forms, leveraging existing materials like text, visuals, and sound. This technology's impact spans multiple industries, from media to healthcare and education. Nevertheless, this technological transformation comes with a set of ethical challenges that demand careful consideration and responsible deployment.
Ethical Challenges: Spreading Harmful Content and Copyright Issues
The efficiency-enhancing capabilities of Generative AI systems also present a double-edged sword. These systems, while creating content at unprecedented rates, can inadvertently produce harmful or objectionable material, including deepfakes that propagate misinformation and hate speech. This underlines the urgency of ensuring responsible usage. In addition, the very datasets that these systems learn from pose potential risks in terms of copyright infringement and intellectual property rights. The use of data to train Generative AI models may inadvertently cross legal boundaries, giving rise to legal and reputational liabilities for organizations embracing this technology.
Ethical Implications: Data Privacy and Unintended Disclosure
The datasets that fuel the learning process of Generative AI models are often brimming with sensitive information. This situation presents a pressing ethical concern: the potential for data privacy violations and misuse of personal information. As organizations delve into the capabilities of this technology, safeguarding personal data becomes paramount to maintaining public trust. Furthermore, the natural curiosity surrounding AI tools might inadvertently lead to the unintended disclosure of sensitive information. This scenario poses a significant threat to an organization's financial stability, reputation, and legal standing.
Navigating Biases and Workforce Dynamics
Generative AI's prowess in creating content is not devoid of challenges. These systems can inadvertently inherit biases present in their training data, potentially amplifying pre-existing societal inequalities. As organizations leverage these technologies, they must actively address and rectify these biases to ensure fairness and inclusivity. Another aspect of concern relates to workforce dynamics. While Generative AI can undoubtedly elevate productivity, it also introduces concerns about job displacement. Organizations must commit to proactive upskilling and reskilling initiatives to address potential shifts in the job landscape.
Guidelines for Ethical Application
In the pursuit of harnessing Generative AI's potential while upholding ethical standards, organizations require a comprehensive and actionable framework. Building upon established AI principles such as accuracy, safety, transparency, empowerment, and sustainability, organizations can navigate these challenges. As Generative AI evolves, organizations must rely on trustworthy data sources, curate datasets to eliminate bias and inaccuracies, and maintain human oversight for robust evaluation. Ongoing testing and feedback loops ensure that performance remains accurate and unbiased, minimizing potential risks.
Transparency and Addressing Misinformation
Ethical best practices form the cornerstone of responsible AI integration. Staying informed and proactive, regardless of one's role, is essential. Familiarizing oneself with global AI ethics guidelines that prioritize principles such as human rights, diversity, privacy, transparency, and fairness sets the stage for responsible AI use. Engaging with ethical AI communities fosters a collaborative approach to addressing the challenges posed by Generative AI. Fostering awareness and advocating for critical thinking when consuming AI-generated content can help curb the propagation of misinformation.
Conclusion
As Generative AI continues to reshape industries and redefine possibilities, ethical considerations stand as a crucial cornerstone. Upholding principles of accuracy, safety, transparency, and fairness ensures that this transformative technology contributes positively to employees, customers, and society at large. In a landscape characterized by rapid technological advancements, a steadfast commitment to ethical guidelines becomes the lodestar, guiding organizational decisions and actions toward responsible and impactful Generative AI utilization. Dreaming of an AI-driven transformation? Engage with Codersarts AI today and let's co-create the future of tech, one prototype at a time.
- LangChain: Bridging LLMs and Data for AI Advancements
Large language models (LLMs) are a type of artificial intelligence (AI) model that can generate text, translate languages, and answer questions in an informative way. However, LLMs can be limited by the amount of data they have been trained on. LangChain is a framework that bridges the gap between LLMs and data. LangChain allows LLMs to access and process data from a variety of sources, including text, images, and audio. This allows LLMs to learn more about the world and improve their performance on a variety of tasks. In this blog post, we will discuss how LangChain can be used to advance AI. We will also discuss some of the challenges and limitations of LangChain.
LangChain Framework: Seamlessly Integrating LLMs and External Data
LangChain serves as an open source framework that empowers AI developers to seamlessly integrate Large Language Models (LLMs) like GPT-4 with external data. The framework provides convenient Python or JavaScript (TypeScript) packages for implementation, offering a flexible approach to developers.
Addressing Outdated Data Limitations with LangChain
GPT models trained on data up to 2021 come with inherent limitations due to outdated data. LangChain directly addresses this challenge by establishing a connection between LLMs and custom data and computations. This connection enables LLMs to access the latest and most relevant information from sources like reports and documents.
Enabling LLMs to Utilize External Data
A standout feature of LangChain is its capability to enable LLMs to draw upon external databases. By doing so, the framework enhances the responses generated by LLMs, incorporating valuable insights from external data sources. This feature has gained prominence, particularly after the release of GPT-4, as it complements the capabilities of powerful LLMs. Simplifying this process, LangChain breaks down data into manageable "chunks," storing them within a Vector Store to optimize efficiency.
Utilizing Vectorized Representations for Accurate Responses
At the core of LangChain's mechanism is the utilization of vectorized representations of documents. This empowers LLMs to generate accurate responses that are grounded in relevant data extracted from the Vector Store. Beyond its integration with LLMs, LangChain extends its functionality to enable the creation of applications capable of diverse tasks, ranging from web browsing to sending emails and interfacing with APIs.
Components of LangChain Framework
The architecture of LangChain consists of key components, including Models (LLM Wrappers), Prompts, Chains, Embeddings and Vector Stores, and Agents. These components are intricately woven together to form a cohesive framework. Developers engaging with LangChain can set up the environment, initialize models, use dynamic PromptTemplates, establish Chains to harmonize LLMs and prompts, utilize Embeddings and Vector Stores for personalized data, and create self-sufficient Agents for sequential task completion.
Expansive Applications of LangChain
The potential applications of LangChain span a wide range of AI-powered scenarios, including AI-driven email assistants, collaborative study companions, data analysis tools, customer service chatbots, and more. In summary, LangChain emerges as a robust framework that bridges the gap between LLMs and external data, ushering in a new realm of versatile AI applications. Its well-defined components and capabilities pave the way for innovation within the AI landscape.
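To make the Models, Prompts, and Chains components described above concrete, here is a minimal sketch of the prompt-to-model-to-chain pattern. It assumes the langchain and openai packages are installed and an OpenAI API key is configured; exact import paths and class names vary between LangChain versions, so treat this as an illustration rather than the definitive API.
Code Snippet :
# Minimal sketch: PromptTemplate + LLM wrapper + Chain
# Assumes `pip install langchain openai` and OPENAI_API_KEY set in the environment;
# import paths differ across LangChain versions.
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# Model component (LLM wrapper)
llm = OpenAI(temperature=0.7)

# Prompt component: a dynamic template with a placeholder variable
prompt = PromptTemplate(
    input_variables=["topic"],
    template="List three practical applications of {topic}.",
)

# Chain component: ties the prompt and the model together
chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run(topic="LangChain agents"))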
Key Takeaways from LangChain
The comprehensive guide delves into the practical aspects of LangChain's implementation:
LangChain's Purpose: The framework facilitates LLM integration into software applications and data pipelines beyond chat interfaces.
Prompt Templates: LangChain addresses repetitive prompts through dynamic templates.
Structured Responses: Output parser tools are provided to handle structured response formats.
Seamless LLM Switching: LangChain simplifies transitions between different LLMs.
Addressing LLM Memory Limitations: The framework addresses memory limitations by feeding past messages to LLMs.
Streamlining Pipeline Integration: Tools like chains and agents streamline complex pipeline integration.
Data Passage to LLMs: LangChain introduces techniques for effective data passage to LLMs.
Language Support: The framework supports both JavaScript and Python, catering to different application needs.
Diverse Use Cases: LangChain's versatility spans querying datasets, API interaction, and context-rich chatbots.
Endless Potential: Beyond covered use cases, LangChain's potential extends to personal assistants and more.
In essence, LangChain provides a powerful solution for integrating LLM capabilities with external data, opening doors to innovation in various AI applications. Dreaming of an AI-driven transformation? Engage with Codersarts AI today and let's co-create the future of tech, one prototype at a time.
- AI as a Service Platform for AI Prototyping
Artificial Intelligence (AI) is undeniably a dominating force in the modern tech landscape. However, creating, developing, and integrating AI solutions from scratch can be resource-intensive and challenging. Enter AI as a Service (AIaaS): a game-changing approach that simplifies AI adoption. Let's explore the world of AIaaS platforms, focusing on AI prototyping, and how they're shaping the future of tech development. AI as a Service (AIaaS) platforms offer a solution to this problem. AIaaS platforms provide businesses with access to AI technologies on a pay-as-you-go basis, which makes it easier and more affordable for businesses to get started with AI. One of the benefits of using an AIaaS platform for AI prototyping is that it allows businesses to test out different AI technologies without having to invest in the upfront costs of developing their own AI solutions. This can help businesses to identify the right AI technologies for their needs and to avoid making costly mistakes. Another benefit of using an AIaaS platform for AI prototyping is that it allows businesses to get started with AI quickly. AIaaS platforms typically offer a wide range of pre-built AI models and algorithms that businesses can use to build their prototypes. This can save businesses a significant amount of time and effort in the development process.
What is AI as a Service (AIaaS)?
AIaaS refers to third-party offerings of AI-driven solutions that allow businesses to experiment with AI tools without huge upfront investments. Whether it's machine learning models, chatbots, or vision-based systems, AIaaS delivers these capabilities as accessible services.
The Rise of AI Prototyping on AIaaS Platforms
Prototyping is a critical phase where theoretical AI concepts transform into tangible models. Here's why AIaaS platforms are becoming popular for AI prototyping:
Cost-Efficient: Organizations can test and develop AI prototypes without setting up the entire infrastructure.
Flexibility: With an array of tools and frameworks, businesses can choose what suits them best.
Scalability: As your prototype develops, AIaaS platforms can scale resources accordingly.
Speed: With pre-built tools and libraries, the prototyping phase is accelerated.
Applications and Use Cases
Chatbots and Virtual Assistants: Businesses can prototype AI-driven customer service tools tailored to their needs.
Predictive Analysis: For sectors like finance or healthcare, AIaaS platforms allow for the rapid prototyping of predictive models.
Image and Voice Recognition: Media and entertainment industries can quickly develop AI tools for content categorization and analysis.
Codersarts AI: Your Partner in AIaaS Prototyping
Creating a perfect AI prototype is both art and science, and at Codersarts AI, we master both. As pioneers in AI solutions, we're proud to offer AIaaS for businesses, startups, and innovators worldwide. Our AIaaS platform is designed to simplify and amplify your AI prototyping endeavors. From initial brainstorming to the final model testing, our seasoned experts are with you at every step, ensuring your AI prototype is robust, innovative, and business-ready.
Why choose Codersarts AI for your AIaaS needs?
Bespoke Solutions: Our AIaaS offerings are tailored to your unique needs and challenges.
End-to-End Support: From ideation to execution, we're your dedicated AI partner.
Cost-Efficient: Harness the power of AI without the heavy financial lift.
Conclusion
AI as a Service platforms have democratized access to AI, enabling businesses of all sizes to experiment, innovate, and thrive. And with AIaaS for prototyping, the journey from concept to execution has never been smoother. Dreaming of an AI-driven transformation? Engage with Codersarts AI today and let's co-create the future of tech, one prototype at a time.
- AI Proof of Concept (PoC) Project
In the contemporary world of technology, the transformative potential of Artificial Intelligence (AI) cannot be overstated. From automating mundane tasks to performing complex data analysis, AI has permeated almost every facet of business and personal life. However, before investing heavily in full-fledged AI solutions, organizations often seek assurance. This assurance comes in the form of a 'Proof of Concept' or PoC. Let's delve into understanding what an AI PoC is and why it's pivotal.
What is an AI Proof of Concept (PoC)?
In simple terms, an AI PoC is a small-scale, practical experiment that demonstrates the feasibility of an AI concept. It's a tangible demonstration that an AI idea is viable and can be executed in the real world with reasonable effort. The primary aim? To validate that the proposed AI solution addresses a specific business need or problem effectively. Furthermore, an AI Proof of Concept (PoC) project is a small-scale project that is used to test the feasibility of using AI for a specific purpose. PoC projects are often used by businesses to explore the potential of AI for their business before they commit to a larger-scale project. There are many different types of AI PoC projects, but they typically involve the following steps:
Define the problem. The first step is to define the problem that you want to solve with AI. What are you trying to achieve? What are the specific goals of the project?
Research the technology. Once you have defined the problem, you need to research the AI technologies that could be used to solve it. There are many different AI technologies available, so it is important to choose the right ones for your project.
Build a prototype. The next step is to build a prototype of your AI solution. This is a small-scale version of the final product that you want to create. The prototype will allow you to test the feasibility of your solution and to identify any problems that need to be addressed.
Test the prototype. Once you have built a prototype, you need to test it to see if it works as expected. You should test the prototype with a variety of data sets and scenarios to make sure that it is robust.
Evaluate the results. The final step is to evaluate the results of the PoC project. Did the prototype solve the problem that you were trying to solve? Were there any unexpected problems? What are the next steps?
Why is PoC Crucial?
Risk Mitigation: Before substantial investments, businesses can gauge the feasibility and applicability of the AI solution.
Cost Savings: By pinpointing potential challenges early on, organizations can make informed decisions, saving resources and funds.
Stakeholder Buy-in: Demonstrable results from PoCs can help in gaining the trust and approval of stakeholders.
Refining the Idea: PoCs allow for iterative testing and refining, ensuring the final product is robust and optimized.
Steps to Execute an AI PoC
Define the Scope: Clearly outline what you aim to achieve with the PoC. Is it to improve an existing process or to test a novel AI idea?
Gather Data: AI thrives on data. Ensure you have relevant, clean data that aligns with your project's goals.
Develop & Test: Using AI algorithms, create a small-scale model and test it rigorously.
Evaluate Results: After testing, assess the results against predefined metrics and KPIs (a minimal code sketch of this develop-test-evaluate loop appears at the end of this article).
Present Findings: Share the outcomes with stakeholders, focusing on tangible benefits and improvements.
How can Codersarts help?
Codersarts is a company that specializes in building AI PoC projects.
We have a team of experienced AI engineers who can help you to define the problem, research the technology, build the prototype, and test the results. We also offer a variety of other services, such as AI consulting and AI training. If you are interested in learning more about AI PoC projects or if you would like to work with Codersarts to build a PoC project for your business, please contact us today. We would be happy to discuss your needs and to help you to achieve your goals.
Here are some of the benefits of using AI PoC projects:
They can help you to explore the potential of AI for your business.
They can help you to identify the right AI technologies for your needs.
They can help you to test the feasibility of your AI solution.
They can help you to identify any problems that need to be addressed before you commit to a larger-scale project.
Overall, AI PoC projects can be a valuable tool for businesses that are considering using AI. They can help you to explore the potential of AI, identify the right AI technologies, and test the feasibility of your AI solution. However, it is important to keep in mind that PoC projects can be expensive and time-consuming, so it is important to weigh the benefits and risks before you start a project.
Codersarts: Your Partner in AI PoC Development
Embarking on the AI journey can seem daunting, but you don't have to do it alone. At Codersarts, we pride ourselves on assisting organizations in realizing their AI visions. Our seasoned experts can guide you through the entire PoC process, ensuring your concepts aren't just theoretically brilliant, but practically transformative. Whether you're a startup experimenting with an innovative AI idea or an established entity looking to augment your processes, Codersarts is poised to be your collaborator. We don't just build; we co-create, ensuring your AI solutions are bespoke, efficient, and impactful. In the vast universe of AI, a Proof of Concept acts as a guiding star, ensuring organizations are on the right path. It's the bridge between AI theories and actionable solutions. And with partners like Codersarts, the journey becomes not just simpler, but also more promising. Thinking about an AI PoC? Reach out to Codersarts and let's shape the future, one concept at a time.
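As promised above, here is a minimal sketch of the develop-test-evaluate loop at PoC scale. It is only an illustration: scikit-learn and its bundled breast-cancer dataset stand in for whatever model and business data a real PoC would use, and the metrics shown stand in for the KPIs agreed with stakeholders.
Code Snippet :
# PoC-scale sketch: train a small baseline model and score it against
# predefined metrics. The dataset and model here are illustrative stand-ins.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score

# Develop: load data and train a simple baseline model
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = LogisticRegression(max_iter=5000).fit(X_train, y_train)

# Test: run the prototype on held-out data
preds = model.predict(X_test)

# Evaluate: compare results against the KPIs agreed before the PoC started
print("Accuracy:", round(accuracy_score(y_test, preds), 3))
print("F1 score:", round(f1_score(y_test, preds), 3))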
- Getting started with Tableau
Tableau is a very powerful and fast growing data visualization tool used in many industries. It helps in simplifying raw data into a visual format which is very easy to understand at first glance. Tableau helps to create stories describing the data that can be understood by professionals at any level in an organization. The great thing about Tableau is that it doesn't require one to be from a technical background to work with it. It allows non-technical users to create customized dashboards. Thus this tool has garnered interest among people from all sectors, such as business, research, and various industries. What makes Tableau stand out from the rest is that data analysis is very fast with it and the visualizations created are in the form of dashboards and worksheets. Let's introduce you all to this wonderful tool. If you don't already have Tableau installed, please download it by referring to this blog. The first screen that appears when you open Tableau is its start screen. On the left hand side of the start screen you get options to load data in various formats such as Excel, Text, JSON, PDF files etc., or you can load data directly from a server as well. Here you can also find sample datasets that come with Tableau, so that you can explore various functions of the tool. On the right hand side there are several tutorial videos to help you get started with this tool.
Loading Dataset
In this tutorial we will work with the superstore dataset. Although it is already available in Tableau, please download it from the link below, since some changes have been made to it for the purpose of this tutorial. NOTE: Please get familiar with the dataset. All the column names are self-explanatory. Our dataset is an Excel file, therefore on the start screen we will click on 'Microsoft Excel'. In the dialogue box that pops up, go to the directory containing the dataset file. Once you open the dataset, another page appears that lets you know that the dataset is now connected. Here you can also see the names of the sheets ('Orders', 'People', and 'Returns') available in this file on the left hand side under a tab named 'Sheets'. You can also preview the data before getting things started, by dragging the sheets to the 'Drag sheets here' space on the screen. You can also perform various join and sort operations on the data according to your needs in this space and preview it. NOTE: We won't be performing joins on our data in this tutorial, since we don't need them. Even at this point you can add more data by clicking on the 'Add' button near the 'Connections' tab. Congratulations! We have reached our first milestone.
Worksheets
Now that we have loaded and viewed our data, we have reached the point where we can mold it to get some information out of it. The workspace in Tableau is called a 'Worksheet'. In order to navigate to the worksheet you need to click on the 'Sheet 1' tab on the bottom left corner of the screen. (And just so you know, you can also change the name of the sheet by right clicking on it and then selecting the 'Rename' option, the same way as in Microsoft Excel.) The above picture names various sections of the worksheet. The topmost bar displays the name of the workbook. A workbook contains sheets. A sheet can be a worksheet, a dashboard, or a story. From the Toolbar we can access commands and analysis and navigation tools. In the Cards and Shelves workspace area we drag fields to add data to a view.
The View area is the canvas in the workspace where we can create a visualization (also referred to as a "viz"). On the side bar there are two panes, namely the Data pane and the Analytics pane. In the Data pane there are two sections, dimensions and measures, which group together the qualitative and quantitative fields in the dataset respectively. Also, the different icons in green and blue correspond to different data types. Even the colors of the icons are significant: blue icons tell us that the data in the field is discrete, whereas green icons tell us that the data is continuous in nature. Sheet tabs represent each sheet in a workbook. This can include worksheets, dashboards, and stories. The Status bar displays information about the current view. You now have all the basic information and are just one step away from creating visuals. We have just passed our second milestone!
Creating Charts
We will create a sales trend chart using the superstore data. One thing to note is that the x and y axes are represented by columns and rows in the Tableau workspace. To create a sales trend chart we want the Dates on the x-axis and Sales on the y-axis. Thus, we will drag the Dates field from the Data pane to the Columns shelf and Sales to the Rows shelf. So, we just created a line chart. But is that it? Well, yes! It's pretty easy, right? But wait, we are not done yet. In the above picture we can see that on dragging the required fields into the Columns and Rows shelves, a yearly sales trend chart was made. But what if we want to dig deeper to get more insight from the data? To do that, we will click on the '+' sign before the YEAR field in the Columns shelf. This makes the chart appear like this: Clicking on the '+' sign transforms the chart into a quarterly sales trend per year. Let's click on the '+' sign before the QUARTER field in the Columns shelf. Well, this gives us a more detailed monthly sales trend chart. But there are several discontinuities in the chart. Why is that? Any guesses? (HINT: the colors are telling you something!) I hope you guessed it right. The data fields are blue in color, which tells us that these fields are discrete in nature, thus the presence of discontinuity in the chart. To resolve it, we will first click on the '-' (minus) sign before the YEAR and QUARTER fields, which will bring us back to square one. Then we will click on the downward facing arrow on the right side of the YEAR field. This will open a dropdown list with several options. We will select the "YEAR" option in the continuous variables section so that the data in this field is treated as continuous. Doing this makes the field continuous, and thus the color of the YEAR field in the Columns shelf changes from blue to green. Now if we click on the '+' sign again as we did above, we will get the following visuals. One more way to make the same change is shown below, where we change the 'Order date' field in the Data pane from a discrete to a continuous variable before dragging it to the Columns shelf. We can change the color, size, label, and style of the chart using the Marks card at the left of the View. We can also change the type of chart using the 'Show Me' tab in the top right corner of the worksheet. We will explore these charts in the coming blogs. One last lesson before we part ways: to make chart reading easier and more accurate, Tableau offers a 'Tooltip'. If you hover the pointer over the line in the chart, it will show you the values of the x and y axes. A nice and detailed chart is a click away.
You have achieved your goal; now you can rest and rejuvenate before walking through another blog. In the upcoming blogs we will explore more of Tableau, so keep an eye out for them. Until then, if you need implementation for any of the topics mentioned above or assignment help on any of its variants, feel free to contact us.
- Tableau
Tableau is one of the best and most powerful visualization tools available today. It is widely used in the field of Business Intelligence, but it is also utilized in other sectors such as research, statistics, and various industries. It simplifies raw data into an understandable format. We can also manipulate data in it to get desired results. It is very easy to use and doesn't require any technical background to work with. The visualizations are created in the form of Dashboards and Stories. Tableau can be majorly classified into two sections:
Developer tools: These are the tools that allow us to do the actual work, i.e. create dashboards, reports, charts, stories and other visualizations. The products under this category are Tableau Desktop and Tableau Public.
Sharing tools: The name is self-explanatory; these help in sharing the visualizations created using the developer tools. The products under this category are Tableau Online, Tableau Server, and Tableau Reader.
All in all, there are five products in Tableau: Tableau Desktop, Tableau Public, Tableau Online, Tableau Server, and Tableau Reader.
Tableau Desktop: As mentioned earlier, this is where all the major work is done. It has a lot of features that let you create visualizations very easily. It provides connectivity to data warehouses and other file types such as Excel, text, JSON, PDF etc. The visualizations can be stored locally or publicly. Tableau Desktop can be further classified into two parts. Tableau Desktop Personal: the workbooks are kept private and access is limited; workbooks cannot be published online and can only be distributed either offline or on Tableau Public. Tableau Desktop Professional: here the only major difference is that the workbooks can be published online and there is full access to all the features.
Tableau Public: It is the same as Tableau Desktop, but it is a public version, i.e. it is free but the workbooks created cannot be saved locally. They can only be uploaded to Tableau's public cloud, where they can be seen and accessed by everyone. There is no privacy offered in this version. It is best for an individual who wants to learn to work with Tableau.
Tableau Server: It is essentially used to share workbooks across the organization. The work needs to be published in Tableau Desktop first to be able to upload it to the server. Once uploaded, anyone with a license can view the work. It isn't necessary, though, for the licensed user to have Tableau Server installed; if a person has valid login credentials, then he/she can view the work in a web browser. The admin of the organization will always have full control over the server.
Tableau Online: It is the online sharing tool of Tableau. Its functionalities are similar to Tableau Server, but the data is stored on servers hosted in the cloud which are maintained by the Tableau group. There is no storage limit on the data that can be published. It creates a direct link to over 40 data sources that are hosted in the cloud, such as MySQL, Hive, Amazon Aurora, Spark SQL and many more. To publish, both Tableau Online and Tableau Server require workbooks created by Tableau Desktop. Data streamed from web applications, say Google Analytics or Salesforce.com, is also supported by Tableau Server and Tableau Online.
Tableau Reader: It is a tool used to view workbooks created using the Tableau developer tools. It doesn't allow editing or modification of the workbook. Anyone having the workbook can view it using Tableau Reader.
In fact, if you want to share the dashboards created by you, then the receiver needs to have Tableau Reader installed. Tableau has the ability to connect to a wide range of platforms to extract data. Simple sources such as Excel and PDF files, complex databases like Oracle, cloud databases such as Amazon Web Services, Microsoft Azure SQL Database and Google Cloud SQL, and various other data sources can all be used by Tableau.
Tableau Uses
Following are the main uses and applications of Tableau:
Business Intelligence
Data Visualization
Data Collaboration
Data Blending
Real-time data analysis
Query translation into visualization
Importing large volumes of data
Creating no-code data queries
Managing large volumes of metadata
Download and Installation:
Tableau Public:
Step 1: Go to https://public.tableau.com/en-us/s/download. Enter your email id and click on the "DOWNLOAD THE APP" button.
Step 2: The .exe file for Windows will start downloading, and you will be able to see the download progress in the bottom left corner of the website.
Step 3: Open the downloaded file. Accept the terms and conditions and click on the "Install" button.
Step 4: After installation the application will open to its home page.
Tableau Desktop:
Step 1. Go to https://www.tableau.com/products/desktop.
Step 2. Click the "TRY IT FOR FREE" button.
Step 3. It will redirect you to another page where you need to enter your email id and click on the "DOWNLOAD FREE TRIAL" button.
Step 4. This will start downloading the latest version of Tableau. An .exe file for Windows is downloaded, and you can see the download progress in the bottom left corner of the website.
Step 5. Open the downloaded file. This will open the setup wizard. Accept the terms and conditions by checking the box and click on the "Install" button.
Step 6. A pop-up message will open asking for Administrator approval to install the software. Approve it and the installation of Tableau Desktop on the Windows system will start.
Step 7. Once the installation is completed, the Tableau Desktop application will open.
Step 8. A registration window will appear. Click on Activate Tableau and enter your license details; if you do not have a license, enter your credentials and click on Start Trial.
Step 9. Wait for registration to complete. You are ready to use Tableau Desktop.
Sample Visualizations:
This section is going to be a treat to the eyes. We can create Dashboards to tell stories using the data at hand. Let's take a look at some Dashboards telling different stories. We will see the versatility of Tableau and the amazing visuals it offers. A dashboard showing sales of audiobooks. A dashboard showing sales of a super store. We can also make interactive and animated dashboards. Here are some dashboards exhibiting such properties, which make story-telling using data more interesting and eye-catching. Some more amazing dashboards!
You may like these blogs as well: Data visualization tools; Exploring data visualization using matplotlib and seaborn.
If you need implementation for any of the topics mentioned above or assignment help on any of its variants, feel free to contact us.
- Predictive Analysis - Health Risk Assessment Using Deep Learning
In this blog we will discuss health risk assessment using deep learning algorithms.
What is Predictive Analytics?
Predictive analytics helps connect data to effective action by drawing reliable conclusions from which a data analyst can predict the future based on current and previous data. The term mainly refers to analytical and statistical techniques.
Health Risk Assessment Using Deep Learning
The healthcare sector is lacking in actionable knowledge. This industry faces challenges in essential areas like electronic record management, data integration, and computer-aided diagnoses and disease predictions. It also needs to reduce healthcare costs. The rapidly expanding fields of predictive analytics and deep learning play a pivotal role in research on large volumes of healthcare data. Deep learning provides a wide range of techniques, tools and frameworks to address these challenges. Nowadays health data is expanding rapidly in various formats. This health data offers more opportunities for health data analysis and enhancement of health services through innovative approaches. Predictive analytics helps healthcare and life-sciences providers by applying many techniques from statistics, data mining, modeling, machine learning, and artificial intelligence to current findings in order to make predictions about the future. It helps healthcare organizations prepare by optimizing costs, diagnosing diseases accurately, enhancing patient care, optimizing resources and improving clinical outcomes. The concept of deep learning is to dig through a large volume of data to automatically identify patterns and extract features from complex, unlabeled data without the involvement of humans, which makes it an important tool in big data analysis. Deep learning plays an important role in diagnostic applications. Deep learning techniques can reveal clinically relevant information hidden in large data, guided by relevant clinical questions, to assist clinical decision-making and in turn provide physicians with an accurate analysis of a disease for better treatment, thus resulting in better medical decisions.
Predictive Analytics Using Deep Learning
Health risk assessment predictive analytics aims to predict health-related outcomes based on clinical or non-clinical patterns in the data. There are two methods to build a predictive model. First, the collection of patient data in clinical trials with a set of predefined protocols, for example the lung cancer risk prediction model, the UK Prospective Diabetes Study (UKPDS) and heart disease prediction. Second, the use of existing patient data collected in clinical practice, such as EHRs, insurance claims, and clinical registries, for instance the inpatient mortality predictive model. Predictive models capture the characteristics of a specific event. The UKPDS risk engine can predict coronary heart disease and stroke in patients with type 2 diabetes. Deep learning is widely used for medical imaging analysis in several different application domains. Medical imaging techniques such as MRI scans, CT scans and ECGs are used to diagnose serious diseases such as heart disease, cancer and brain tumors. Hence, with the help of deep learning, doctors can analyze the disease better and provide patients with the best treatment. In addition, deep learning is used to analyze fraudulent medical insurance claims.
Moreover, deep learning helps the insurance industry send out discounts and offers to their target patients. Deep learning techniques are used to detect Alzheimer's disease at an early stage, a challenge the medical industry currently faces. Deep learning techniques are also used to understand a genome and help patients get an idea of the diseases that might affect them, an application with a promising future.
Deep Learning Framework
Deep learning combines advances in computing power and neural networks with many layers to learn complicated patterns in large amounts of data. It is an extension of the classical neural network and uses more hidden layers so that the algorithms can handle complex data with various structures. Deep learning collects a large volume of data, including patient records, medical reports, and insurance records, and applies its neural networks to provide the best outcomes. Therefore, it is important to involve deep learning in resolving healthcare issues, due to its representational and recognition supremacy that assists healthcare personnel to determine, predict, analyze, and practice its theories for the delivery of healthcare.
Deep Learning Models
Feature engineering is the main difference between traditional machine learning algorithms and deep learning algorithms. It requires domain expertise and is a time-consuming process. Deep learning involves automatic feature engineering. (Diagram: traditional machine learning algorithm vs. deep learning algorithm.) The convolutional neural network (CNN) is the most commonly used deep learning model. CNNs are composed of neurons that have learnable weights and biases. Each neuron receives inputs and computes a dot product. The complete network expresses a single differentiable function that scores the input health-attribute data according to the classes of health risk. CNNs have been proven to be more efficient in training inputs with a restricted number of parameters and hidden units. A CNN can achieve local connections and tied weights efficiently by pooling translation-invariant features. This specialty suits our design, as the input health data has been normalized and the output health risks have been predefined in certain classes.
We developed a diabetes risk prediction model using deep learning algorithms. This work is used to predict diabetes in a patient. The dataset used here is the Pima Indians Diabetes dataset. The dataset consists of 768 entries having 9 features:
Pregnancies - Number of times pregnant
Glucose - Plasma glucose concentration at 2 hours in an oral glucose tolerance test
Blood Pressure - Diastolic blood pressure (mm Hg)
Skin Thickness - Triceps skinfold thickness (mm)
Insulin - 2-hour serum insulin (mu U/ml)
BMI - Body mass index (weight in kg / (height in m)^2)
Diabetes Pedigree Function - Diabetes pedigree function
Age - Age (years)
Outcome - Class variable (0 or 1)
268 of the 768 outcomes are 1 and the others are 0 (1 means the patient is diabetic and 0 means the patient is non-diabetic). Sample of the dataset: Here we can see the training and validation accuracy and the training and validation loss of the diabetes risk assessment model. Thank You
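As a companion to the diabetes risk model described above, here is a minimal sketch of a small feed-forward classifier on the Pima Indians Diabetes data. It is an illustration only: TensorFlow/Keras, the local file name diabetes.csv, and the layer sizes are assumptions, not details taken from the original project.
Code Snippet :
# Minimal sketch of a feed-forward diabetes risk model (assumed Keras setup;
# "diabetes.csv" is a hypothetical local copy of the Pima Indians dataset).
import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow import keras

df = pd.read_csv("diabetes.csv")
X = df.drop(columns=["Outcome"]).values   # the 8 clinical input features
y = df["Outcome"].values                  # 1 = diabetic, 0 = non-diabetic

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

model = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(X.shape[1],)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),   # probability of diabetes
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Produces the training and validation accuracy/loss curves discussed above
history = model.fit(X_train, y_train, validation_data=(X_val, y_val),
                    epochs=50, batch_size=32, verbose=0)
print("Final validation accuracy:", round(history.history["val_accuracy"][-1], 3))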
- SQLContext and HiveContext operations Using PySpark
In this article we will see how to perform SQL and Hive operations to analyze data in PySpark. As we know, PySpark is the Python API for Spark. Spark SQL is a framework which runs on Spark. Spark SQL is a Spark module for structured data processing, and it is used to execute queries written in either SQL or HiveQL. SQLContext allows us to connect to different data sources to write and read data. Spark SQL also reads and writes data which is stored in Hive. Hive is used for handling big data. We can store the records of multiple tables in a single table using HiveQL, and in this blog we will also see how records from multiple tables are stored in a single table. SQLContext is the entry point of all relational functionality in Spark.
Code Snippet :
from pyspark.sql import SparkSession, SQLContext, HiveContext
spark = SparkSession.builder.appName("SQL_Hive").getOrCreate()
sqlContext = SQLContext(spark.sparkContext)
Now let's see how to load and read data using SQLContext. Here it is as shown below. In the first line we read the data using the sqlContext. In the second line we register the DataFrame as a table. Now we can read the data with the help of SQL queries.
Code Snippet :
# Create the DataFrame
df_employee = sqlContext.read.csv("WA_Fn-UseC_-HR-Employee-Attrition.csv", inferSchema=True, header=True)
df_employee.registerTempTable("df_employee")
Using the following command we can read data from the table.
Code snippet :
#Read all the data from the dataset
sqlContext.sql("SELECT * FROM df_employee").show()
As we can see, using SQLContext we can read the data and perform various SQL operations to extract information from the table. We can execute any SQL queries on the table, like join queries, aggregate functions, subqueries etc.
Output :
Let's find, for each department, the employee whose monthly income is highest by performing a SQL query on the table.
Code Snippet :
sqlContext.sql("SELECT EmployeeNumber,Department,MonthlyIncome FROM df_employee WHERE MonthlyIncome IN (SELECT MAX(MonthlyIncome) FROM df_employee GROUP BY Department);").show()
As shown below, the output lists the employees whose monthly income is highest in their department: the employee number, the department the employee belongs to, and their monthly income.
Output :
We can also create a Hive table in PySpark, named employee, with the fields employee_number, Department, Job_role, Age, Gender, Monthly_income, Over_time and Marital_status. For creating the table we have to import SparkContext and HiveContext.
Code Snippet :
from pyspark import SparkContext
from pyspark.sql import HiveContext
sc = SparkContext(appName="Hive_operations")
hive_context = HiveContext(sc)
Using the following command we create the table.
Code snippet :
hive_context.sql("CREATE TABLE employee1 (Employee_Number INT, Department STRING, Job_Roll STRING, Age INT,\
Gender STRING ,Monthly_Income INT, Over_time STRING, Marital_Status STRING) row format delimited \
fields terminated BY ',' tblproperties('skip.header.line.count'='0') stored AS textfile; ")
hive_context.sql("select * from employee1").show()
We can see that an empty table is created.
Output :
Now we load the data into the table from the .csv file, which is present on our local system, using a Hive query.
Code Snippet :
# load the data into the hive table
hive_context.sql("LOAD DATA LOCAL INPATH 'employee1.csv' INTO TABLE employee1")
Output :
Now we will see how to merge two tables into a single table.
Suppose we have two tables with the same structure and we want all their records in a single table; this is possible using a Hive query. We have two Hive tables, employee1 and employee2, and using the following command we merge them. We can also merge more than two tables into a single table in the same way.
Code snippet :
# create a new table and concatenate data from the employee1 and employee2 tables
hive_context.sql("create table employee as (select * from employee1 union all select * from employee2)")
Here we can see that all records from the employee1 and employee2 tables are merged into a single table.
Output :
Conclusion :
In this article we have seen how to perform some basic operations using SQLContext and HiveContext in PySpark. Thank you
- Logistic Regression With PySpark
In statistics, logistic regression is a predictive analysis that is used to describe data. It is used to find the relationship between one dependent column and one or more independent columns. The dependent column is the one we have to predict, and the independent columns are the ones used for the prediction. Before building the logistic regression model we will discuss logistic regression itself; after that we will see how to apply logistic regression classification to a dataset using PySpark.
Logistic regression
Logistic regression is one of the supervised machine learning algorithms, used for classification to predict discrete-valued outcomes. It uses a statistical approach to predict the outcome of the dependent variable based on the observations given in the dataset. There are three types of logistic regression:
Binomial Logistic Regression
Multinomial Logistic Regression
Ordinal Logistic Regression
Advantages of Logistic regression :
It is a simple and easy-to-implement machine learning algorithm, yet it provides great training efficiency in some cases. Due to this it does not require high computational power.
This algorithm is proven to be very efficient when the dataset has features that are linearly separable.
This algorithm allows models to be updated easily to reflect new data, unlike decision trees or support vector machines. The update can be done using stochastic gradient descent.
It outputs well-calibrated probabilities along with classification results.
Disadvantages
It can't solve nonlinear problems, since logistic regression has a linear decision surface. Logistic regression is a statistical analysis model that attempts to predict precise probabilistic outcomes based on independent features. On high-dimensional datasets, this may lead to the model being over-fit on the training set, which means overstating the accuracy of predictions on the training set, and thus the model may not be able to predict accurate results on the test set. This usually happens when the model is trained on little training data with lots of features.
Now we are going to build the logistic regression model on the dataset using PySpark.
Why PySpark ?
Spark is much faster. Spark is multi-threaded, which means two or more executions run concurrently, whereas pandas is single-threaded. Spark will only execute when you take an action.
So let's start.
Steps :
1. Import some important libraries and create the SparkSession. SparkSession is the entry point of the program. Load the dataset search_engine.csv using PySpark.
Code snippet :
import findspark
findspark.init()
#import SparkSession
import pyspark
from pyspark.sql import SparkSession
spark=SparkSession.builder.appName('Logistic_Regression').getOrCreate()
#Read the dataset
df=spark.read.csv('search_engine.csv',inferSchema=True,header=True)
After loading the data, when you run the code you will get the following result.
Output :
PrintSchema : It displays the structure of the data.
Calculate statistical data like count, average, standard deviation, minimum value and maximum value for each column (exploratory data analysis).
Code snippet :
#statistical Data Analysis
df.describe().show()
Calculate the total number of countries, platforms and statuses present in the dataset.
Code snippet :
#count the countries present in the dataset
df.groupBy('Country').count().show()
#count the search engines present in the dataset
df.groupBy('Platform').count().show()
# Count the status values
df.groupBy('Status').count().show()
Output :
Let's visualize the data.
Code Snippet :
import matplotlib.pyplot as plt
import seaborn as sns
df11=df.toPandas()
sns.set_style('whitegrid')
sns.countplot(x='Country',hue='Platform',data=df11)
Output :
Machine learning algorithms cannot deal with categorical data directly, so we need to convert it into numerical data. We use StringIndexer to encode a column of string categories into a column of indices; the indices are ordered by label frequency. When you convert the columns into numbers you will get the following result.
Code snippet :
#import required libraries
from pyspark.ml.feature import StringIndexer
# Convert the Platform and Country columns to numerical indices
search_engine_indexer = StringIndexer(inputCol="Platform", outputCol="Platform_Num").fit(df)
df = search_engine_indexer.transform(df)
country_indexer = StringIndexer(inputCol="Country", outputCol="Country_Num").fit(df)
df = country_indexer.transform(df)
#Display the categorical columns alongside their numerical indices
df.select(['Platform','Platform_Num']).show(10,False)
df.select(['Country','Country_Num']).show(10,False)
Output :
The indices produced by StringIndexer are just category labels with no ordinal meaning, so we now use OneHotEncoder to expand each indexed column into a sparse binary vector. When you encode the columns using OneHotEncoder you will get the following result: we can see the Platform column encoded into the Search_Engine_Vector column.
Code snippet :
#import the OneHotEncoder library
from pyspark.ml.feature import OneHotEncoder
#one hot encoding of the indexed columns
search_engine_encoder = OneHotEncoder(inputCol="Platform_Num", outputCol="Search_Engine_Vector").fit(df)
df = search_engine_encoder.transform(df)
country_encoder = OneHotEncoder(inputCol="Country_Num", outputCol="Country_Vector").fit(df)
df = country_encoder.transform(df)
Output :
Now we use VectorAssembler to concatenate multiple columns into a single vector column. It will combine all the features from multiple columns into one column. After applying the VectorAssembler we can see all the columns concatenated into the features column.
Code snippet :
#import vector assembler library
from pyspark.ml.feature import VectorAssembler
#concatenate all the columns into a vector column
df_assembler = VectorAssembler(inputCols=['Search_Engine_Vector','Country_Vector','Age', 'Repeat_Visitor','Web_pages_viewed'], outputCol="features")
df = df_assembler.transform(df)
Output :
Now split your data into training and test data, here 75% and 25%. This split is used to measure the accuracy of the model. Then apply the logistic regression model. After applying the model you will get the following result: the Status column has the original labels, the prediction column holds the value predicted by the model, and the last column is the probability column.
Code snippet :
#select the feature vector and the label, then split the data
model_df = df.select(['features','Status'])
training_df,test_df=model_df.randomSplit([0.75,0.25])
#import the logistic regression
from pyspark.ml.classification import LogisticRegression
#Apply the logistic regression model
log_reg=LogisticRegression(labelCol='Status').fit(training_df)
#Training Results
train_results=log_reg.evaluate(training_df).predictions
train_results.filter(train_results['Status']==1).filter(train_results['prediction']==1).select(['Status','prediction','probability']).show(10,False)
Output :
Accuracy comes out to 0.9396, which means about 94% of the values are correctly predicted by this model.
That means our model is doing a great job of identifying the Status. Now calculate the precision rate for our ML model. The precision rate comes out to 0.9389, which means 93.89% of positive predictions are correctly predicted.
Code snippet :
#Count the confusion-matrix cells on the test predictions
results = log_reg.evaluate(test_df).predictions
true_positives = results.filter((results['Status']==1) & (results['prediction']==1)).count()
true_negatives = results.filter((results['Status']==0) & (results['prediction']==0)).count()
false_positives = results.filter((results['Status']==0) & (results['prediction']==1)).count()
#Calculate the matching records out of the total records
accuracy=float(true_positives+true_negatives)/(results.count())
print("Accuracy : " + str(accuracy))
precision = float(true_positives)/(true_positives + false_positives)
print("Precision Rate : " + str(precision))
Output :
Thank you
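For completeness, PySpark also provides built-in evaluators that compute such metrics directly. The short sketch below assumes the same results predictions DataFrame and Status label column used above; it is an optional alternative to the manual calculation, not part of the original walkthrough.
Code snippet :
# Built-in evaluators as an alternative to the manual confusion-matrix counts;
# assumes the `results` DataFrame and 'Status' label column defined above.
from pyspark.ml.evaluation import BinaryClassificationEvaluator, MulticlassClassificationEvaluator
# Area under the ROC curve, computed from the rawPrediction column
auc = BinaryClassificationEvaluator(labelCol='Status').evaluate(results)
print("Area under ROC : " + str(auc))
# Accuracy computed directly from the prediction column
acc = MulticlassClassificationEvaluator(labelCol='Status', predictionCol='prediction', metricName='accuracy').evaluate(results)
print("Accuracy : " + str(acc))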
- Optical Character Recognition Using Convolutional neural network
In this article you will learn what optical character recognition (OCR) is and how it works. Let's start.
What is OCR?
OCR stands for optical character recognition. We have plenty of information in the form of printed documents, handwritten scripts and images. OCR is the process of recognizing scanned images of both handwritten and printed characters and converting them into a machine-readable, digital format. There are three main aspects of the OCR approach:
Preprocessing
Character recognition
Character segmentation and presentation of data
OCR can be implemented using convolutional neural networks (CNNs), a popular deep neural network architecture.
How does Optical Character Recognition Work?
Techniques
Image Processing in OCR
The aim of pre-processing is to improve the quality of the image data so that the OCR model gives you accurate output. Typically the OCR model gives accurate output with images scanned at 300 DPI. Image scaling refers to the resizing of a digital image. When a document is scanned it is sometimes not properly aligned; a skewed image is an image which is not straight. This directly impacts the line segmentation of the OCR model, which reduces the accuracy rate. The image may need to be tilted a few degrees clockwise or counterclockwise in order to make the lines of text perfectly horizontal or vertical.
Character Recognition in OCR :
There are two types of OCR algorithms. Matrix matching compares the character image scanned by the OCR scanner with a library of character matrices or templates. When an image matches one of these matrices of dots within a given level of similarity, the computer labels the image with the corresponding ASCII character. Feature Extraction is OCR without strict matching to prescribed templates. Also known as Intelligent Character Recognition (ICR), or Topological Feature Analysis, this method varies by how much "computer intelligence" is applied by the manufacturer. The computer looks for general features such as open areas, closed shapes, diagonal lines, line intersections, etc. This method is more versatile than matrix matching. Matrix matching works best if the OCR encounters a limited repertoire of type styles, with little or no variation within each style. Where the characters are less predictable, feature (topological) analysis is superior.
Post-Processing in OCR
This is the error correction technique used to ensure high accuracy of the OCR model. OCR accuracy can be increased if the output is constrained by a lexicon. In this way the algorithm can make a list of words that are allowed to occur in the scanned document.
Convolutional Neural Network
CNNs are made of a large number of interconnected neurons that have learnable weights and biases. In the CNN architecture the neurons are organized as layers. It contains an input layer, hidden layers, and an output layer. A network with a large number of hidden layers is generally said to be a deep neural network. The hidden-layer neurons of a CNN are connected to a small region of the input space generated from the previous layer, instead of connecting to all of it as in fully connected networks like multi-layer perceptron networks. This method reduces the number of connection weights in a CNN compared to an MLP, so a CNN takes less time to train than a network of similar size. The input to a typical CNN is a two-dimensional array of data, such as an image. Unlike a regular neural network, the layers of a CNN are arranged in three dimensions.
Basically, the input layer is a buffer that holds the input and passes it to the next layer. The convolutional layer performs the core operation of feature extraction by convolving the input data. ReLU (Rectified Linear Unit) is an activation function used to introduce non-linearity; it replaces negative values with zero and can speed up the learning process. Every output of the convolutional layer is passed through the activation function. The pooling layer reduces the spatial size of each feature map, hence the computation in the network is reduced. It uses a sliding window that moves in strides across the feature map and transforms it into representative values. Fully connected layers connect every neuron in the layer to all the neurons in the previous layer. They learn non-linear combinations of features and are used to classify or estimate the output. For classification problems, the fully connected layer is followed by a soft-max layer, which produces the probability of each class for the given input. For regression problems, it is followed by a regression layer to predict the output. Thank you
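To make the layer stack just described concrete (convolution + ReLU, pooling, fully connected, soft-max), here is a minimal character-classification sketch. It assumes TensorFlow/Keras, 28x28 grayscale character images and 36 classes (digits plus letters); these are illustrative assumptions, not details taken from a specific OCR project.
Code Snippet :
# Minimal CNN sketch for character recognition (assumed Keras setup;
# image size and class count are illustrative).
from tensorflow import keras
from tensorflow.keras import layers

num_classes = 36  # e.g. digits 0-9 plus letters A-Z (assumed)

model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),                  # input layer: one grayscale character image
    layers.Conv2D(32, (3, 3), activation="relu"),     # convolution + ReLU: feature extraction
    layers.MaxPooling2D((2, 2)),                      # pooling: shrink each feature map
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),             # fully connected layer
    layers.Dense(num_classes, activation="softmax"),  # soft-max: per-class probabilities
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()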
- Determine Color, Contours and Center Using OpenCV
In this article we will show you how to determine the color, contours, and centers of small dots on a black image background using Python 3.7, the Open Source Computer Vision Library (OpenCV), and NumPy. The advancement of artificial intelligence into computer vision came in the late 1960s; computer vision should be able to detect 3D objects as well as 2D pictures. OpenCV is a computer vision library written in the programming languages C and C++; it runs easily on Windows and Linux and interfaces with Python. Computer vision is a field of artificial intelligence that trains computers to interpret and understand the visual world.

Color detection using OpenCV allows the detection of a specific color in an image. The detection system reads the image, scans the object, matches the color, and gives the result; if the color matches a defined color pattern, the system produces the correct output. Image processing is a method of performing operations on an image in order to get an enhanced image or to extract useful information from it. It is a type of signal processing in which the input is an image and the output may be an image or the characteristics/features associated with that image. We are using a Jupyter notebook to determine the color, contours, and centers.

Jupyter Notebook IDE: Jupyter Notebook is an open-source, web-based application which allows you to create and share documents containing live code, equations, visualizations, and narrative text. It also supports data cleaning and transformation, numerical simulation, statistical modeling, data visualization, and many other tasks.

Steps involved in contour and color detection in Python 3.7
Let's begin with a given sample image in either .jpg or .png format and apply object detection to it. To implement this project the following Python 3.7 packages have to be downloaded and installed.

Description of the libraries used
NumPy: NumPy is the most basic yet powerful package for mathematical and scientific computing and data manipulation in Python. It is an open-source library.
cv2: OpenCV is a high-performance library for digital image processing and computer vision, which is free and open source.
imutils: imutils is a series of convenience functions that make basic image processing operations such as translation, rotation, resizing, skeletonization, and displaying Matplotlib images easier with OpenCV.

Code Snippet :
import cv2
import numpy as np
import imutils

Procedure
Read an image: First, the image to be processed is read using the OpenCV library. Next, convert the image into the HSV color model. HSV (hue, saturation, value) is used to separate image luminance from color information, which makes things easier when we are working on, or need, the luminance of the image/frame. HSV is also used in situations where color description plays an integral role.
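Before defining fixed ranges, it can help to check what HSV values OpenCV assigns to a known BGR color. The short sketch below is an addition to the article; the pure-green example and the "plus or minus 10 on hue" rule of thumb are assumptions to be tuned against the actual image.

Code Snippet :
import cv2
import numpy as np

# Convert a single known BGR color (pure green, as an example) to HSV.
bgr_green = np.uint8([[[0, 255, 0]]])             # 1x1 "image" holding one BGR pixel
hsv_green = cv2.cvtColor(bgr_green, cv2.COLOR_BGR2HSV)
h, s, v = hsv_green[0][0]
print("HSV of pure green:", h, s, v)              # hue is about 60 on OpenCV's 0-179 scale

# Rule of thumb: take roughly H-10..H+10 for the hue band, with wide
# saturation/value bands, then tune the numbers on the actual image.
lower = np.array([max(int(h) - 10, 0), 100, 100])
upper = np.array([min(int(h) + 10, 179), 255, 255])
print("candidate range:", lower, upper)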
Here, convert the image from BGR to HSV color space and then define a specific range of H-S-V values to detect each color.

Code Snippet :
img = cv2.imread('data\\left.png')
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

Normal image and HSV image :

Now define a lower and an upper color range for each color and convert them into NumPy arrays.

Code Snippet :
# Red color
lower_red = np.array([0,100,100])
upper_red = np.array([7,255,255])
# Yellow color
lower_yellow = np.array([25,100,100])
upper_yellow = np.array([30,255,255])
# Green color
lower_green = np.array([40,70,80])
upper_green = np.array([70,255,255])
# Blue color
lower_blue = np.array([90,60,0])
upper_blue = np.array([121,255,255])
# Dark teal color
lower_dark_teal = np.array([80,100,100])
upper_dark_teal = np.array([90,255,255])
# Dark yellow color
lower_dark_yellow = np.array([20,100,100])
upper_dark_yellow = np.array([25,255,255])

To perform the actual color detection we use the cv2.inRange function. It requires three arguments: the image on which color detection is performed, the lower range of the color you want to detect, and the upper range. A binary mask is returned, in which white pixels (255) are the pixels that fall within the lower and upper range and black pixels (0) are those that do not.

Code Snippet :
yellow = cv2.inRange(hsv, lower_yellow, upper_yellow)
green = cv2.inRange(hsv, lower_green, upper_green)
blue = cv2.inRange(hsv, lower_blue, upper_blue)
red = cv2.inRange(hsv, lower_red, upper_red)
dark_teal = cv2.inRange(hsv, lower_dark_teal, upper_dark_teal)
dark_yellow = cv2.inRange(hsv, lower_dark_yellow, upper_dark_yellow)

Now we use cv2.findContours to find the contours in each mask, so that we can then find the center of each of them.

Code Snippet :
cnts1 = cv2.findContours(red, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
cnts1 = imutils.grab_contours(cnts1)
cnts2 = cv2.findContours(yellow, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
cnts2 = imutils.grab_contours(cnts2)
cnts3 = cv2.findContours(blue, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
cnts3 = imutils.grab_contours(cnts3)
cnts4 = cv2.findContours(green, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
cnts4 = imutils.grab_contours(cnts4)
cnts5 = cv2.findContours(dark_teal, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
cnts5 = imutils.grab_contours(cnts5)
cnts6 = cv2.findContours(dark_yellow, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
cnts6 = imutils.grab_contours(cnts6)

Now draw the contours for each blob and find the center of each blob using cv2.moments. An image moment is a particular weighted average of image pixel intensities, from which we can compute properties of an image such as its area and centroid. To find the centroid, we generally convert the image into a binary format and then find its center. Using cv2.putText we can display the color name of each blob.

Code Snippet :
for c in cnts1:
    cv2.drawContours(img, [c], -1, (0, 255, 0), 3)
    # compute the center of the contour
    M = cv2.moments(c)
    if M["m00"] != 0:
        cX = int(M["m10"] / M["m00"])
        cY = int(M["m01"] / M["m00"])
    else:
        cX, cY = 0, 0
    cv2.circle(img, (cX, cY), 7, (255, 255, 255), 1)
    cv2.putText(img, "red", (cX - 20, cY - 20), cv2.FONT_HERSHEY_SIMPLEX, 2.5, (255, 255, 255), 1)

The same loop is repeated for the other contour lists with the matching color name; a consolidated version is sketched after the output below. Finally, the following code displays the result and closes the window when the Esc key (27) is pressed.

Code Snippet :
cv2.imshow("result", img)
k = cv2.waitKey(0)
if k == 27:
    cv2.destroyAllWindows()

Output :
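As referenced above, the per-color loops can be consolidated into one pass. The sketch below is an addition to the article; the masks dictionary and label placement are assumptions, and it assumes img and the masks defined in the snippets above are already in memory.

Code Snippet :
# Consolidated labeling loop over every color mask defined above.
masks = {"red": red, "yellow": yellow, "green": green,
         "blue": blue, "dark teal": dark_teal, "dark yellow": dark_yellow}

for name, mask in masks.items():
    cnts = imutils.grab_contours(
        cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE))
    for c in cnts:
        cv2.drawContours(img, [c], -1, (0, 255, 0), 3)
        M = cv2.moments(c)
        cX = int(M["m10"] / M["m00"]) if M["m00"] != 0 else 0
        cY = int(M["m01"] / M["m00"]) if M["m00"] != 0 else 0
        cv2.circle(img, (cX, cY), 7, (255, 255, 255), 1)
        cv2.putText(img, name, (cX - 20, cY - 20),
                    cv2.FONT_HERSHEY_SIMPLEX, 2.5, (255, 255, 255), 1)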
Conclusion : Computer vision can be used to solve problems like this one with a great deal of sophistication. We can use both Python and MATLAB for computer vision, but we prefer Python because it takes less simulation time than MATLAB. The contours, centers, and colors were detected successfully in the given sample images. Thank you.
- Recommendation system
In this article we are going to look at recommendation systems and how they work, and then develop a movie recommendation model. Let's begin.

What is a recommender system?
Recommender systems are techniques that provide suggestions for items likely to be of use to a user. The suggestions are aimed at supporting users in various decision-making processes, such as what product to buy, which book to read, or which movie to watch. These systems have proven to be a valuable tool for helping online users cope with information overload, and they have become one of the most popular and powerful tools in electronic commerce. Many techniques for recommendation generation have been proposed during the last decade, and many of them are successfully deployed in commercial environments.

Recommendation techniques
There are three main techniques for building a recommendation system: content-based methods, collaborative filtering methods, and hybrid methods.

Content-based methods
A content-based system uses characteristic information: information about items (keywords, categories, etc.) and about users (preferences, profile, etc.). The system learns to recommend items that are similar to the ones the user liked in the past. The similarity of items is calculated from the features associated with the compared items. For example, if a user has positively rated a movie that belongs to the action or thriller genre, the system can learn to recommend other movies from that genre.

Collaborative filtering methods
These methods recommend items based on similarity measures between users and items: the items recommended to a user are those preferred by similar users. For example, if one user likes product A, and another user likes the same product A as well as another product B, the first user could also be interested in product B. The aim is to predict new interactions based on historical ones. There are two types of collaborative filtering methods: memory-based and model-based. In a memory-based method, the first way is to identify a cluster of users from the interactions of one specific user and use it to predict the interactions of other similar users; the second way identifies clusters of items that have been rated by user A and uses them to predict user A's interaction with a different but similar item B. A model-based method uses data mining and machine learning techniques to train a model that can make the predictions. A toy sketch of the item-based idea is given below.

Hybrid methods
A hybrid method combines collaborative filtering and content-based methods.
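To make the memory-based, item-based idea concrete, here is a small toy sketch added to the article. The 4x3 ratings matrix is invented purely for illustration; the sketch scores an unseen movie for a user by the cosine similarity between item rating columns.

Code Snippet :
import numpy as np

# Toy user-item ratings matrix (rows = users, columns = movies); 0 means "not rated".
# These numbers are invented purely for illustration.
R = np.array([
    [5, 4, 0],
    [4, 5, 1],
    [1, 0, 5],
    [0, 2, 4],
], dtype=float)

def cosine(a, b):
    # Cosine similarity between two rating vectors.
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

n_items = R.shape[1]
item_sim = np.array([[cosine(R[:, i], R[:, j]) for j in range(n_items)]
                     for i in range(n_items)])

# Score movie 2 for user 0 (who has not rated it) as a similarity-weighted
# average of the ratings user 0 gave to the other movies.
user, target = 0, 2
rated = [j for j in range(n_items) if R[user, j] > 0]
weights = item_sim[target, rated]
score = float(weights @ R[user, rated] / weights.sum())
print("predicted rating of user", user, "for movie", target, ":", round(score, 2))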
Benefits of a recommender system
Increase the number of items sold: There are very few techniques that increase sales without increasing the marketing effort. Once you build an automated recommendation system, you get recurring additional sales without extra effort.
Increase user satisfaction: A well-developed recommendation model can also improve the user's experience. The user will find the recommendations interesting and relevant and, with a properly designed human-computer interaction, will also enjoy using the system.
Better understand what the user wants: The description of the user's preferences, either collected explicitly or predicted by the system, can be re-used by the service provider for other goals, such as improving the management of the item's stock or production.
Increase user fidelity: A loyal user is valuable, and many recommendation models compute recommendations by leveraging the information acquired from the user in previous interactions, for example ratings of items.

Applications of recommendation systems
Product recommendation: The most important use of recommender systems is at online retailers. E-commerce websites and online vendors try to present each returning user with suggestions of products they might like to buy.
Movie recommendation: Netflix offers its customers recommendations of movies they might like, based on ratings provided by users.
Book recommendation: Kindle offers its customers recommendations of books they might like, likewise based on ratings provided by users.

Now we are going to build the movie recommendation model, which provides recommendations of movies similar to the ones watched in the past. The dataset used here comes directly from Netflix, and I am importing it from Kaggle. The dataset contains 4 text files, each with over 20 million rows, but here I am using only one text file for building the recommendation model because of processing time.

Here I have already imported the packages that will be needed to build the model.

Code Snippet :
# import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from surprise import Reader, Dataset, SVD
from surprise.model_selection import cross_validate

In this step I define a read_data function for reading the data and then store each file in a variable.

Code Snippet :
# define function for reading data
def read_data(data_loc):
    df = pd.read_csv(data_loc, header=None, names=['Customer_Id', 'Ratings'], usecols=[0, 1])
    return df

# data paths
path_1 = "/content/combined_data_1.txt"
path_2 = "/content/combined_data_2.txt"
path_3 = "/content/combined_data_3.txt"
path_4 = "/content/combined_data_4.txt"

data_1 = read_data(path_1)
data_2 = read_data(path_2)
data_3 = read_data(path_3)
data_4 = read_data(path_4)

# as noted above, only one file is used for the model to keep processing time manageable
all_data = data_1

Here we can see the shape of the data from each text file; each file contains over 20 million records. In this step we count the total records, total movies, total customers, and total ratings.

Code Snippet :
# total movies
total_movies = all_data.isnull().sum()[1]
# total customers
total_customer = all_data['Customer_Id'].nunique() - total_movies
# total ratings
total_ratings = all_data['Customer_Id'].count() - total_movies

print("Total Records ", all_data.shape[0])
print("Total Movies ", total_movies)
print("Total Customers ", total_customer)
print("Total ratings ", total_ratings)

Output :

Now we iterate over the ratings data, calculate the percentage of each rating, and visualize that data.

Code Snippet :
erc = all_data.groupby('Ratings')['Ratings'].agg(['count'])

## Plotting the graph
ax = erc.plot(kind='barh', legend=False, figsize=(15, 10))
plt.title('Total : {:,} Movies, {:,} customers, {:,} ratings given'.format(total_movies, total_customer, total_ratings), fontsize=20)
plt.axis('off')
for i in range(1, 6):
    ax.text(erc.iloc[i-1][0]/4, i-1,
            'Rating {}: {:.0f}%'.format(i, erc.iloc[i-1][0]*100 / erc.sum()[0]),
            color='white', weight='bold')

Next we extract the records where the rating is NaN and create a new dataframe from them, which tells us where the counting for each movie starts.
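To see why the NaN rows matter, here is a small sketch added to the article. It uses a tiny synthetic frame (all ids invented) shaped like the two columns the article reads: each movie block starts with a row whose Customer_Id holds something like '1:' and whose rating is NaN, so the NaN positions mark where each movie's ratings begin.

Code Snippet :
import numpy as np
import pandas as pd

# Tiny synthetic frame in the same two-column shape the article reads.
toy = pd.DataFrame({
    'Customer_Id': ['1:', '305344', '2439493', '2:', '1989766', '3:', '14756', '789', '1027056'],
    'Ratings':     [np.nan, 3.0,     4.0,       np.nan, 5.0,     np.nan, 4.0,   1.0,   3.0],
})

# Positions of the NaN rows are the boundaries between movie blocks.
boundaries = toy[toy['Ratings'].isna()].index.tolist()
print("movie blocks start at rows:", boundaries)     # [0, 3, 5]

# Ratings per movie = gap between consecutive boundaries minus the header row.
sizes = np.diff(boundaries + [len(toy)]) - 1
print("ratings per movie:", sizes.tolist())          # [2, 1, 3]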
Here we create a NumPy array containing the movie id for every rating row and store it in arr_movie, create one final array to account for the last movie block and its length, and append the resulting array to the dataset after removing the NaN rows. After that, the two columns are converted to integers.

Code Snippet :
# Find all the 'nan' values in the Ratings column; these rows mark where each movie starts
data_nan = pd.DataFrame(pd.isna(all_data.Ratings))
data_nan = data_nan[data_nan['Ratings'] == True]
data_nan = data_nan.reset_index()   # keep the original row positions in an 'index' column

arr_movie = []
movie_id = 1
for i, j in zip(data_nan['index'][1:], data_nan['index'][:-1]):
    temp = np.full((1, i - j - 1), movie_id)
    arr_movie = np.append(arr_movie, temp)
    movie_id += 1

# account for the last movie block and its corresponding length
final_rec = np.full((1, len(all_data) - data_nan.iloc[-1, 0] - 1), movie_id)
arr_movie = np.append(arr_movie, final_rec)
print('Movie numpy', arr_movie)
print('Length', len(arr_movie))

all_data = all_data[pd.notnull(all_data['Ratings'])]
all_data['Movie_Id'] = arr_movie.astype(int)
all_data['Customer_Id'] = all_data['Customer_Id'].astype(int)
print('Data')
print(all_data.iloc[::5000, :])

Output :

Next we create a list of all the movies that are rated less often, keeping only the most-rated 30% of movies. To do this, we count the ratings and find the mean rating by movie id; the benchmark printed below is the minimum number of reviews a movie needs in order to be kept.

Code Snippet :
f = ['count', 'mean']
movie_gb_mi = all_data.groupby('Movie_Id')['Ratings'].agg(f)
movie_gb_mi.index = movie_gb_mi.index.map(int)
movie_benchmark = round(movie_gb_mi['count'].quantile(0.7), 0)
drop_movie_list = movie_gb_mi[movie_gb_mi['count'] < movie_benchmark].index
print('Movie minimum times of review:', movie_benchmark)

Similarly, we create a list of all the inactive users who rate less often by counting the ratings per Customer_Id; this returns the minimum number of reviews a customer needs in order to be in the most active 30%.

Code Snippet :
cust_gb_ci = all_data.groupby('Customer_Id')['Ratings'].agg(f)
cust_gb_ci.index = cust_gb_ci.index.map(int)
cust_benchmark = round(cust_gb_ci['count'].quantile(0.7), 0)
drop_cust_list = cust_gb_ci[cust_gb_ci['count'] < cust_benchmark].index
print('Customer minimum times of review:', cust_benchmark)

Now we drop the movies that are rated less often, and likewise drop the inactive customers who rate less often.

Code Snippet :
print('Original Shape: ', all_data.shape)
all_data = all_data[~all_data['Movie_Id'].isin(drop_movie_list)]
all_data = all_data[~all_data['Customer_Id'].isin(drop_cust_list)]
print('After dropping, the shape is: ', all_data.shape)
print('Data')
all_data

Output :

Next we create the ratings matrix, with ratings as the values, customer ids as the index, and movie ids as the columns; we need it for our recommendation system.

Code Snippet :
data_pivot = pd.pivot_table(all_data, values='Ratings', index='Customer_Id', columns='Movie_Id')
print(data_pivot.shape)
data_pivot

Output :

We have one more dataset, the movie titles. For training we take only the top 200,000 (2 lakh) rows for faster processing, and then apply the SVD algorithm to the dataset that has been created.

Code Snippet :
movie_titles = pd.read_csv('/content/movie_titles.csv', encoding="ISO-8859-1", header=None, names=['Movie_Id', 'Year', 'Name'])
movie_titles.set_index('Movie_Id', inplace=True)
movie_titles.head()

# reader
reader = Reader()

# get just the top 200,000 rows for faster run time
data = Dataset.load_from_df(all_data[['Customer_Id', 'Movie_Id', 'Ratings']][:200000], reader)

# Use the SVD algorithm.
svd = SVD()

# Compute the RMSE and MAE of the SVD algorithm with cross-validation
cross_validate(svd, data, measures=['RMSE', 'MAE'])

Here we can see the cross-validation result of the SVD algorithm.

Output :

Taking one customer id, we extract the movies to which that user has given a rating of five.

Code Snippet :
cust_1493615 = all_data[(all_data['Customer_Id'] == 1493615) & (all_data['Ratings'] == 5)]
cust_1493615 = cust_1493615.set_index('Movie_Id')
cust_1493615 = cust_1493615.join(movie_titles)['Name']
cust_1493615

Output :

Now we predict ratings for the same user to see which movies he would love to watch. First we drop the movies that are rated less often from the movie-titles dataset. Then we take the full dataset of customer id, movie id, and ratings, store it in the variable data1, fit the SVD model on the training set built from it, and predict the ratings, which give the estimated score of each movie for this user.

Code Snippet :
customer_1493615 = movie_titles.copy()
customer_1493615 = customer_1493615.reset_index()
customer_1493615 = customer_1493615[~customer_1493615['Movie_Id'].isin(drop_movie_list)]

# getting the full dataset
data1 = Dataset.load_from_df(all_data[['Customer_Id', 'Movie_Id', 'Ratings']], reader)
trainset = data1.build_full_trainset()
svd.fit(trainset)

customer_1493615['Estimate_Score'] = customer_1493615['Movie_Id'].apply(lambda x: svd.predict(1493615, x).est)
customer_1493615 = customer_1493615.drop('Movie_Id', axis=1)
customer_1493615 = customer_1493615.sort_values('Estimate_Score', ascending=False)
customer_1493615

Our final prediction output :

Thank you.
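As a follow-up that is not part of the original article, the last few steps can be wrapped into a small helper so the same recommendation can be produced for any customer. It assumes the trained svd model, movie_titles, and drop_movie_list from above are already in memory; the recommend_for name and the top_n parameter are added conveniences.

Code Snippet :
def recommend_for(customer_id, top_n=10):
    # Score every sufficiently-rated movie for the given customer with the trained SVD model.
    candidates = movie_titles.copy().reset_index()
    candidates = candidates[~candidates['Movie_Id'].isin(drop_movie_list)]
    candidates['Estimate_Score'] = candidates['Movie_Id'].apply(
        lambda movie_id: svd.predict(customer_id, movie_id).est)
    # Return the titles with the highest estimated scores.
    return (candidates.sort_values('Estimate_Score', ascending=False)
                      .head(top_n)[['Name', 'Estimate_Score']])

# Example: the same customer used above, but any customer id from the dataset could be passed.
print(recommend_for(1493615, top_n=5))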










