This article shows how to set up a basic integrity check for a model using the ticdat library. It is a sequel to the previous article, ticdat — The pythonium shield for your model. Beginner-level Python programming and basic data-modeling concepts are prerequisites.
Now we know what “data integrity” is and why it’s important to have such a protection layer in a data science workflow. Suppose we’re provided with a survey of 100 individuals, and assume the collected data has the following fields:
SSN | Mobile No. | Sex | Date-of-Birth | Hand-preference | Income
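Before wiring checks like these into ticdat, the rules themselves can be sketched as plain Python predicates. The specific constraints below (the SSN format, the allowed categories, non-negative income) are assumptions for illustration only; ticdat lets you declare equivalent rules on a schema instead of hand-coding them.

```python
import re
from datetime import date

# Illustrative integrity rules for the survey fields above.
# Each rule maps a field name to a predicate the value must satisfy.
RULES = {
    "SSN": lambda v: bool(re.fullmatch(r"\d{3}-\d{2}-\d{4}", str(v))),
    "Mobile No.": lambda v: str(v).isdigit() and len(str(v)) == 10,
    "Sex": lambda v: v in {"M", "F", "Other"},
    "Date-of-Birth": lambda v: isinstance(v, date) and v <= date.today(),
    "Hand-preference": lambda v: v in {"Left", "Right", "Ambidextrous"},
    "Income": lambda v: isinstance(v, (int, float)) and v >= 0,
}

def find_failures(rows):
    """Return (row_index, field) pairs that break an integrity rule."""
    failures = []
    for i, row in enumerate(rows):
        for field, ok in RULES.items():
            if not ok(row.get(field)):
                failures.append((i, field))
    return failures

survey = [
    {"SSN": "123-45-6789", "Mobile No.": "9876543210", "Sex": "F",
     "Date-of-Birth": date(1990, 5, 1), "Hand-preference": "Right", "Income": 52000},
    {"SSN": "123456789", "Mobile No.": "98765", "Sex": "F",
     "Date-of-Birth": date(1990, 5, 1), "Hand-preference": "Right", "Income": -100},
]

# The second record violates the SSN format, the mobile-number length,
# and the non-negative income rule.
print(find_failures(survey))  # [(1, 'SSN'), (1, 'Mobile No.'), (1, 'Income')]
```

In ticdat the same idea is expressed declaratively on a `TicDatFactory` schema, and the library reports every violating cell for you.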
Back in my hometown in India, we run a small ancestral business in pawn brokering, more commonly referred to as mortgage lending. The idea of such an enterprise is really simple: one person has a cash surplus (the lender), the other is in need of a small amount of capital (the borrower). The borrower goes to the lender, offers a personal belonging as collateral, and gets money for its value. The money is lent at some interest for a fixed tenure; the borrower repays the amount due and gets the collateral back.
The cases in point here were usually farmers from neighbouring…
Data products in real life are like the Infinity War. Prediction and optimization models are the Avengers of this war, fighting different scenarios and defeating the problems. But would the Avengers have won the war without Wakanda’s vibranium shield?
Hmm.. probably not!
- Dr. Strange ;)
Effective algorithms and models are designed to function on a predefined data model. The developers define rules that input data must follow in order to satisfy the model’s dependencies and assumptions. These are the rules that are NOT meant to be broken. For instance, imagine you’re designing a prediction engine with blood group…
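As a hypothetical sketch of such a rule (the blood-group detail is only hinted at above, so the specifics here are assumed): a prediction engine whose data model expects one of the eight ABO/Rh blood types would have to reject any other value before it ever reaches the model.

```python
# Assumed data-model rule: blood group must be one of the eight ABO/Rh types.
VALID_BLOOD_GROUPS = {"A+", "A-", "B+", "B-", "AB+", "AB-", "O+", "O-"}

def check_blood_group(value):
    """Return True only if the value satisfies the data-model rule."""
    return value in VALID_BLOOD_GROUPS

print(check_blood_group("O+"))  # True
print(check_blood_group("C+"))  # False: not a valid blood group
```

A typo like "C+" slipping through would silently violate the model's assumptions, which is exactly the class of breakage an integrity layer is meant to catch.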
Starting with the dual processors of the ’90s, personal computers have been growing more muscular with every upgrade. Today they are equipped with numerous cores, adding the capability to perform simultaneous tasks with no compromise on the efficiency of each. As data scientists, we come across many situations involving iterative operations, such as running big loops or filtering and aggregating data, where one could potentially utilize the multitasking power of the CPU. The operation can be parallelized, and hence the results determined faster. …
The Interview Portal is bringing together a series of interviews with people working in different areas, talking about their experiences and how to get started.
Here’s an article in this series where I share my experience as an Operations Research professional. I talked about a few of the projects I’ve worked on in the past, as well as how I ventured into this domain.
Find this conversation here: https://theinterviewportal.com/2020/04/28/decision-sciences-professional-interview/
Great things are cocktails of Science and Art!