
Data Wrangling — Some Tips during this covid season

In this covid season, I thought of sharing some tips and sample code on data wrangling. Ever since I got a glimpse of Google Colab, my go-to language for data wrangling has been Python, so this blog uses Colab and pandas for the examples.

Much of the data we collect has formatting issues because it was gathered without any software that enforces a consistent format. Tools like Google Forms have made this much better, but when you have to deal with old data in its raw form, there is no escape from cleaning it up and getting it into the shape you need.

This blog shows how to use Colab, pandas and lambda functions to quickly format some data.

I wrote an earlier blog a few days back on  and how to import data from Excel into Colab, so I shall skip that part here.

Let us say the data below is what we need to format (the Admission Number column), and every value must end up in the format YYYY-Num.

[Screenshot: sample data with the Admission Number column]

We can do this quickly using dataframes and lambda functions in Python.

Here is how we can do the transformation:

Import the data into a dataframe using pandas.
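Since the original data is only shown as a screenshot, here is a minimal sketch of the loading step using a small hypothetical sample held in memory. The column name "Admission Number" comes from the post; the names and values are made up for illustration. With a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real sheet.
# With a real export you would call pd.read_csv("your_file.csv", dtype=str).
raw_csv = io.StringIO(
    "Name,Admission Number\n"
    "Asha,2019/101\n"
    "Ravi,19/102\n"
    "Meera,2020-103\n"
)

# dtype=str keeps every value as text, so "19/102" is never coerced.
df = pd.read_csv(raw_csv, dtype=str)
print(df)
```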

Next, write a normal Python function that replaces the slash in a string with a “-”. Then use `apply` with a lambda on the “Admission Number” column, so that the function we just wrote is called on every value.
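A minimal sketch of this step, assuming the column holds values like "2019/101". The function name `replace_slash` and the sample values are hypothetical.

```python
import pandas as pd


def replace_slash(value):
    """Replace every '/' in an admission number with '-'."""
    return str(value).replace("/", "-")


# Hypothetical sample column.
df = pd.DataFrame({"Admission Number": ["2019/101", "19/102", "2020-103"]})

# Apply the function to every value in the column through a lambda.
df["Admission Number"] = df["Admission Number"].apply(lambda x: replace_slash(x))
print(df["Admission Number"].tolist())
```

The lambda mirrors the approach described above. `df["Admission Number"].apply(replace_slash)` does the same thing, and for a plain character swap pandas also offers the vectorised `df["Admission Number"].str.replace("/", "-", regex=False)`.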

We can repeat the same with a second function to fix the year. This function splits each value on “-”, checks whether the year is already in the YYYY format, and expands it if required. Apply it through a lambda again on the column that needs formatting.
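Here is one way the year check could look. It assumes that a two-digit year always belongs to the 2000s (so "19" becomes "2019"). That is an assumption for illustration, so adjust it if your records go back further. The function name `fix_year` is hypothetical.

```python
import pandas as pd


def fix_year(value):
    """Expand a two-digit year prefix to YYYY; leave other values as they are."""
    parts = str(value).split("-", 1)
    if len(parts) != 2:
        # Not in Year-Num shape: leave untouched so it can be reviewed by hand.
        return value
    year, number = parts
    if len(year) == 2 and year.isdigit():
        # Assumption: two-digit years are in the 2000s.
        year = "20" + year
    return f"{year}-{number}"


# Hypothetical column after the slashes have already been replaced.
df = pd.DataFrame({"Admission Number": ["2019-101", "19-102", "2020-103"]})
df["Admission Number"] = df["Admission Number"].apply(lambda x: fix_year(x))
print(df["Admission Number"].tolist())
```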

The resulting data can also be exported to another .csv file on the drive, and the demo code shows this too. The notebook has been uploaded to my GitHub directly from Colab and is available . Below are screenshots of the relevant pieces of code.

[Screenshots: the relevant pieces of code]
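Putting the steps together, here is a minimal end-to-end sketch under the same assumptions as above (hypothetical sample data, two-digit years treated as 20xx). It also flags any value that still does not match YYYY-Num and writes the result to a new .csv. The output filename is hypothetical. In Colab you would mount Drive with `from google.colab import drive; drive.mount("/content/drive")` and point the path somewhere under `/content/drive/MyDrive/`.

```python
import io
import os
import re
import tempfile

import pandas as pd


def replace_slash(value):
    """Replace every '/' with '-'."""
    return str(value).replace("/", "-")


def fix_year(value):
    """Expand a two-digit year prefix to YYYY (assumes 20xx)."""
    parts = str(value).split("-", 1)
    if len(parts) != 2:
        return value
    year, number = parts
    if len(year) == 2 and year.isdigit():
        year = "20" + year
    return f"{year}-{number}"


# Hypothetical input data.
raw_csv = io.StringIO(
    "Name,Admission Number\nAsha,2019/101\nRavi,19/102\nMeera,2020-103\n"
)
df = pd.read_csv(raw_csv, dtype=str)

# Both fixes in one pass over the column.
df["Admission Number"] = df["Admission Number"].apply(
    lambda x: fix_year(replace_slash(x))
)

# Rows that still don't match YYYY-Num need a manual look.
target = re.compile(r"\d{4}-\d+")
bad = df[~df["Admission Number"].apply(lambda x: bool(target.fullmatch(x)))]
print(bad)

# Export to a new CSV (temp dir here; use a Drive path in Colab).
output_path = os.path.join(tempfile.gettempdir(), "formatted_admissions.csv")
df.to_csv(output_path, index=False)
```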

This might sound simple, but these simple tools worked like magic on my old, slow Mac, transforming thousands of records and quickly giving me some “visualisable” data.


Originally posted here
