July 31, 2023
Rudly Raphael
In the ever-evolving world of data analysis, a revolution is underway in the form of AI. With its help, data analysis has transcended its original limitations, opening up a new era of insights and possibilities.
In an exciting collaboration, Eyes4Research has teamed up with Signoi Ltd. to explore the transformative potential of AI-driven data analysis: how this technology is reshaping industries and empowering decision-makers to make better-informed choices than ever before.
Recently we got access to the latest feature in ChatGPT (using the GPT-4 model): it can now write and execute Python programs in a “sandbox” environment, and we feel this is a revolution in data analysis.
It is a revolution in the dreaded “paradigm shift” sense, or so we believe, given that Kuhn was never terribly clear about what exactly a paradigm was.
Let’s illustrate this with the results from a survey we ran recently on attitudes to AI. The new ChatGPT 4 lets you upload files of up to 50MB, including CSV files. We uploaded the raw data file (some 43MB), containing just over 2,000 completed interviews. The survey has an open-ended question that we analyzed for sentiment, and we also used our standard archetypes (Jungian, Schwartz, and Shadows) to score the responses to that question.
We have a variable in the dataset called “fsentiment”, which is the general sentiment score of the text. First, we asked it to analyze the relationship between the variable “Social_Grade” (for our American readers: yes, we are still obsessed with class) and fsentiment. And we do mean asked, as in: “What is the relationship between the variable ‘Social_Grade’ and fsentiment?” This is what it said:
“The correlation coefficient between the ‘Social_Grade’ and ‘fsentiment’ variables is approximately -0.063. This suggests a weak negative linear relationship between the two variables.
In other words, there is a slight tendency for ‘fsentiment’ to decrease as ‘Social_Grade’ increases (given the numerical coding used for ‘Social_Grade’). However, the relationship is weak and may not be practically significant.”
Note that it had worked out that it needed to convert the “Social_Grade” variable to a number, as it was categorical. But we thought ANOVA would be a better fit, so we told it to use that; we just said “Use ANOVA”.
It ran the analysis and said:
“The one-way ANOVA test evaluates whether there are significant differences in the means of ‘fsentiment’ across the different categories of ‘Social_Grade’.
The results of the ANOVA test are as follows:
The p-value is greater than the commonly used significance level of 0.05, which means we fail to reject the null hypothesis that the group means are equal. In other words, the test did not find a statistically significant difference in ‘fsentiment’ across the different ‘Social_Grade’ categories.”
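A one-way ANOVA compares how far the group means sit from the overall mean with how much the scores vary inside each group; the ratio of the two is the F statistic. The sketch below, which assumes the fsentiment scores have already been split into one list per Social_Grade category, computes F and its degrees of freedom by hand. The p-value is the upper-tail probability of an F distribution with those degrees of freedom; Python’s standard library has no F distribution, so in practice one would use a routine such as scipy.stats.f_oneway, which returns both the statistic and the p-value.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is assumed to be a list of lists: one list of fsentiment
    scores per Social_Grade category.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    means = [fmean(g) for g in groups]
    k, n = len(groups), len(all_values)

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

A small F (and correspondingly large p-value) is what “fail to reject the null hypothesis” looks like in practice: the group means differ no more than the within-group noise would lead you to expect.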
Actually, it said more than this, but we will stop here. The new code generation facility lets you run a range of analyses with simple English statements such as “use ANOVA”. It draws on the statistical libraries available in Python; it doesn’t have every possible library, but it does seem to have a good range, with some surprising inclusions.
You can also ask for graphics to be generated, so we said:
“show me the relationship between these variables: jung_rebel_mm3 jung_jester_mm3 jung_lover_mm3 jung_caregiver_mm3 jung_everyman_mm3 jung_innocent_mm3 jung_ruler_mm3 jung_sage_mm3 jung_magician_mm3 jung_hero_mm3 jung_creator_mm3 jung_explorer_mm3 and fsentiment”
These are our proprietary archetype scores for the Jungian archetypes, based on the open-ended text question. It then printed out the correlations between fsentiment (the variable holding the sentiment score) and the Jungian variables.
We then asked it to “make a polar plot of these results”, and we eventually got this plot (it took five attempts to produce it!):
But we thought it could do better, so we used our favorite ChatGPT prompt: “Make it more pretty”. That was the only instruction we gave it, and it obliged. Below is the “pretty” version of the polar graph.
It also gave an account of what it had done:
“Here is an enhanced version of the polar plot, which visually represents the correlations between ‘fsentiment’ and the selected Jungian archetype variables:
As before, each point on the plot corresponds to a variable, and the distance from the center of the plot indicates the strength and direction of the correlation (with positive correlations extending toward the top of the plot and negative correlations extending toward the bottom).
From the plot, we can see that ‘jung_everyman_mm3’, ‘jung_magician_mm3’, and ‘jung_creator_mm3’ have the highest positive correlations with ‘fsentiment’, while ‘jung_rebel_mm3’ has a slight negative correlation.
Please note that this is a visual representation of the correlation values calculated earlier, and the same caveats apply: these are measures of linear association, not causation, and there may be other variables or factors at play.”
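One common way to build such a chart, which may or may not match the code ChatGPT generated, is to give each variable its own evenly spaced angle and use its correlation as the radius. Because correlations can be negative, this sketch adds a hypothetical offset (1.0 by default) so every radius stays non-negative and a circle of radius 1 marks zero correlation. It only computes the coordinates; the drawing itself would typically be done with matplotlib polar axes, for example via plt.subplots(subplot_kw={"projection": "polar"}).

```python
import math


def radar_points(correlations, offset=1.0):
    """Place each correlation on a polar (radar) chart.

    `correlations` maps variable name -> Pearson r in [-1, 1].
    `offset` is an assumed shift that keeps radii non-negative, so a
    radius equal to `offset` means zero correlation.
    Returns (name, theta, radius, x, y) tuples.
    """
    n = len(correlations)
    points = []
    for i, (name, r) in enumerate(correlations.items()):
        theta = 2 * math.pi * i / n  # evenly spaced angles around the circle
        radius = r + offset
        points.append((name, theta, radius, radius * math.cos(theta), radius * math.sin(theta)))
    return points
```

The offset is a design choice: without it, a negative correlation would either be clipped at the center or flipped to the opposite side of the chart, which makes the picture hard to read.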
It’s fair to say that it can be a bit prolix, but we think that could easily be rectified by simply telling it to be “less prolix”, just as we told it to make its graphics “more pretty”.
So is ChatGPT a perfect data analyst? Far from it: it stumbles often, sometimes over very basic Python programming errors, but it keeps trying until it gets what it thinks is the correct result. In this respect it is very similar to a human data analyst, only more persistent.
We could have done all this analysis much, much more quickly ourselves, but then again, we have decades of software development experience, from assembly language to Python and R, and years of formal training in statistics.
Making it more pretty…
This is a fantastic tool: it empowers users to generate complex analyses with simple English commands. It will also write a report if you want one; just ask. Nor do you need to spell out the details of what you want: you could simply say “Find the relationships between the continuous variables and the fsentiment variable” and it would try to do that. It could then write a paper for you about the results of the analysis, and probably “make it more pretty” if you don’t like the way it looks.
It doesn’t take much imagination to see the effect this kind of system could have (and will have) on the market research world – and far beyond that.
And there is now a stark choice: embrace this technology or die.
It’s that simple – and we can’t “make it more pretty”.
About Signoi
Signoi is unique AI-powered software that automatically decodes unstructured data into patterns of meaning. It is used by leading brands and agencies on three continents to surface hitherto inaccessible insight and foresight for every use case you can think of.