00:00
This material is made available to you by or on behalf of the University of Melbourne under section 113P of the Copyright Act 1968. It may be subject to copyright. For more information, visit the University's copyright website. Okay, good afternoon. Let's make a start; it's 3:20 now. Welcome back to the lecture. This is the beginning of the next topic for this subject. Starting from lecture four today, I'm going to talk about data pre-processing and cleaning. In our previous two lectures...
01:02
What? I'd just like to get some interaction because it's Friday afternoon. So before I go to the next slide (some of you have them up on your computer): what did we cover in the last two lectures? JSON, good. Namespaces. Somebody from the back, I need one more. What else did we cover in the last two lectures? XML and JSON, good. So, after the introduction, we covered different data formats and went into detail about XML and JSON files. When we covered that, we concentrated on the format: how data is organized in files, what that means, and how the formats facilitate
02:13
the coding of semantics, or facilitate structured data sharing. From today we are going to look at the content: what is in the data. Remember from the introduction: when you do a data science project, the core analysis is 20 percent. Applying deep learning, or whatever the coolest model is, is the final little bit. The 80 percent of the effort comes at the beginning: wrangling. And wrangling is messy.
03:01
It goes into: how do I deal with all this messy data? So please have a look. Some of these might come from online or WeChat conversations. How old are you? Twenty years ago, fifty years ago, a hundred years ago. You can see it everywhere. The age can be spelled out in English, it can be written like this, and if I'm more diligent I can put Chinese characters in it. Like Obama, right? This is data you probably wouldn't produce if you were creating it yourself, but sometimes we collect data second hand, so you cannot control what the data looks like. This is the reality. Has anybody seen some other strange format or strange content in data that you've received?
04:07
No? Good. Okay, so this is the sort of thing we are dealing with. Looking at this simple example, I believe the majority of us would say this file has serious data quality issues. You agree? I hope so. Okay. Let me show you all of them. There are different aspects of data quality, and we need to get some understanding of them because they affect the way you treat the data. When I show you that simple file it's very easy, because your eye can already spot where the problem is. But when you get a very large
05:08
volume of data, you need to have some idea of what kind of quality problems you should apply your skills to look for. The first one is accuracy. Suppose I tell you I'm eighteen years old, but in the database the age is recorded as eighty, eight zero, not eighteen, one eight. As far as the system is concerned, I can put that in JSON, I can put it in XML, I can even put it in a relational database, because it is a valid value.
06:00
Right: I design the database and say a value is fine as long as the age is between zero and 120, or whatever the oldest age is; I don't know, maybe make it 100 or 200. Forget about those people deep in the mountains whose age we don't know. The value is valid, but it is not accurate. So that's the first one. The second one is completeness. On this slide you can probably see some examples; we just use date of birth as a running example. When data is incomplete, what does it mean? Yes, it's not complete. Suppose I have seven records, one to seven, and this is just one of my columns. For number five and number six I still have the student number and everything else, but the date of birth values for these two students are missing.
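As a rough illustration of the two ideas so far, the sketch below separates a validity check from accuracy and then measures completeness for a seven-record column. The 0 to 120 range comes from the lecture; the function name `is_valid_age` and the dates of birth are made-up placeholders, not data from the slides.

```python
# Hypothetical validity rule, mirroring the lecture's "between 0 and 120".
MIN_AGE, MAX_AGE = 0, 120

def is_valid_age(age):
    """Return True when the value passes the range check (validity only)."""
    return MIN_AGE <= age <= MAX_AGE

true_age = 18      # what the person actually is
recorded_age = 80  # what was typed into the database

# The record passes validation even though it is wrong.
passes_check = is_valid_age(recorded_age)
is_accurate = recorded_age == true_age

# Completeness: seven records, dates of birth missing for records 5 and 6.
# The dates themselves are invented for this sketch.
dates_of_birth = [
    "1980-05-06", "1979-11-23", "1981-02-14", "1980-08-30",
    None, None, "1982-01-09",
]
missing_ids = [i for i, dob in enumerate(dates_of_birth, start=1) if dob is None]
completeness = 1 - len(missing_ids) / len(dates_of_birth)

print("valid:", passes_check, "accurate:", is_accurate)
print("missing records:", missing_ids, "completeness:", round(completeness, 2))
```

A range check like this can only catch invalid values; an inaccurate but valid value such as 80 needs some other source of truth to detect.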
07:18
So as far as these two records are concerned, their information is not complete. Later in the lecture we'll go through the different kinds of incompleteness, the different kinds of missing. So: no value, not recorded, unavailable. Even that meaning is ambiguous sometimes. Sorry, I sometimes point, but you probably don't know what I'm pointing at; I will try to do this instead. The third one is consistency. I feel like sometimes this is testing my English: what does inconsistent data, inconsistency, mean?
08:05
What does it mean? Format, that's good. So here you can say that maybe for these two the format is not consistent. Can you see why? Records two and three. In Australia we would probably write it this way, as in the second record. This other one is the American way, I think: month, day, then year. This one is easy to spot because no month has a number that high. But what if I give you the fifth of the fifth, 1980?
09:01
What's that? Okay, the fifth of the fifth is a bad example. Say instead six, five, 1980. It can be read this way or that way, so it's inconsistent. There are other kinds of inconsistency. Can you think of any examples other than date of birth? Names, yes. If you remember, we said that in data wrangling you don't get a simple table like that; the data doesn't necessarily come from a single source. You can have data from multiple data sources. I might have my medical record with a GP, and my bank account and financial transaction information with the bank.
10:01
And then I can have my tax return with the ATO. So when the data comes in, maybe in one source I'm recorded as eighteen years old and in some other records I'm recorded as thirty-eight. The addresses can be different too. Maybe the tax file number says all these records belong to the same person, but the attributes are not consistent. You can have two dates of birth, for example, coming back to date of birth. Anyway, that's consistency. Okay, that's pretty good. It also applies to encoding standards.
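The date-of-birth examples above can be made concrete with a small sketch. It parses the same string under a day-first and a month-first convention using the standard `datetime.strptime`, then checks two records that share an identifier but disagree on age. The helper `possible_dates`, the sample strings, and the tax file number "123" are made up for illustration.

```python
from datetime import datetime

def possible_dates(text):
    """Parse text as day-first and as month-first; keep the readings that are real dates."""
    readings = {}
    for label, fmt in (("day-first", "%d/%m/%Y"), ("month-first", "%m/%d/%Y")):
        try:
            readings[label] = datetime.strptime(text, fmt).date()
        except ValueError:
            pass  # this convention does not give a real date (e.g. no month 25)
    return readings

unambiguous = possible_dates("25/12/1980")    # only the day-first reading survives
ambiguous = possible_dates("6/5/1980")        # 6 May 1980 or 5 June 1980
same_either_way = possible_dates("5/5/1980")  # the "bad example": both readings agree

# Cross-source inconsistency: same (hypothetical) tax file number, different ages.
records = [
    {"source": "GP", "tfn": "123", "age": 18},
    {"source": "bank", "tfn": "123", "age": 38},
]
ages_for_person = {r["age"] for r in records if r["tfn"] == "123"}
conflict = len(ages_for_person) > 1

print(unambiguous)
print(ambiguous)
print("conflicting ages:", conflict)
```

Only strings whose day is above 12 can be disambiguated from the value alone; for everything else the convention has to be known from the source.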
11:02
In some attributes you don't write plain text or a numeric value; the value is actually a reference that you look up. For example, in a bank transaction they might use two digits or two letters to refer to the type of transaction. Imagine that twenty years ago we used one set of reference codes, and twenty years later the codes have changed. So it can be inconsistent either over time, or when the data is collected or aggregated from different data sources. Timeliness is the next one. That's probably quite straightforward. Can anybody think of any timeliness troubles in a data set?
12:05
Okay, hypothetically: what if the bank made a mistake and never really updated my bank account balance for reporting to the tax office? I'm not the one doing tax evasion, right? When I was a student I maybe had no money anyway. But they forgot something: they forgot to update my term deposit from somewhere else. So the data is stale; it is old data. If you are not aware of that, you might be analyzing something that is out of date, that doesn't reflect the current state of the world. Any questions? Okay, I'll keep going. Believability.
13:09
Sometimes the same scenario, the same phenomenon, shows up under different aspects of data quality. Believability is something I think is quite important. Basically, when you see a piece of data, based on what you are reading and your interpretation, you can ask yourself: do I believe the data? Do I believe the value that I'm seeing? This is closely related to the outlier analysis that we are going to talk about later on. Some values might be unbelievable but true. Some values are unbelievable and really are a data quality issue; they are noise.
14:05
You don't have to remember exactly these dimensions, because people define them differently, but that's a way to think about what believability is. And finally, interpretability. Here I ask: how easily can I understand the data? How easily can I interpret it? How can I get the true meaning of the data from what is stored, from what is recorded? It's good to use your imagination: can you think of some situations where you really don't know how to interpret the data?
15:09
Seven? Sorry? I'm just joking, but it is true for some people. Take a date written as a year, then 8, then 4, with three foreign characters in between. I can probably guess that the first number is the year. And if you know the characters, then you know which one is the month. But if you don't, okay, have a good guess. And that's just the date again. In different cultures even the new year is different; the way you count the year can be different too.
16:00
For example, if this is the date of birth: I actually have two dates of birth. Do you believe me? I have two birthdays every year. Why? Because in my culture we follow a lunar calendar as well, and that's the one my parents remember, so I have to follow the lunar calendar to get a birthday cake. So that's a difference. If this is not clear to you when you look at the data, you might not be able to interpret it well. You might say, how come I'm getting an inconsistency? You might think it is inconsistent when in fact it is just an interpretation problem. Or you can say some places are really cold or really hot as recorded, but if I don't record the unit, I don't know whether twenty degrees is twenty degrees Celsius or twenty degrees Fahrenheit.
17:09
So with some of these you can run into interpretation problems. Okay, let's keep going. Considering all these problems, you need to address them. You can use some tools to rectify them. Basically, the process is that we look at the various aspects of data quality and then address them by cleaning as much as we can. After that, the big effort is in data integration: somehow we need to join and merge all the data together into a format that is useful, meaning the format expected by the analytical algorithms that we're going to apply.
18:12
Sorry, this is a bit strange; the microphone keeps going downward. Okay, so at the end you can see that, within data transformation, I have something called data reduction. Sometimes we have so many attributes; some medical data even have thousands of different measurements, different attributes. There are some really cool techniques for this, because we can't really handle that many attributes: in machine learning we try to put the data points in a very high-dimensional space, and that's really, really difficult. And so,
19:00
it's a very cool trick: later on, when you start doing machine learning exercises, you'll learn how to separate the data in a low-dimensional space. That's data reduction. You might use other methods as well, but at the end we get tables with rows and columns, the way you are familiar with, like a data frame. We can use the terms interchangeably: columns are attributes, and in machine learning we call them features. The rows are the instances, or objects, or records, or cases. All good? Good. So in this table I have one, two, three, four: four
20:02
features, or attributes, or columns. The items are the rows, and I can call each one an instance or an object. Now look at these attributes or features. These are numbers: floats, real numbers. So apart from knowing the columns and rows, we need to know the data types of our features. Height, weight and age are continuous features, and citizenship, such as Roman, is categorical, like country names. It means there is a finite list of values. Do you have any questions so far about data types? Do you all agree that these are
21:01
continuous features? Three? Say that again. Height is continuous, yes. So are you saying the other two aren't? Good. You're starting to question, and I'm really glad you're doing that. Naturally it is a continuous measurement. I can say I am 160 centimetres tall, but I can also say 160.1, or 160.111, depending on how far I want to go. So the nature of that kind of attribute is continuous, but sometimes people will model it, measure it, or record it in a discrete manner.
22:10
So "I don't want the decimals" is possible. Okay, and all of these are examples of data cleaning and data pre-processing tools. We're not covering them. I've seen all of those in my previous workplace, but I didn't touch them because that was other people's job. However, what we are going to learn is the basic techniques being applied there, behind the scenes. Specifically, how we deal with noisy data, inconsistent data, and intentionally disguised data. That last one is a very interesting problem, always relevant for catching fraud and criminals, and for anomaly detection.
23:04
Okay, noisy data. As you can see, there are many causes. It could be a technical issue: say I can only take eighty characters, so if your name is longer than eighty characters, sorry, it gets cut off. Or I have a CSV file but somehow put a comma in my name, and then all the splitting gets out of sync. And this one is a wrong value: I don't like salary to be negative, as if I'm owing something; that only happens with credit cards. And more; we've gone through this, so this is fine. Now, inconsistent data. One thing I want to point out is this outlier, just to introduce it to you briefly first.
24:01
What happens here is inconsistency within a group. I can have ages of hospital patients: 62, 72, so maybe this is an aged-care home or something. And then you get a 999. Okay. So do I believe it is actually noise, or can it be an unbelievable but true value? That's the question we ask when we inspect data quality. And disguised data. That's when I don't want people to know, so I disguise it. Sometimes I have to disguise it because I don't have a good value, so I just make one up or use something as a substitute. Does anybody have an example? Sometimes, when you do certain things, you think: I want to hide something, so I'm disguising it;
25:03
I'm purposely giving out wrong or incorrect information. You can't use the example I've used before. No? Yeah: you go shopping and sign up for promotions, but you don't want all the spam, so what do you do? You leave a wrong number or something. Anyway, I don't need to teach you how to disguise data. So now let's look at data that's missing. If you look at the first point, this is what we consider missing. Null is a special value in computer systems and in programming that normally represents that there's no value there.
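A minimal sketch of what "a special value for no value" looks like in Python. Here `None` plays the role of null, and NaN and blank strings are treated as missing too; that last part is an assumed convention for this example, and the helper name `is_missing` and the sample column are invented.

```python
import math

def is_missing(value):
    """Treat None, NaN and blank strings as missing (an assumed convention)."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    if isinstance(value, str) and value.strip() == "":
        return True
    return False

# A made-up column mixing real values with several "no value" markers.
column = [18, None, "", float("nan"), 0, "  ", "62"]
missing_flags = [is_missing(v) for v in column]

print(missing_flags)
```

Note that 0 is a real value here, not a missing one; deciding which markers count as missing is a choice that has to be made per data set.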
26:15
And sometimes an empty string is also a way to express that the data is missing. Now I have to get you to start thinking again: the pattern of missing, what causes the missing data in our data set, is quite important. Basically we distinguish two types of missing. The first one is missing at random. Here it says "missing completely at random"; there is a distinction between completely at random and at random, but basically, when data is missing completely at random, that means
27:07
it is just randomly not there. It is not correlated with anything else, any attribute in the data set or outside of the data set. It is just randomly missing. So how can an attribute just be missing randomly? How does it happen? I'll give you an example. I'm doing a survey, and back in the no-computer times we used paper. I only have one hundred surveys that I can hand out to people. Or I only have ten pages: I actually have twenty survey questions, but I can really only print
28:06
five questions in the booklet. The paper is very precious; I'm just making this up. The way you can make it completely random is that, for every survey, I randomly pick five out of the twenty questions to ask. Or I can say I only have one hundred copies, and in the class, I don't know how many turned up, but suppose we have 350 students enrolled in the subject and I want to do only one hundred surveys. I'm not interested in all 350; one hundred is enough for me. So I can randomly select by student number or something. That means the missing is random.
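The survey example can be sketched with the standard `random` module: pick 100 of 350 student numbers at random, and pick 5 of 20 questions for a booklet. The seed and the numbering of students and questions are assumptions made so the sketch is repeatable.

```python
import random

rng = random.Random(42)  # fixed seed, only so the sketch is repeatable

# 350 hypothetical student numbers; survey 100 of them chosen at random.
student_ids = list(range(1, 351))
surveyed = set(rng.sample(student_ids, k=100))
not_surveyed = [s for s in student_ids if s not in surveyed]

# Twenty questions, but only five fit in one booklet.
questions = list(range(1, 21))
booklet = sorted(rng.sample(questions, k=5))

print(len(surveyed), "surveyed,", len(not_surveyed), "missing by design")
print("questions in this booklet:", booklet)
```

Because the choice depends only on the random draw and not on anything about the students, the 250 unsurveyed students are missing completely at random.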
29:05
I'm missing the other 250 in the class, but they are randomly missing because I'm doing my sampling that way. But normally that doesn't happen in the real world. It's very, very hard to prove that the missing data you are observing is missing completely at random. So it is just there as a contrast. What's more important is when data is not missing at random: there is some pattern in the missing, and that's quite important. Are you happy with this? How we identify, or make the most reasonable assumptions about, why data is missing will affect how we deal with the missing values.
30:01
So, uncorrelated, just totally random: that's random. But in the second example it is not random. In this example, people with lower IQ don't want to report it. (I would argue that their IQ is still quite high.) So in this case you can observe the IQ scores, and some of them are not there, but there is a relationship between the missingness and some attribute values. If people with lower IQ don't report it, the missingness is actually related to the IQ score itself.
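To see why the cause of missingness matters, here is a small simulation under invented assumptions: scores drawn from a normal distribution with mean 100 and standard deviation 15, a 30 percent random drop for the first case, and an 80 percent chance of withholding any score below 90 for the second. None of these numbers come from the lecture.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed for repeatability
iq = [rng.gauss(100, 15) for _ in range(10_000)]  # simulated "true" scores

# Missing completely at random: every score is dropped with the same probability.
mcar_observed = [x for x in iq if rng.random() > 0.3]

# Not missing at random: low scores are mostly withheld (assumed 80% below 90).
mnar_observed = [x for x in iq if not (x < 90 and rng.random() < 0.8)]

true_mean = statistics.mean(iq)
mcar_mean = statistics.mean(mcar_observed)
mnar_mean = statistics.mean(mnar_observed)

print("true mean:        ", round(true_mean, 1))
print("mean under MCAR:  ", round(mcar_mean, 1))
print("mean under MNAR:  ", round(mnar_mean, 1))
```

Under random dropping the observed mean stays close to the true one; when low scores are withheld, the observed mean drifts upward, so the analysis is biased even though every remaining value is correct.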
31:04
And there's the other one, in reverse. If you put your social science or police intelligence hat on: people with higher income might not want to report their income. Why can that happen? Why would you not want to report your income if your income is higher? Okay, yes: I don't want the tax office to know, so that I can pay less tax. That's one. Anything else? I can't think of anything else. There are some complex psychological issues there. And sometimes the missing is just
32:02
what we call deliberate. But sometimes it is a misunderstanding; maybe the field is optional and I don't have time, so I don't fill it in and it's missing. And sometimes we are collecting data through sensors and one of the sensors is broken. Different kinds of missing will bias, will affect, our analysis. So let's look at some simple ways you can handle missing values, and then I will also refer you to some more complex ones that are really in the area of machine learning, more complex modelling. The first strategy, the simplest one, is that I just delete them. Cool, I just delete them. This kind of
33:03
strategy is okay if you only have a tiny bit of missing data and it doesn't really affect the overall distribution of the data. But what if your data set is very small, like this? I only have four records, and for anything that's missing I'm deleting the entire record because I don't know what to do with it. Then I'm left with one. It's very easy to overfit if you have just one sample; you just use whatever value is there. So that's case deletion. The second way is to manually correct the missing values: I use an expert, and I manually fill in what the value should be. Can you see any problems there?
34:06
Yes? Not quite. Yes, very good: it depends on the expert's opinion. On the one hand, if you trust the expert, that's fine, but experts can be biased too. You can get three experts and they might disagree. That's one very good reason. There's one more I want to get out of you. The other is that human effort is really expensive. In a lot of intelligence systems we make sure we don't have too many false positives precisely because of that: we cannot afford to hire people just to look at individual records, especially nowadays.
35:05
If I've got millions and millions of hospital records, I find it hard to imagine a person's job being just to look at that many records. It's just not practical. The third strategy is to fill in the data with some value using some strategy. That's called imputation: imputing values, replacing the missing value with something. With numeric data, as in this example, what values can you think of as a default if I don't know what to do but want to put some value in? Average, yes. That's common sense: if I don't have any more information I can use the average. So this one is zero, and this one is the average.
36:09
I can also fill with zero. Now let's look at the average: what would be the problem if I use the average? The average can be too high or too low. Why? Yeah. If the distribution is very wide, the standard deviation is very high, or you have an outlier somewhere, then the average might be a mistake: it might not be close to the true value of that missing
37:07
data. Right. With any imputation we cannot guarantee anything; there is always some risk you have to take, because you never know whether you're right or wrong. You can just do your best based on your familiarity with, and understanding of, the data. But one thing about filling with the mean is this: if you have many missing values in that feature and you fill them all with the mean value, you are taking away the natural variation in the data. You are making the values more monotonous, less diverse.
38:03
And that will affect your standard deviation calculation and everything else. In general, the average is a good thing if you say: I don't have any better information, so I just want the most likely value, the average of the group. That's fine. But like we said earlier, the average is very sensitive to outliers. If you have outliers, the average can be very different from the true population centre. So there are other kinds of values you can use instead of the average. The average is just easy to calculate.
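Both effects of the mean described above can be checked with a few made-up numbers: filling gaps with the mean leaves the mean unchanged but shrinks the standard deviation, and a single extreme value drags the mean away from the bulk of the data. The values and the count of missing entries are hypothetical.

```python
import statistics

observed = [12.0, 15.0, 9.0, 20.0, 14.0]  # hypothetical observed values
n_missing = 5                             # hypothetical number of gaps

# Mean imputation: every gap gets the same value.
mean_value = statistics.mean(observed)    # 14.0
imputed = observed + [mean_value] * n_missing

sd_before = statistics.stdev(observed)
sd_after = statistics.stdev(imputed)      # smaller: natural variation is lost

# Sensitivity to outliers: one extreme value moves the mean a long way.
with_outlier = observed + [500.0]
mean_with_outlier = statistics.mean(with_outlier)  # 95.0

print("mean before and after imputation:", mean_value, statistics.mean(imputed))
print("standard deviation before:", round(sd_before, 2), "after:", round(sd_after, 2))
print("mean with one outlier added:", mean_with_outlier)
```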
39:02
The other measure you can use, which is less sensitive to outliers, is the median. Instead of averaging the values, you sort the data points in order and find the centre point. That's what the median is. In this example, if I take the average of the list of numbers I get six. If I take the median, I'm finding the centre values, and because there are two equal centres I average only those two elements. If I have an odd number of values, then it is the single centre element. And for categorical variables there is no ordering, so you can't use the mean and you can't use the median; then you can use the most frequent
40:01
value. And you can be more creative, depending on how well you understand the data set. Rather than taking an average over the entire data set, you can break it into subgroups, maybe by a categorical attribute. In this example, to impute the age value for Jackie, I could take an average of the whole thing: ten plus fifteen plus two, divided by three. But I can also say: Jackie's citizenship is Spartan, so maybe I will only take the average from
41:00
the people with the same citizenship. In this case I would only average these two numbers. So those are different choices you can make. Next is outliers and their definition, but first, that's missing values finished; those are the simplest ways. Beyond this you get into different kinds of imputation methods. You can do regression: that's a kind of simple modelling where I treat the missing value as the value I want to predict, so you can do a whole lot of predictive analytics to work out what the value should be. That's more involved and more complex, and you can even do machine learning on it. We're not covering that; I'm just letting you know that beyond these there are more complex methods.
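The simple imputation choices above can be put side by side in a short sketch. The ages 10, 15 and 2 and the name Jackie come from the lecture; which two records share Jackie's citizenship is not recoverable from the recording, so the grouping below (10 and 15 as Spartan, 2 as Roman) and the other names are assumptions.

```python
import statistics
from collections import defaultdict

people = [
    {"name": "A", "citizenship": "Spartan", "age": 10},       # grouping assumed
    {"name": "B", "citizenship": "Spartan", "age": 15},       # grouping assumed
    {"name": "C", "citizenship": "Roman", "age": 2},          # grouping assumed
    {"name": "Jackie", "citizenship": "Spartan", "age": None},
]

known_ages = [p["age"] for p in people if p["age"] is not None]

# Option 1: overall mean, (10 + 15 + 2) / 3.
overall_mean = statistics.mean(known_ages)

# Option 2: overall median, the middle value after sorting.
overall_median = statistics.median(known_ages)

# Option 3: mean within the same citizenship group only.
ages_by_group = defaultdict(list)
for p in people:
    if p["age"] is not None:
        ages_by_group[p["citizenship"]].append(p["age"])
group_mean = statistics.mean(ages_by_group["Spartan"])

# For a categorical column, the most frequent value is the usual stand-in.
most_common_citizenship = statistics.mode([p["citizenship"] for p in people])

# With an even number of values the median averages the two middle ones.
even_median = statistics.median([1, 3, 5, 7])

print(overall_mean, overall_median, group_mean, most_common_citizenship, even_median)
```

Which option is appropriate depends on what is known about the data; the group mean only helps if the grouping attribute really is related to the missing one.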
42:07
Now, outliers. We talked about believability: do you believe the data? Some data is unbelievable. In looking at outliers, we look at a data point that is very far away from the normal group. To explain this: for everybody else, the data comes from one data-generating system, and I kind of come from another universe. I come from a very different world, a very different data generation system, so that's why I don't quite fit in. That's what an outlier is. And examples:
43:01
unusual credit card purchases, unusually large credit card purchases, unusually frequent credit card purchases. These are all common descriptions of outliers. And Michael Jordan; that's a bit outdated. Or a very famous cyclist who in one year won more medals than the second and third together. That's an outlier, but it turned out he was taking drugs. Anyway, an outlier can be an outstanding hero or it can be a cheater. So these objects deviate from the generating process. If you look at this picture, this is probably the normal group and these two are outliers. Here is an example.
44:03
How long does it take for a human mother to produce a baby? Ten months. It can range from, let's say, 35 to 45 weeks. So this actually happened to a husband and wife. The husband went to war, and then fifty weeks later the wife gave birth to a healthy baby. So what happened there? The husband thinks: this is not my baby. That's basically the story. So between this husband and wife, who do you believe?
45:05
Who do you believe? Look at this graph. The statistics say this is the limit. So that point, a healthy baby at fifty weeks, could be an unbelievable outlier: it is true, but it's unbelievable. Or it could be simply not true, and the baby has to have been fathered by someone else. So do you believe it or not? We are in a court and you are all the jury. What are you going to do? If you are the jury, do you think the wife is innocent or not? Not? Okay, I don't know. Maybe I'll give you some homework: go home, do some research and find out what the actual
46:10
verdict was, what the court said in the end. Now just a bit of differentiation between outliers and random errors. An outlier is something that happens less frequently and is unusual. Random errors can be quite normal; they are just some variation in the data. So they're quite different. Sometimes it's hard to tell. Normally an outlier is noise, but noise might not be an outlier.
47:00
It could be hidden, just a bit fuzzy around the data points. Okay, applications: we've talked about that. I'll show you just a few graphs. I don't know football very well; I somehow belong to the Western Bulldogs, but I don't know much. Anyway, this is an outlier. In this graph you can also spot outliers. It plots the players' average percentage of time on the field against the number of goals they scored. Is it easy for you to see that this is an outlier? Maybe not very impressive. Of course visual inspection is very subjective, but if you say this is an outlier, I would believe you.
48:03
If you say this is an outlier, I will believe you too. But if I make it multiple choice, this one compared to that one, then you probably have to choose the right one. So why do we care about outliers? Like we said, when we average points with an outlier present, or do an imputation that way, we are actually skewing the data. Now, it's Friday afternoon, so let's dream a little. If I take the average of the income of everybody in the room, I get one number. But what if Bill Gates is in the audience listening to my lecture? Say we have two hundred people here, and say his income is two billion a year.
49:04
Then what's our average income? What is your average income on paper? Come on, it's simple maths: two billion shared by two hundred people, because I think ours all together is probably negligible. So we still get ten million each. I think that's pretty good. It's just an extreme case to show you that with an outlier present you can really skew your result. And outliers can be global or local. I would say maybe we have some local outliers in our group, some people who are very, very rich (certainly not me), but Bill Gates would probably be a global outlier in the global population: he's one of the richest people.
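The Bill Gates arithmetic is easy to check, and comparing it with the median from earlier in the lecture shows how differently the two measures react. The two billion and the two hundred people are the lecture's numbers; the 50,000 income for everyone else is an invented stand-in for "negligible".

```python
import statistics

# 199 people on a hypothetical 50,000 a year, plus one person on two billion.
incomes = [50_000] * 199 + [2_000_000_000]

mean_income = statistics.mean(incomes)      # 10_049_750: about ten million each
median_income = statistics.median(incomes)  # 50_000: the typical person is unchanged

print("mean income:  ", mean_income)
print("median income:", median_income)
```

One value out of two hundred moves the mean by a factor of about two hundred, while the median does not move at all.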
50:05
Outliers can be global or local. Whether a point is a local outlier depends on the population you are considering. Looking at temperature, are you talking about globally, or are you talking about Melbourne? That makes a difference because you are comparing to a different group of normal objects. So, detecting outliers. Oh my god, okay: if you give me three minutes I'll teach you how to do a box plot. After this I'm not going to be teaching you; from next week onwards the lectures will be with James, who will be taking them for a few weeks until I jump back in in the middle of the semester. So, basically,
51:07
a box plot is very cool. First of all I get the middle point, the median, and I put a line there. Now I have two parts. If I then take the middle point of the lower part, I get quartile one. Then I go to the top part, order the elements there, and take the middle point: I get quartile three. That's how you construct a box plot. Then I draw a box between Q1 and Q3, like that. And the whiskers are basically the minimum and the maximum. So that's how you draw a box plot.
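The construction just described (median first, then the median of each half) can be sketched as below. Leaving the overall median out of both halves when the count is odd is one common convention and is an assumption here; other quartile definitions give slightly different numbers. The function name and the ages are invented.

```python
import statistics

def five_number_summary(values):
    """Min, Q1, median, Q3, max using the median-of-halves construction."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values below the middle point
    upper = data[half + (n % 2):]  # values above it (skips the median if n is odd)
    return {
        "min": data[0],
        "q1": statistics.median(lower),
        "median": statistics.median(data),
        "q3": statistics.median(upper),
        "max": data[-1],
    }

# Hypothetical patient ages (needs at least two values).
ages = [62, 64, 65, 67, 68, 70, 71, 72, 75]
summary = five_number_summary(ages)

print(summary)
```

The box spans `q1` to `q3` with a line at `median`, and in this basic version the whiskers run out to `min` and `max`.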
52:01
If you want to draw a box plot that also indicates whether outliers exist, this is how you do it. You do the same thing with the middle point, quartile one and quartile three. Then, instead of putting the whisker at the maximum, you calculate the maximum distance that's allowed before points are considered outliers. The maximum distance is 1.5 times the interquartile range, and the interquartile range is the difference between quartile one and quartile three. Anything above that is unbelievable; those points are outliers. That is the definition generally accepted in statistics.
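A minimal sketch of the 1.5 times interquartile range rule, reusing the median-of-halves quartiles. The lecture describes the upper limit; applying the same distance below quartile one is the usual convention and is added here as an assumption. The ages, including the 999 placeholder, are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return (low_fence, high_fence, outliers) using fences at k * IQR."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = statistics.median(data[:half])
    q3 = statistics.median(data[half + (n % 2):])
    iqr = q3 - q1                       # interquartile range
    low_fence = q1 - k * iqr            # assumed symmetric lower limit
    high_fence = q3 + k * iqr           # the limit described in the lecture
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return low_fence, high_fence, outliers

# Hypothetical patient ages with one suspicious entry.
ages = [62, 64, 65, 67, 68, 70, 71, 72, 999]
low_fence, high_fence, outliers = iqr_outliers(ages)

print("fences:", low_fence, high_fence)
print("outliers:", outliers)
```

The rule only flags the value; whether 999 is noise or a disguised missing value still has to be decided by looking at how the data was collected.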
53:06
So objectively we can treat this as a commonly accepted definition of outliers. Okay, I guess I have to leave the rest to James. There are other ways to detect outliers visually, but you'll do that next week with James. So good luck, all the best for the rest of your study, and I'll see you in a few weeks.