00:00
This material is made available to you by or on behalf of the University of Melbourne under section 113P of the Copyright Act 1968. It may be subject to copyright. For more information, visit the University copyright website. Okay, let's make a start. Good afternoon. How's the volume up the back? Okay, give me a wave. Right, okay, great. So, welcome to lecture seven. What's the plan? This comes under visualization. We were doing visualization last week and we saw some different visualizations, and we can think of a further kind of visualization called clustering, which is about finding useful groups present within your data. So we're going to talk about that and see a couple of methods. We'll see something called k-means, which is the classic method you use to make groups out of a data set.
01:35
We'll also look at a visualization method that is quite useful for picking the number of clusters if you're not sure how many clusters there are. So that's the plan. Friday's class will also be about clustering, but it will be a different type of clustering, hierarchical clustering, and Chris, the head tutor, is going to be taking that class on Friday.
02:04
Okay, so what's it about? The challenge for us is that we've got a data set with lots of dimensions, lots of features or attributes. It's probably difficult for us to visualize it and know what the important groups in the data are, so we need an automatic method to do this. If we know what the groups are, then we could treat them differently in some ways. I'll talk about examples in a moment, but here's a very simple data set. You can see there are three groups; they happen to be colored according to which group they're in. The question is: how could we write an algorithm that would produce this output, that would color each object according to its group membership? We can see it, but the computer doesn't; you have to specify an algorithm to find these things. So these are the groups that we want an algorithm to find.
03:10
What's a good group? A good group, in one sense, is a tight group: everyone in the group is similar to each other, so the pairwise distance between neighbors within a group is small. We want that, and we'd also like our groups to be different from one another, so a large difference between the groups that we find. So we're going to have some algorithm that will find these things, and we hope it will behave like this: tight clusters that are well separated from one another. So where might you do this?
04:01
From a business point of view, if you're interested in marketing or know about marketing: you're running a marketing campaign and you want to send information to different people depending on what group they're in, depending on whether they're likely to buy this or likely to do that. To do that you have to know the groups, so you need some way of finding them. They call it segmentation; segmentation is just another word for clustering. Image analysis is another way to think about this: if you've got an image and you want to break it down into the objects in the image, it's a bit like finding groups of pixels. Search engine result presentation: I don't know about you, but when you get back a set of results from Google or another search engine, sometimes it feels like they're in clusters, groups of similar sorts of results. Another type of group is personality type, and I'll talk about that as the next thing.
05:10
In fact, I'll skip that and come back to that other slide. Our objective is: we've got this data and we want to find the groups. Here are the groups; everyone's in one of these groups. And if we wanted to summarize a group, it would just be the middle point, so the person in the middle of a group is somehow a representative for that group. Now, where is what I wanted to show? Where has it gone? Yeah. Why wasn't it showing?
06:13
So, this is a story you will all have seen and read in some form or other. It started last year: it was the Cambridge Analytica Facebook saga. Essentially, what was going on is that there was data from Facebook of people clicking on what they like. Each person has a profile, and that profile records what they've clicked on, what they liked, and you could represent that as a row in a data set, a data set with lots of people (fifty million people) and all of their likes. The disturbing thing was that likes are quite predictive of various things. Depending on what you like, it can predict
07:19
quite a number of things. There's an interesting description given at the link below about what happened, and about some of the claims that are made about how predictive these likes are. But at the end of the day, what they're accused of doing is effectively facilitating a segmentation, a targeting of people based on their personalities. If I can find the clusters of people of a certain personality type, if I can understand that cluster and represent that cluster (so its middle point here), I can potentially apply an intervention. If I have a cluster of people with a certain personality, I could target them with an advertising campaign, hoping to push them in a certain direction to vote. The example there is a gun example.
08:23
So I guess the point of these two slides is just to show that what we're talking about, clustering and segmentation, can be viewed here as perhaps a misuse of clustering and segmentation, and I encourage you to have a read of the article, which gives more detail. Okay, so that's the scenario. Any queries or questions at this point on what we're trying to do, or maybe why we're trying to do it, why it's important?
09:07
Not at this stage? Okay, keep going. So we're going to develop an algorithm, and to develop an algorithm we need to compare objects with each other. We're going to be doing that in the typical way, just computing how far apart things are from each other, using the distance formula that we've seen before. A common thing is that before you apply a distance formula, you need to put every attribute on the same scale as the other attributes, so maybe put them all into the range zero to one before you use this formula. Then we can get to our algorithm. As I explained before, this is what we want to do; this is what the output we want looks like.
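The scaling step and the distance formula mentioned above can be sketched in a few lines of Python. This is only an illustration: the function names and the age/income data are made up for the example, and min-max scaling is one common way (assumed here) of putting every attribute into the range zero to one.

```python
import math


def min_max_scale(rows):
    """Rescale every attribute (column) into the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            (value - low) / (high - low) if high > low else 0.0  # constant column maps to 0
            for value, low, high in zip(row, lows, highs)
        ]
        for row in rows
    ]


def euclidean(a, b):
    """Straight-line distance between two objects with the same attributes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical data: [age in years, income in dollars].
people = [[20, 30000], [40, 90000], [60, 60000]]
scaled = min_max_scale(people)

print(euclidean(people[0], people[1]))  # dominated by the income column
print(euclidean(scaled[0], scaled[1]))  # both attributes now contribute
```

Without the scaling, the income column swamps the age column simply because its numbers are bigger, which is why the rescaling comes first.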
10:05
All right, k-means. This is the pseudocode. I might skip to the example, which is a nice example that I've taken from some slides by Andrew Moore at CMU. So we've got a data set with two features, two attributes, and we want five clusters. So I ask you how many clusters you want, and you say, "James, I want five." k is five. Okay, fantastic. First step here: you close your eyes and you just randomly throw darts at the dartboard, and you've got five locations, five cluster centers. So we've got five representatives for the five clusters that we want. Just random.
11:01
Next we look at every point in the data, every object, and we see which center it's closest to. All of these guys up at the top are closest to this center here, so we're going to group them with that center; that's the cluster they belong to, the top cluster. These ones over on the right are all closest to this center on the right, so that's the cluster that they go into. So we've just visualized this. We've got five groups, five clusters, and everyone's in one of those clusters. So we've done that, and now we're going to iterate; you can imagine some for loop. We're going to do a computation that finds the real center of the cluster. The actual center of this cluster is here, so I'm going to move the center to there: take the average of all the points and you get the center.
12:11
This center gets moved to here, this center gets moved to here, this to here, this to here. Okay, so step four is just finding the real center based on who's in your group. Now do it again. Given that we've moved the centers, everyone might be associated with a different cluster, so you do that computation again. We moved the centers, and now we reassign people to the nearest cluster. And it looks like this. All right, let's go through that again quickly. We started off wanting five, and we randomly guessed five centers. Fantastic. We compute the nearest center for every object.
13:12
I've got five clusters. Now, given who's in each cluster, I'm going to recompute the average of that cluster, the middle. Now I've got the middle of each one. Now I'm going to recompute who belongs to which cluster, so everyone gets reassigned to the nearest red bit. And I just keep going and going until it doesn't change anymore, and it looks like, in this example, that's my answer. So I've got five clusters, and everyone's in a cluster. I've got questions I'll ask you about that, but before I do, I'm happy to take your questions or clarify anything.
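The loop walked through above (guess centers, assign every object to its nearest center, move each center to the average of its group, repeat until nothing changes) can be sketched as follows. This is a minimal sketch, not the pseudocode from the slides: the function names are made up, and the random start here picks k of the data points as initial centers, which is one simple assumption about how to "throw the darts".

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1: random start (assumption: use k distinct data points as centers).
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every object to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centers[c])) for p in points
        ]
        if new_labels == labels:  # nothing changed, so we have converged
            break
        labels = new_labels
        # Step 3: move each center to the average of the objects assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


# Hypothetical data: two obvious groups.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centers, labels = k_means(points, k=2)
print(labels)
print(centers)
```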
14:10
Any questions from you guys on what we've got there? Up the back, you're going to have to shout. Thank you. Yes, so that's an interesting question. The question is: are we ever going to stop? It turns out yes, you are. You can show mathematically that you will stop: there's something that's getting smaller and smaller at each step, and you can show that eventually you'll stop because of that. The name for that is convergence. Yes? Yep. So, how to select the k value. How would you have selected k, just eyeballing this? How many clusters do you think there are?
15:00
Four? Five? Yeah, so one, two, three, four, five. It might be five, it might be four. You ask a very good question. We have to select k, and that's a choice; we might get it right, we might get it wrong. That's going to be a challenge for us that we'll come back to, but for now take it on faith that somehow we choose k based on our expert knowledge, whatever that is. Okay. Other questions or queries? Happy to take them. Okay, so I might have a question for you guys. On that note, let's do this. This is just from the slide. We've got two data sets, and my question to you is:
16:10
If I ran k-means on each of them with k equals two, for which one would I need to loop around fewer times? Okay, let's take thirty seconds and then we'll talk about it. Two data sets; I'm going to run my algorithm on each of them separately. It's going to give me an answer. For which situation will it take fewer iterations? Okay, all right, let's have a look.
17:04
So, a bit of variance in terms of what people are thinking. I can probably understand why, if you selected A, you might say A: it sort of looks like the clusters are more obvious, and you'd expect to find them more easily because they're more obvious compared to B. I can sort of understand that. It turns out that the correct answer (it's sort of a trick question) is that it's going to depend on where the centers are. I didn't specify that in the question. It depends on where you start. Why is that the case?
18:04
So I'm just going to show a demo from this website; it's on one of the slides. I can choose some clusters. I'm going to do this, so it's giving me some data. You look at that data: how many clusters do you see in this data? Give me a number, shout out a number. One? Give me another number. Three? Okay, so maybe one, maybe three. Let's go for three. So I'm going to put one center here, another center here, and another center here. Suppose this is my starting point for k-means. And then I'm going to click go. So, like this, click go. This is the first iteration of k-means: everyone gets associated with their center, and now it moves a bit, reassigns, and it's making some small movements. Small, small, small.
19:10
Come on, k-means. Now it's finished. Okay, so you end up with three clusters, and they look pretty much like what you'd expect. I'd say that's a thumbs up for k-means for that scenario. Let's do it again, but this time I'll do something a bit more pathological with the centers. I'll start there, so these are my three centers, and I run k-means. That's the first one, and I'm going to keep iterating: it's moving the centers, moving the points, moving the centers, moving the points. And now it looks like it's finished. Okay, so you can see that it's given a different answer to what we got the first time.
20:10
Three clusters again, but the membership, the makeup of those three clusters, is different. So it's given us a different answer. At the back, yes? "I'm just wondering, can we say that..." Okay, so going back to the poll question, you're suggesting that maybe we've got some model of randomness or whatever, and maybe on average it's the first one. Yeah, that sounds correct to me. I guess the point is that it's just going to change for each choice of starting point; that was the key point. But you're right, I agree with what you're saying too.
21:01
Okay, so, key point: it depends on where you start. The result depends on where you start. Happy to take questions or queries there, or I can run the demo again if you like the demo. Yeah? Okay, so the question is: wouldn't we want the result to be independent of the starting point? Yeah, it feels a little bit disturbing, doesn't it? Depending on where you start, you get a different answer; it's non-deterministic in that sense. It's just hard to achieve. You don't get it here. There are other clustering algorithms where you will always get the same answer, and there are trade-offs in using those. So this is a particular feature of this method. That said, it's very widely used; this is the most popular clustering algorithm on the market.
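The point that the answer depends on the starting centers can be reproduced on a tiny made-up data set. The sketch below is an illustration only: the four points, the two sets of starting centers, and the function names are all assumptions chosen for the example. It also records the total squared distance from each point to its center, which is the quantity usually used to show that k-means stops (assumed here to be the "something getting smaller" mentioned earlier).

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c])) for p in points]


def update(points, labels, centers):
    """Move each center to the average of its members (empty clusters stay put)."""
    moved = []
    for c, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == c]
        moved.append([sum(col) / len(members) for col in zip(*members)] if members else old)
    return moved


def run(points, centers, max_iter=100):
    """k-means from the given starting centers; returns labels and the error after each update."""
    labels, history = None, []
    for _ in range(max_iter):
        new_labels = assign(points, centers)
        if new_labels == labels:
            break
        labels = new_labels
        centers = update(points, labels, centers)
        history.append(sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels)))
    return labels, history


# Hypothetical data: the corners of a wide, short rectangle.
points = [[0, 0], [0, 2], [10, 0], [10, 2]]

labels_a, history_a = run(points, [[0, 0], [10, 0]])  # one center on each side
labels_b, history_b = run(points, [[0, 0], [0, 2]])   # both centers on the left

print(labels_a, history_a)  # left pair versus right pair
print(labels_b, history_b)  # bottom pair versus top pair, with a much larger error
```

Both runs stop, but the second one stops at a clustering that most people would call worse, which is exactly the behavior seen in the demo.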
22:01
Good question. Other queries, questions? Yes? Okay, all right, so your question is a little bit similar. I think I know what you're asking: because I get different answers depending on where I start, could that have bad consequences, because I'm going to get different patterns in my data depending on what I do? The answer is yes. I agree with what you're pointing out. What do you think people do in practice? In practice you do something very simple: you just run it multiple times, you get multiple answers, and you choose the one that you feel is the most appropriate. So run it five times, get five different answers, and select the one that you think is the best according to some criterion for what "best" means.
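Running the algorithm several times and keeping the best answer can be sketched like this. The lecture leaves the criterion open, so the sketch assumes one common choice: keep the run with the smallest total squared distance from each point to its center. The function names and the data are made up for the example.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, rng, max_iter=100):
    """One k-means run from a random start (k distinct data points as centers)."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def total_error(points, centers, labels):
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))


def best_of(points, k, runs=5, seed=0):
    """Run k-means several times; return the lowest-error run and all the errors seen."""
    rng = random.Random(seed)
    results = []
    for _ in range(runs):
        centers, labels = k_means(points, k, rng)
        results.append((total_error(points, centers, labels), centers, labels))
    best = min(results, key=lambda r: r[0])
    return best, [r[0] for r in results]


# Hypothetical data where a bad start gives a poor answer.
points = [[0, 0], [0, 2], [10, 0], [10, 2]]
best, errors = best_of(points, k=2)
print(errors)   # the error of each of the five runs
print(best[2])  # the labels from the run that was kept
```

A domain expert might prefer a different criterion, so the error used here should be read as a stand-in for whatever "best" means in your setting.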
23:19
So if you're a biologist clustering cells, you'd have some biological knowledge that would let you make that decision. Does that answer your question? Any other questions or queries there? Have a play around with that visualization; it's quite a nice one. Just to point out that it all depends on how you measure distance: there are other ways you can measure distance, and you get slightly different behavior too. How can you use this? Well, one way you might use this is outlier detection. We talked about outlier detection, didn't we? So I take my data set and I do clustering.
24:17
Here I've got two clusters, and you can see I've got the center of each cluster, which is a plus. This is a bubble plot, so around each point there's a circle showing how far away it is from the center. The points at the edge of the cluster are further away, and they are more likely to be outliers. So think about it: if I'm on the edge of my cluster, I'm more likely to be strange, more likely to be an outlier, compared to the people who are near the center of the cluster.
25:02
Feels reasonable. If you're near the center of the cluster, you're typical; if you're at the edge of the cluster, you're less typical. So you guys right up the back there, you're outliers; people in the middle, inliers. Now of course it all depends on how many clusters you choose, and you can see as I flip through these that your outliers are going to change depending on how many clusters I select. So depending on my k, I'll get a different evaluation of what an outlier is. Okay, but you can use this as a very simple way to find outliers: just choose a k, run k-means, and find the people who are furthest from their center. All right, so I'm going to talk next about the k business, how you might find k.
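The outlier idea above (score each object by how far it is from the center of its own cluster) can be sketched as follows. The data and the cluster labels are hypothetical; the labels stand in for the output of a k-means run with k equal to two, and the function names are made up for the example.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_centers(points, labels, k):
    """Average of the members of each cluster (assumes no cluster is empty)."""
    centers = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        centers.append([sum(col) / len(members) for col in zip(*members)])
    return centers


def rank_outliers(points, labels, centers):
    """Indices of the points, furthest from their own cluster center first."""
    scores = [euclidean(p, centers[lab]) for p, lab in zip(points, labels)]
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)


# Hypothetical data: two tight groups plus one point that strays from the first group.
points = [[0, 0], [0, 1], [1, 0], [1, 1], [4, 4],
          [10, 10], [10, 11], [11, 10], [11, 11]]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1]  # assumed result of k-means with k = 2

centers = cluster_centers(points, labels, k=2)
ranking = rank_outliers(points, labels, centers)
print(points[ranking[0]])  # the most outlying object
```

Changing k changes the centers, so the same object can look typical under one k and strange under another.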
26:06
But before I do, I'm happy to talk about k-means a little bit more and answer any more questions. No more questions? Okay, let's keep going. So, how do you choose k? I gave you one answer: based on your expert knowledge. And you might say, "Well, I'm not an expert, I need more help. What do I do?" So what we'll talk about next is some assistance, an extra algorithm that might help guide you in the right direction for knowing how many clusters there are. This is based on visualization: we're going to create a visualization, we're going to look at it, and that visualization is going to tell us something about a likely k, a good k. And we're going to use a heat map to visualize.
27:18
Okay. To do this we've got a few ingredients, and the first ingredient is that we have to mess around with our data a little bit to start off with. So my data set is a data set of three objects, three rows, at the top. I'm going to convert it, make it look like something else. What am I doing? I've got three objects. What I'm going to do is take every pair of objects and see how far apart they are. So I'm going to compute the distance between every pair of objects in my data set, so it's a three by three
28:10
table of distances. The distance between object one and object one is zero; the same for the distance between object two and two, zero; three and three, zero. And then all the other distances: the distance between object one and two is 8.7, the distance between one and three is 18.7, and so forth. So I've taken my data, I've got all pairs, and I've just made a table that tells me the distance between every pair. This is called a dissimilarity matrix. That's what we'll call it. It's as if we got everyone in this room, and you met everyone else in this room and computed the compatibility between every other person and yourself. That would give you a row.
29:09
For object one, that gives you a row of compatibilities between you and everyone else in the room. And then the next row is for someone else, and so on. But with lots of people in this room, or if I've got a very large data set, this might take a bit of time: if I've got a thousand points, I'm going to have to do a thousand by a thousand comparisons. Anyway, so I do that. I've taken my data set, I've done these pairs, and now it looks like this. It's called a dissimilarity matrix, and it's giving me information in a different way. And now all I'm going to do is make a heat map: I'm going to color each cell depending on what its value is. So if it's very small, if it's zero, it's going to be black.
30:08
And if it's very large, it's going to be near white. So that matrix on the left is visualized by this picture on the right: three by three in each case, and I've just colored it according to the value of each cell. Okay, so that's a heat map, and we saw that last time. Okay, fine. We're getting there, but we're not there yet. You can see when you look at this matrix that there's a bit of symmetry: the distance between three and two is the same as the distance between two and three. So there's a bit of symmetry in the visualization; that's just a mathematical property of distance. However, it's not yet going to be what we want, and here's the reason why.
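The dissimilarity matrix and the black-to-white coloring described above can be sketched as follows. The first two objects use the values read out later in the lecture's worked distance (8.7 between objects one and two). The third object is a hypothetical stand-in chosen so that its distance to object one comes out near the quoted 18.7; the actual slide values are not in the transcript. The grey-level mapping (0 for black, 255 for white) is also an assumption for illustration.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dissimilarity_matrix(objects):
    """Distance between every pair of objects, as a table with one row per object."""
    return [[euclidean(a, b) for b in objects] for a in objects]


def grey_levels(matrix):
    """Map each distance to 0 (black, distance zero) up to 255 (white, largest distance)."""
    largest = max(max(row) for row in matrix)
    return [[round(255 * d / largest) for d in row] for row in matrix]


objects = [
    [10, 5, 10],   # object one, as read from the spoken arithmetic
    [5, 10, 15],   # object two, as read from the spoken arithmetic
    [20, 20, 15],  # object three: hypothetical, not taken from the slide
]

matrix = dissimilarity_matrix(objects)
for row in matrix:
    print([round(d, 1) for d in row])

for row in grey_levels(matrix):
    print(row)
```

The zeros on the diagonal and the mirror symmetry across it both show up in the printed table, as described in the lecture.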
31:20
I've got a data set on the left. It's got sixteen things. How many clusters are in this data set? If you look at the left diagram, how many clusters do you see? Four? Okay, it looks like it probably is four, based on our visualization. But if we do this heat map business and just randomly order the objects in the rows and columns, it looks like a bit of a mess. So this suggests that the order in which I lay out the objects, going down or across, is important. I'm going to get a different picture depending on how I do this ordering. You can assume the ordering will be the same for the rows and the columns, but we're going to have to get it right.
32:19
And that's where the magic of this algorithm is going to come in: it finds a better ordering for us. So this was one ordering, objects ordered in some particular way: object five, then four, then ten, then thirteen. Here we've got another ordering. We haven't specified what it is; I'm just saying it's different and it's better. Maybe it starts off with object sixteen, then object two, then object four, then object one. I don't know, but it's different. What can you see along the diagonal? You can see these big black blocks.
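The effect of the ordering can be seen on a very small made-up example. In the sketch below the objects are single numbers, the dissimilarity is just the absolute difference, and the better ordering is supplied by hand; all of these are assumptions for illustration, and the text picture is a crude stand-in for a heat map.

```python
def reorder(matrix, order):
    """Lay out the rows and the columns of a dissimilarity matrix in the same new order."""
    return [[matrix[i][j] for j in order] for i in order]


def show(matrix, threshold):
    """Crude text heat map: '#' for small (dark) distances, '.' for large (light) ones."""
    return ["".join("#" if d <= threshold else "." for d in row) for row in matrix]


# Hypothetical 1-D objects: two groups (near 0 and near 10), stored interleaved.
values = [0, 10, 1, 11]
matrix = [[abs(a - b) for b in values] for a in values]

print("\n".join(show(matrix, threshold=2)))  # checkerboard, hard to read
print()
better = reorder(matrix, [0, 2, 1, 3])       # same data, group members side by side
print("\n".join(show(better, threshold=2)))  # two dark blocks on the diagonal
```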
33:02
How many big black blocks can you see? One, two, three, four. So I look at this, and my visualization is saying: there are four big black things on the diagonal, so there are likely to be four clusters in this data set. So k equals four is likely to be a good choice. All right, so that was one ordering. Instead, I apply this algorithm, it finds me another ordering, I do the visualization again, and it gives me useful information. How is it helping me? It's helping me in two ways. It's telling me how many clusters there are likely to be (one block, two blocks, three blocks, four blocks), and because I've got the objects laid out
34:08
across the rows or the columns, I can tell which object is in which cluster. I can look at each block and find which objects are in each block. So I've got a starting point for understanding what the clusters are. Here's another one. This is the Iris data set; you remember those flowers we saw last lecture: a hundred and fifty different flowers, and we take some measurements of the flowers. You do a heat map over pairs of flowers in some random order, and this is what it looks like. It's not very useful. But if we apply our approach, this is what it's going to give us. Apply our algorithm; it's called VAT. How many black blocks can you see?
35:06
Yeah, you can see all kinds of things. I'll tell you what I can see: I can see one big black block here, and I can see another darker block here. So, if you agree with me that there are two black blocks... now, you're probably also saying, "Well, I can see other things," and that's right, there are other things as well, sort of blocks within blocks, like clusters within clusters. But at a macro level, I think it's probably reasonable to say it looks like two black blocks, so two blocks where objects are very similar to each other and the distance is low. Okay, so let's think about it: this one down the bottom right, what is it?
36:05
What is this thing? It's a black block. What's a black block? It's a collection of objects, and it's a visualization of the distances between those objects. All of these objects are similar to each other, because it's black, near zero. That's why it's a cluster. Okay, so for the Iris data set we could do this, and we might conclude that there are, roughly speaking, two clusters. Now, if you can remember back that far, there are actually three species of flower in Iris, not two, and there's a longer story about that: one of the species is very similar to another of the species. All right, I'll go back a step, and I can take questions at this point.
37:02
So, we've got a data set, we've put it into a dissimilarity matrix, we reorder it, we get a visualization, and it tells me how many clusters there are likely to be. Up the right at the back, yes? I will come back to that question at the end; let me get through it and I'll say something about it at the end. Okay, fair question. Yeah? How do we do it, what's the secret sauce, is your question. I will cover that next. Very legitimate question. Yep?
38:04
How did I do that? Okay, so let's go back. We've got these three objects. The first thing I'm going to do is take object one and object one and compute the distance between them. Would you be happy with that being zero? Yeah. Then I'm going to take object one and object two, so I'm going to do ten minus five squared, plus five minus ten squared, plus ten minus fifteen squared, take the square root of all of that, and it will give me a number, which is 8.7. So that's just the Euclidean distance formula that we had a little while back; plug it into there. Squared differences, with a square root after they've been summed up.
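The arithmetic read out above can be checked in a couple of lines. The assignment of the numbers to "object one" and "object two" is taken from the order in which they are spoken, which is an assumption about the slide.

```python
import math

object_one = [10, 5, 10]
object_two = [5, 10, 15]

# (10 - 5)^2, (5 - 10)^2, (10 - 15)^2
squared_differences = [(a - b) ** 2 for a, b in zip(object_one, object_two)]
distance = math.sqrt(sum(squared_differences))

print(squared_differences)
print(round(distance, 1))
```

Each of the three squared differences is 25, so the distance is the square root of 75, which rounds to the 8.7 in the table.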
39:03
Other questions, queries? So I've explained to you what we're doing, but I haven't really explained how it does it. We've just got some approach: it does these distance calculations, it then does a reordering, we visualize that, and it's a useful thing for us to work with. Okay. All right, maybe we'll take a poll; that's a good idea. So let me see. Here we are. Okay, we've got a poll. It should help a little bit to clarify whether you're following what we're doing. So I've got
40:07
two pictures of data, two-dimensional data. Suppose I applied my algorithm to them to get the blocks and the reordering. Can you match each data set with its visualization, the block visualization? Take thirty seconds and make a selection. Feel free to discuss with anyone around you; that's fine too.
41:25
Another ten seconds; make a selection. All right. So: one B, two A. This time there are no tricks, and that's what we've got.
42:00
Just talking about it: in the first one, you can see three clusters. They're very separated, aren't they? They're very distinct from each other, far away: compact and well separated. In the second one you can sort of see three clusters, but it's a bit fuzzy; they're all a bit mushed together. There's a bit of overlap, and it's not really clear what's happening in the middle. These middle points, I don't know what cluster they're really in; it's a bit ambiguous. This is reflected nicely in our approach. This one on the right, B: we can see three very clear black blocks, everyone's in a block, three good clusters, so it's likely to be number one. This one on the left is a bit fuzzier. There are three blocks, well, sort of three blocks, and I don't know what's happening down the bottom here; that's probably the points in the middle. And they're not as black, so it's not as distinct.
43:09
So two with A is going to be the better match. Okay, so that's what we've got there. Anyone want to ask a question or clarify? Okay. All right, so the question is: how do we get these blocks? What's the algorithmic machinery that would produce this? As I said, it's all about the order in which you lay out the objects along the rows or columns. Depending on the order, you get something that looks good or something that doesn't look good. There's some pseudocode for this, and I'll show the pseudocode. Where is it? Here's the pseudocode. You look at this and you go, "Ah, I don't like it, it's horrible."
44:10
So we're not going to do the pseudocode first. Instead we'll do a visualization and get the intuition about what's happening. Suppose this is my data set. I've got one, two, three, four, five, six, seven objects, and all I need to do is find an order for those objects. The first thing I'm going to do is select one, and it's going to be one at the edge of the data set. I've selected this one down the bottom left. So I've selected someone on the edge. That's first in the order. Now I have to work out who's next, who's second in the order. The one who's second is going to be the one who's closest to the ones we've selected so far. So this one is closest to what we've got so far.
45:05
Now I'm going to select the third one. It's going to be the one that's closest to what we've got so far; it's going to be this one. Same again here. So now I've got the first four in my order: one, two, three, four. Now I have to determine who's fifth in the order. It's going to be the object that's closest to what we've got so far; it happens to be this one at the top. Now I select the sixth in my order: it's going to be the one that's closest to what we've got so far, this one. And then the last is this one. Okay, so all I've done is traverse, or walk through, the data in some order. I started on the edge and then I took what you'd call a greedy type of step: at each point I just select the next one that's closest to what we have so far.
46:08
And that gives me an order: one, two, three, four, five, six, seven. And this is the order in which the rows and the columns are laid out in this visualization: one, two, three, four, five, six, seven. So this is what the pseudocode is saying; it's just saying it in a mathematical way, rather than me talking about a visualization. This is what's going on, and the pseudocode is what actually gets executed. Happy to take questions. Yes? You'll have to speak up. The second point, why is it the second point? So that's the first; why is the second point here?
47:08
The fifth? Why is that the fifth point? I don't know if I've got your question or not. So this is the first one; I just selected someone on the edge. The second one is, out of all the remaining ones, the one that's closest to one of the ones I've selected so far. The only one I've selected so far is this thing down the bottom, so the one that's closest to what I have so far has to be this one. Are you happy with that? The fourth point, to this? Well, I've got four; now that goes to five. So out of these three points I have to make a choice. I'm going to compare each of them to what I've got so far, and the one that's closest happens to be this one.
48:07
Okay, question out the back, yes? I still don't understand, I'm sorry. Can you say it for me differently? So, yes, one clarification ... and then choosing the next one, the fourth? Okay, so by choosing... yeah, this is why we have pseudocode instead of me waving my hands about diagrams, because what I'm saying is ambiguous, you're right. So you have to look at the pseudocode, and effectively what you're doing is selecting a pair of objects. They have to be the most similar objects, where one of them is in the set of things we have so far and the other is in the set of things that we haven't selected from yet. So one is a black circle and one is an open circle, and it's the minimum if you select it that way. Okay, so it's out of all possible pairs of
49:19
a black circle with an open circle: what's the minimum? That's the one I select. Have I answered your question? All right, think about it. I'll take one more question. Yes? Yes, so I haven't... okay, how I selected the starting point. I said I just selected something on the edge. Effectively there's a heuristic for doing that, so an approximate way of doing it is: I just find the two points that are the least similar and select one of them, and that will approximately give me something on the edge. Not guaranteed, but approximately. All right, I will leave you with the task of
50:03
going through that and mapping it to this. In terms of what you have to know: you will have to understand what's happening here in the pseudocode. Okay, so that will be expected, and a good way to do it is to relate it to the visualization. You will also run Python code that does this as well. All right, let's leave it there.
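The greedy walk described above can be sketched in Python roughly as follows. This is a minimal sketch under assumptions, not the code used in the class: the function names `pairwise_distances`, `greedy_order` and `reorder` and the toy data are hypothetical, and Euclidean distance is assumed as the dissimilarity. It starts from one end of the least similar pair, then repeatedly picks the minimum-distance pair made of one selected and one unselected object.

```python
import math


def pairwise_distances(points):
    """Return the full matrix of Euclidean distances between points."""
    return [[math.dist(p, q) for q in points] for p in points]


def greedy_order(dist):
    """Order objects by the greedy walk (an assumed reading of the lecture)."""
    n = len(dist)
    # Starting heuristic: take one object from the pair that is furthest apart.
    start, _ = max(
        ((i, j) for i in range(n) for j in range(n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    order = [start]
    remaining = set(range(n)) - {start}
    while remaining:
        # Among all (selected, unselected) pairs, find the minimum distance
        # and add the unselected member of that pair to the order.
        _, nxt = min(
            ((i, j) for i in order for j in remaining),
            key=lambda pair: dist[pair[0]][pair[1]],
        )
        order.append(nxt)
        remaining.remove(nxt)
    return order


def reorder(dist, order):
    """Lay out the rows and columns of the distance matrix in the given order."""
    return [[dist[i][j] for j in order] for i in order]


# Hypothetical toy data: two well separated groups on a line.
points = [(0.0,), (10.0,), (1.0,), (11.0,), (2.0,)]
dist = pairwise_distances(points)
order = greedy_order(dist)
reordered = reorder(dist, order)
print(order)  # [0, 2, 4, 1, 3]
```

In the reordered matrix, objects from the same group sit next to each other, which is what makes the blocks visible when the matrix is drawn as an image. The double loop over selected and unselected objects is written for clarity rather than speed.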
51:04
Talking about when we connect to five, so like this one... So that was just the final step? Which step are you talking about? Well, effectively what I do is I look at all pairs of objects and find the two that are furthest apart. It would be this one and this one, and then I select one of them, and it just happened to be that one. So just this, rather... this is the one that's closest, by distance. Thanks. So just on that last pseudocode, will we be able...
52:04
So would we be expected to reproduce that from memory? So, yeah, there was in fact an exam question once where I gave that in the question and you just had to explain what's going on. Okay. Is there a better way to find... obviously, finding the distance between every pair is expensive if you've got a lot of objects? Yeah. And then when you already have a group chosen and you want the next one, you have to compare the distance between it and every point in the group? Yeah, a very good observation. So it's slow. You might have to sample from the objects, which is one way to do it, and throw the rest away, so you just keep a small number. That's one possible way to do it. Good observation. When you draw this graph and figure out the Euclidean distance between each other, does the order of the columns matter? For this calculation, all I've done is... so there's the number 8.7: it's just the Euclidean distance between one and two.
53:16
Row one, row two. You remove the ID, since we're not using that in our calculation, so it's just 10 minus 5 squared, plus 5 minus 10 squared, plus 10 minus 15 squared, with a square root over the whole sum. Plug it in and see if you get that 8.7; if you don't, it means something's gone wrong. And I want to ask: if this one is object one and this one is a different object, would the graph be different? It will look different in general, yeah. So if I have three at the top instead of one, it would give me a different picture here. And does that have any impact or influence? Um, well, I'd like to get something like this because it tells me how many clusters there are.
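The distance calculation for objects one and two mentioned above can be checked with a few lines of Python. This is a small sketch: the variable names are hypothetical, and the feature values are simply the numbers read out in the calculation, with the ID column left out.

```python
import math

# Feature values read off the spoken calculation; the ID column is excluded.
object_one = (10, 5, 10)
object_two = (5, 10, 15)

# Square each per-feature difference, sum them, then take the square root.
squared_diffs = [(a - b) ** 2 for a, b in zip(object_one, object_two)]
distance = math.sqrt(sum(squared_diffs))
print(round(distance, 1))  # 8.7
```

Each squared difference is 25, so the sum is 75 and its square root is about 8.66, which rounds to the 8.7 shown in the matrix.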
54:00
So that's why we've got this probably complex-looking algorithm. Its job is to work out what's a good order to do the visualization. Okay. Can we know when we should use k-means, or when we can use VAT? Together. Use them together. So what you would probably do is take your data set, use VAT to work out the k, and then run k-means. I didn't quite get time to say that at the end, but you're right. Does that answer the query? Right.