00:00
This material has been reproduced and communicated to you by or on behalf of the University of Melbourne under section 113P of the Copyright Act 1968. It may be subject to copyright. For more information, visit the University's copyright website. So, just to say hello: I'm James Bailey. I've been involved with the subject for, I think, three years now. You've been in good hands with Pauline; she'll be doing the rest of this week, and then I'll be doing a block of lectures from next week, so you'll see more of me in time. For now, back to Pauline. So I guess we will make a start. Here's the plan for today. Last week we introduced the overall picture: what a database system is, what a structured database system is, and what the power of that kind of system is compared with less structured data formats. Then we introduced how we can find patterns
01:36
in text formats, that is, regular expressions. I won't go through them again, but that's what we covered last week. After that, the next popular format is XML. So today we'll finish the rest of the topics on XML and then introduce the next very popular data format, JSON. Okay, so have you seen this slide before?
02:04
For those of you who were here on Friday afternoon, it's the same slide as last week, right? We went through this in the subject guide, and this is my XML. For those who still remember: this is a well-formed XML document, so what do I need to have in there? Come on, I can't give you too much time or I'll run out of time. Anyone? Say one thing. You can have attributes, shown in green, so an element can have attributes. Come on, one more. Um... Tags, yes. You can see these tags are all meaningful; I made up the tags myself. So XML gives you the power to create tags that describe the elements in a meaningful way.
03:04
And to make an XML document well formed: first of all, you should have a declaration on the first line, followed by a root element. All elements have to be properly closed, with an opening tag and a closing tag, and they can have optional attributes. Finally, the elements have to be properly nested inside the root element. All clear? So this is a data format, not a programming language. It's just how you write up a file, just as you write a Word document or a CSV file; it's how you specify the structure of your data. Okay, so that's what we covered last week. The previous slide looks fine, but it's relatively simple, so I'm going to introduce other features of the XML format that help you deal with more complex cases.
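The well-formedness rules above can be checked mechanically by trying to parse the text. A minimal sketch, assuming Python's standard-library `xml.etree.ElementTree`; the `is_well_formed` helper and the sample documents are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Hypothetical document: a declaration, one root, properly closed and nested children.
good = """<?xml version="1.0"?>
<library>
  <textbook id="b1"><title>Data Analysis</title></textbook>
</library>"""

# Tags overlap instead of nesting: <title> is still open when </textbook> appears.
bad = "<library><textbook><title>Data Analysis</textbook></title></library>"

# Two top-level elements: there is no single root element.
no_root = "<textbook/><lecturer/>"

print(is_well_formed(good))     # True
print(is_well_formed(bad))      # False
print(is_well_formed(no_root))  # False
```

Note that the parser also accepts a document with no declaration line; in XML 1.0 the declaration is recommended rather than strictly required.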
04:11
So what I'm going to introduce to you is namespaces. You are free to do a Google search to find the exact specification of namespaces, but I figure the best way to learn is by example. Look at this example; don't worry about the top one for now, just look at this XML fragment. I have an element called textbook, and I have another element, lecturer. Say I'm busy making up an XML document to do with university life, or the management of university resources and human resources, books, libraries and so on. That's what I'm intending to do, but then I get into trouble: if you look at the child elements of textbook and lecturer, they both have an element called title.
05:20
So what happens there? Look at the red lines I crossed out: if I just say title "Data Analysis" or title "Professor", it's confusing. This is ambiguous; I'm not sure which element I mean, because the title of a textbook shouldn't have the same meaning as the title of a lecturer. This is where namespaces come in. You can see I've done it the right way, a sensible, well-designed way, where I
06:03
just take my word for it for now: the names academic and book are prefixes that let me differentiate the contexts. The way I do that is by adding a special attribute to my elements, and that special attribute is called xmlns. What does it stand for? Have a guess. Yeah, it's XML, and ns is short for namespace. Now look at the structure. Just look at the second and third ones: I start with the attribute name, xmlns, and then a colon,
07:01
then a prefix. Sorry, the slide should say "prefix"; I can't change it now, so here it says "name" but it should say "prefix". So book and academic are my prefixes; those are the keywords I use to differentiate the contexts. Each namespace is then defined by a unique Uniform Resource Identifier (URI), which is normally in the form of a URL, so I have to use a different URL to uniquely identify each namespace. For example, for book I have a UniMelb textbook URL, and for academic a UniMelb staff URL. After I've declared these two, I can use them easily, as I'm showing you here: I specify my elements the same way, but I prefix each tag name with the prefix and a colon. That's how I differentiate the two contexts.
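A parser makes the effect of the prefixes visible. The sketch below is a hypothetical version of the textbook/lecturer example; the `unimelb.example` URIs are made up, since the slide's exact URLs are not in the transcript. Python's standard-library `xml.etree.ElementTree` expands each prefixed name into `{URI}localname`, so the two `title` elements end up with different names.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs standing in for the ones on the slide.
doc = """<university
    xmlns:book="http://unimelb.example/textbook"
    xmlns:academic="http://unimelb.example/staff">
  <book:textbook>
    <book:title>Data Analysis</book:title>
  </book:textbook>
  <academic:lecturer>
    <academic:title>Professor</academic:title>
  </academic:lecturer>
</university>"""

root = ET.fromstring(doc)

# Queries use our own prefix-to-URI map; the URI, not the prefix, is what matters.
ns = {"book": "http://unimelb.example/textbook",
      "academic": "http://unimelb.example/staff"}
book_title = root.find("book:textbook/book:title", ns).text
staff_title = root.find("academic:lecturer/academic:title", ns).text
print(book_title, "|", staff_title)  # Data Analysis | Professor

# Both elements are called "title", but their expanded names differ.
title_tags = [el.tag for el in root.iter() if el.tag.endswith("}title")]
print(title_tags)
```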
08:18
Any questions so far? I hope it's clear. Now let's look at the very first one. The very first one has no prefix, right? There's no colon and name after xmlns. When I do that, it's my default: all the elements where I don't use a prefix refer to that default namespace. Is that clear? So for the two titles I crossed out: if I used both, they would be in the same context, the default context, the very first URI.
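To see the default namespace at work, here is a minimal sketch, again with hypothetical `unimelb.example` URIs and Python's `xml.etree.ElementTree`: the root declares a default namespace (no prefix) and one prefixed namespace, and every unprefixed element picks up the default URI.

```python
import xml.etree.ElementTree as ET

# Hypothetical URIs: a default namespace (plain xmlns) plus one prefixed namespace.
doc = """<university xmlns="http://unimelb.example/default"
            xmlns:book="http://unimelb.example/textbook">
  <title>University Records</title>
  <book:textbook><book:title>Data Analysis</book:title></book:textbook>
</university>"""

root = ET.fromstring(doc)
tags = [el.tag for el in root.iter()]  # document order, depth first
for tag in tags:
    print(tag)
# {http://unimelb.example/default}university
# {http://unimelb.example/default}title
# {http://unimelb.example/textbook}textbook
# {http://unimelb.example/textbook}title
```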
09:07
Yes? Raise your hand if you think you've got it from the example. Be more confident! Okay. This is the description: the scope of a namespace. Once I declare it, I can use it anywhere inside that element, including all its descendants. And when there's no prefix, it's the default one. No questions? Okay, then look at this one. When I look at this one, I feel like crying.
10:10
Maybe I changed the slides too fast on the previous one: actually, I can put a namespace declaration on any element, and if I redeclare a prefix with something else, it overrides my previous declaration. So look here, at this really messy XML with many namespaces. Here I'm declaring, um, I'm not good at using this pointer, I'm declaring a prefix a that refers to URL a, right? You all see that? Then, somewhere further along, because I've declared it, I can use it here. That means when I specify a header element, I'm referring to this context.
11:10
As you can see, after I've declared it, I can use it for attributes as well. Now let's look at b. I declare b on the root, the root element envelope, with one namespace. Then I come to the header element and declare b again, this time with a different URL. So in this case, if I use prefix b, I'll give you the answer: this inner declaration of b overrides the outer one inside this context. It's like basic scoping in programming. For those of you used to writing more complex programs with functions: if you declare a variable inside a function with the same name as a global, you've got to be aware of which context you're referring to. So in this case, if I'm referring to
12:18
namespace b inside this header element, I'm referring to this URI. Okay, maybe after lunch you'll wake up a little: I'm going to give you a quiz. Could you please open your laptop or turn on your phone? I'm going to show you the exercise.
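The prefix redeclaration described just before the quiz behaves like variable scoping. The sketch below is a simplified, hypothetical version of the messy envelope (not the exact slide), using made-up `example.org` URIs and Python's `xml.etree.ElementTree`: inside `<header>` the inner declaration of `b` wins, and everywhere else the root's declaration applies.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified envelope: prefix b is declared on the root,
# then redeclared with a different URI on <header>.
doc = """<envelope xmlns:b="http://example.org/b-outer">
  <header xmlns:b="http://example.org/b-inner">
    <b:item>inside header</b:item>
  </header>
  <body>
    <b:item>inside body</b:item>
  </body>
</envelope>"""

root = ET.fromstring(doc)
header_item = root.find("header")[0]
body_item = root.find("body")[0]

# Same prefix, different URIs: the innermost enclosing declaration is used.
print(header_item.tag)  # {http://example.org/b-inner}item
print(body_item.tag)    # {http://example.org/b-outer}item
```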
13:00
This refers to the same lecture slide; maybe I can switch back, but first I'm going to activate the quiz. Just do this one first: inside that messy XML there is an element month, with the value March. I want to know which namespace this element belongs to. Now hang on, give yourself one minute. If you're not convinced or not sure, please talk to the people around you. Do that first, please: if you're not sure, check with the people next to you before you enter your answer.
14:37
So you should see a question like this. Have you entered your answer yet? Come on, give it a try. I've learned not to show the results while people are still working on it, because
15:01
if you're not sure, I want you to talk to the people around you. Remember that you made friends in the first lecture; if you missed the first lecture, you can still make friends now. During discussion time you are allowed to make noise. When the total reaches about a hundred and twenty responses, hopefully, we'll review the outcome. By the way, the A, B and C here are the answer choices; they are
16:42
not the prefixes.
17:10
You only have a little time left. Okay, oh my god. Have you all... okay, next. Don't mess with me; don't put in the wrong answer on purpose! Now let's have a look; let's go back to the slide.
18:04
So which one are we looking at? Right. What is the parent element of that month element? Body, yeah. So it's inside body. My month doesn't have any prefix, right? So it refers to a default namespace somewhere. To work out the scope, I look for the enclosing scope that is closest to me, so for month I look first at body.
19:02
Oh my god, most of you don't have the right answer. Okay, so the parent element is body, and body has no namespace declaration, so I have to look outwards, even further. The next scope enclosing the body element is the root element, right? The root element is envelope, and because month has no prefix, it refers to the default namespace declared on the root element. Okay, so the answer is
20:03
three. Let's see. It should be the default namespace of the root element, right? Which is... the default, right. I don't blame you; well, I might test you on it, but I don't blame you. It's very easy to make a mistake. I almost made one myself while working through it with you, because I almost thought it was the new default. But the new default is declared on a sibling; it does not enclose the element I'm looking at.
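The quiz's trap can be reproduced with a parser. This is a hypothetical reconstruction of the structure (the exact slide is not in the transcript), with made-up `example.org` URIs: a new default namespace declared on a sibling element (`<header>`) does not reach `<month>`, which falls back to the root's default.

```python
import xml.etree.ElementTree as ET

# Hypothetical reconstruction of the quiz structure, not the exact slide.
doc = """<envelope xmlns="http://example.org/root-default">
  <header xmlns="http://example.org/new-default">
    <date>1 March</date>
  </header>
  <body>
    <month>March</month>
  </body>
</envelope>"""

root = ET.fromstring(doc)
header, body = root[0], root[1]

# <date> is inside <header>, so the new default applies to it.
print(header[0].tag)  # {http://example.org/new-default}date
# <month> is inside <body>, which declares nothing, so the root's default applies.
print(body[0].tag)    # {http://example.org/root-default}month
```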
21:04
Okay, so don't be disappointed. The reason I'm showing you this namespace example is that it's such a terrible design. We have to expose you to it because sometimes you won't have a choice: you walk into a workplace, inherit this kind of document, and you have to be able to make sense of it. But if you're the one creating a complex XML document with namespaces and you write it like this, I'm going to fail you, because it's really confusing. It shouldn't be like that; a good design is like the one I showed earlier. In really complex documents you might not notice that you've accidentally reused a prefix, but in general, if you can avoid it, you must. Make the contexts clear, and the best practice is to put all the declarations on the root element
22:18
if you can. Okay, so that's namespaces. Good, that's the end of that. So you now know what well-formed XML is and, when a document gets complex, how to differentiate contexts using namespaces. To complete the picture, what I'm about to tell you is general knowledge: you don't need the detailed technical skills, but you should know it exists in case you encounter it later, and you can learn from there. XML is very good with schemas. Not only can you define the semantics and structures to suit your application, there is also a well-defined, well-supported XML Schema language that lets you do validation. So if you
23:23
define a set of allowed elements to be processed by your business application, you can use a schema to make sure you don't receive rubbish elements, or elements you don't know what to do with. Because of this, we distinguish two things: an XML document can be perfectly well formed, following all the syntax rules, and still not match a specific schema. XML has a lot of support for validation, and you can even do some validation with a Python library.
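The difference between "well formed" and "valid against a schema" can be illustrated with a deliberately simplified stand-in for a schema: a hypothetical set of allowed tag names. This is not XML Schema (XSD) validation, which the standard library does not provide; third-party libraries such as lxml offer real XSD validation. The sketch only shows that a document can parse cleanly and still contain elements the application does not expect.

```python
import xml.etree.ElementTree as ET

# Hypothetical "schema": the only tags our application knows how to process.
ALLOWED = {"order", "item", "quantity"}


def unknown_tags(text):
    """Parse the text (so it must be well formed) and list tags outside ALLOWED."""
    root = ET.fromstring(text)
    return sorted({el.tag for el in root.iter()} - ALLOWED)


ok = "<order><item>pen</item><quantity>2</quantity></order>"
odd = "<order><item>pen</item><discount>99</discount></order>"  # well formed, unexpected tag

print(unknown_tags(ok))   # []
print(unknown_tags(odd))  # ['discount']
```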
24:15
You can just read up on it. There's a special one for HTML as well. This is for your general knowledge; we won't worry too much about the technical side of it for now. What you do need is to be able to manipulate XML files using these Python libraries. Your tutor will cover this kind of programming exercise in your workshop, so I won't go over it here, but if you have any questions we can follow up with discussions.
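As a preview of the workshop exercises, here is a minimal read/modify/write sketch with the standard-library `xml.etree.ElementTree`. The catalogue, prices and 10% discount are hypothetical, and the workshop may use different functions or a different library.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue.
doc = """<catalogue>
  <book id="b1"><title>Data Analysis</title><price>100</price></book>
  <book id="b2"><title>Python Basics</title><price>40</price></book>
</catalogue>"""

root = ET.fromstring(doc)

# Read: each <book> has an attribute (id) and child elements.
for book in root.findall("book"):
    print(book.get("id"), book.findtext("title"), book.findtext("price"))

# Modify: take 10% off every price, formatted to two decimal places.
for price in root.iter("price"):
    price.text = f"{float(price.text) * 0.9:.2f}"

# Add: append a new <book> with an attribute and a child element.
new_book = ET.SubElement(root, "book", id="b3")
ET.SubElement(new_book, "title").text = "Statistics"

# Serialise to a string; ET.ElementTree(root).write(path) would write a file instead.
xml_out = ET.tostring(root, encoding="unicode")
print(xml_out)
```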
25:06
You can also parse XML; those are just some utilities. Any questions so far? Those of you who didn't get the right answer in the namespace quiz earlier: if you had a different answer, talk to your classmates and try to work out why it's not correct. The typical applications of XML are where you have a special purpose and need to enforce the semantics really well; it has a lot of applications there. We'll contrast the next data format with XML later on, but just for you to see:
26:03
this is another example. It's not just business contexts or simple things; you can even use it in mathematics or chemistry. You can design and create it to suit your application's needs. For example, this is MathML, and I'm using it to represent this mathematical formula. Um, okay, so that's the end of XML. Question: is LaTeX similar to MathML? Um, that's a very good question. I don't think so. MathML maybe comes from your
27:05
... LaTeX is a totally different thing. Um, can I come back and answer that later? I don't quite know, but I don't think so, because LaTeX is specialized for producing professional publications. LaTeX? No, no, no: it's not just that it has special things for representing maths. LaTeX is a lot more than maths; it has a very powerful way to represent mathematical formulas, but I don't think it is XML. LaTeX is a lot more powerful; there are many other things you can do with it. I'll find out the exact differences and reply later. So, the next data format is very popular, very, very popular now. Actually, I should show you
28:12
some context. Currently there are a lot of XML documents out there. Okay, I realize I've got so many things open. Just look at this diagram of when all these different formats came into existence. Much earlier on, data formats were simpler: we had CSV around 1978, then from about 1996 XML started,
29:04
and then JSON really became popular, having been discovered by a single person around that time. Sometimes you wonder why there's XML and why there's JSON. They actually have different strengths and weaknesses, and some data that might suit JSON better isn't stored that way, because earlier data files were created in XML. That's just a bit of context for you. Okay, and if James were giving this lecture, he might call it "J-SON" while I call it "Jason", and it doesn't really matter, because the person we call its inventor, although he didn't really invent it, doesn't care what you call it. I know some people avoid saying "Jason" because they've got a friend called Jason, or a colleague on the team called Jason. It's up to you.
30:10
When I was first introduced to it, someone called it "Jason", so I'm used to that. Anyway, this is just to give you a contrast. What is on the right-hand side? What data format is on the right-hand side? XML. We just spent the first half hour of this lecture, and most of the previous one, looking at it, yeah. The colour is lost here, but you can still recognize it. And look at the left-hand side: that's what JSON looks like. Don't be intimidated by the capital letters.
31:05
You can see it's simpler. There's less overhead: I just have key-value pairs, and I don't need opening and closing tags, so structurally it's much simpler, and the rules are very simple too. It's a lightweight data format, extremely suitable for transporting data across the Internet. Initially it was seen as part of JavaScript. What does JavaScript do? Basically, it handles the interaction between the client (the browser) and the server, so you need a very lightweight way to pass data objects backwards and forwards. The strength of JSON is that it's lightweight and really simple, and, lucky for all of us, it's really easy to learn.
32:12
So, as an example: don't worry about the curly brackets first. Look at the first line: that's a key-value pair. The simplest element in JSON is a key, which is a string, so you have to quote it, followed by its value. If you look at this slide, you can see the value can be a string, which is quoted, or a number. Yeah. And then a more complex structure
33:00
continues like this. In this structure, the next one, phone number, is my key, and everything inside the square brackets is my value. Finally, this is a key and this is a value. I'll skip the XML representation because you all know it. So let's go element by element. First I showed you an example; now I'm going through the syntax rules that make up valid JSON. The first thing you need to know is that object data is in name-value pairs.
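Name-value pairs map directly onto a Python dictionary. A minimal sketch with the standard-library `json` module; the record (names, age, phone numbers) is hypothetical and not the slide's exact example. It also checks the rule that keys must be quoted strings.

```python
import json

# Hypothetical record: keys are quoted strings; values are a string,
# a number and an array.
text = '{"firstName": "John", "age": 25, "phoneNumbers": ["555-1234", "555-5678"]}'

person = json.loads(text)          # a JSON object becomes a Python dict
print(person["firstName"])         # John
print(person["age"] + 1)           # 26
print(person["phoneNumbers"][1])   # 555-5678

# Keys must be double-quoted strings; an unquoted key is rejected.
try:
    json.loads("{age: 25}")
    key_rule_enforced = False
except json.JSONDecodeError:
    key_rule_enforced = True
print(key_rule_enforced)           # True
```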
34:06
Like I showed you with this simple one: a name, followed by a colon, then a value. The value can be any of the following. It can be a number, like age 25; a floating-point number, like a price in dollars and cents, so a real number; a string, and if I double-quote it, even "1234" is a string; a Boolean, so I can say tall is true or false; or an array. You've seen this before: an array is enclosed in square brackets,
35:04
like this. So I specify my array, and in this case I've got an array of two objects. Objects are complex, so you can nest them: an object holds key-value pairs, a value can itself be an object, and that's how you nest the structure. A value can also be null. That's an example. Arrays use square brackets. So the syntax is really simple: an object is enclosed in curly brackets, and I just list my key-value pairs as needed. If one of them happens to take multiple values, I put square brackets around them on the value side and list the elements there. And that's about it.
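Each JSON value type listed above arrives in Python as a specific built-in type. A minimal sketch using the standard-library `json` module on a hypothetical object that uses every type once:

```python
import json

# Hypothetical object using each JSON value type from the list above.
text = """{
  "age": 25,
  "price": 100.50,
  "code": "1234",
  "tall": false,
  "children": [{"name": "Ann"}, {"name": "Ben"}],
  "address": {"city": "Melbourne", "postcode": "3010"},
  "spouse": null
}"""

data = json.loads(text)
for key, value in data.items():
    print(key, type(value).__name__)
# age int
# price float
# code str
# tall bool
# children list
# address dict
# spouse NoneType
```

The quoted `"1234"` stays a string, and nested arrays and objects become nested lists and dicts.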
36:12
With these few simple structural elements, you can build nested structures quite easily. That's it. Happy? It's really simple. That's pretty much it, which is why it's easy to learn and quite popular. Then I should show you... I think I still have time. Let's go here: Twitter data is in valid JSON format, so you can go and have a look at this URL.
37:00
And, for you: we can also manipulate JSON files using Python, and the library to use is called json. Be very careful when you go home and do your own tests: don't call your test file json.py, or you'll get confused with the package name. Call it something else, like my_json.py or my_test_json.py, and that will save you confusion. Once I import the library, I can load JSON. The function on the slide is json.loads, and if you look at the whole thing inside the brackets, it's actually a JSON string (to read directly from a file object, you'd use json.load). So that's how you can load
38:00
JSON into your Python program, and then you can start working with its elements. The next one is the reverse. So what does the next one do? The other way round, yeah, that's what you said, but what is the other way round? With json.dumps I'm writing data out in JSON format. What I pass in is a Python data object; in this case, a list. The list has two elements, and the second one is a dictionary. So there is a direct mapping from
39:03
Python's basic data types into JSON. For example, a string maps to a string; a float or an integer maps to a number; a list or a tuple becomes an array in JSON; and a dictionary becomes an object, with curly brackets, in the JSON file. You'll also practise this in your tutorial or workshop. So, basically, I've talked about this: remember, XML is quite strict and focuses on semantics, on structure, and on validating that structure. JSON focuses more on being lightweight, where speed is important, and
40:17
so they have different purposes. JSON is still very popular with JavaScript: if you're developing interactive web applications, JSON is the choice. But a lot of really important data exchange, server-to-server exchange, still happens in XML. For example, financial institutions, the banks, the remittance businesses: when they comply with anti-money-laundering law, there is a specified transaction report format they have to use when reporting to the government.
41:12
In this case we use XML and its schema to make sure the transaction reports the government receives are valid. So they have different purposes, and they complement each other quite well. We developers may love JSON, but that's not the whole world. And this is just a pictorial view of the syntax: an array is a list of comma-separated values, and an object is a set of key-value pairs where the key is always a string.
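The json functions discussed earlier (reading with `json.loads`/`json.load`, writing with `json.dumps`) can be tried together in one short sketch using only the standard library. The record, the file name and its temporary location are hypothetical. The naming warning in the lecture is about your *script*: a script called `json.py` would shadow the standard-library module; the data file's name does not matter.

```python
import json
import os
import tempfile

# json.loads parses a JSON *string* (hypothetical record).
record = json.loads('{"id": 1, "text": "hello", "user": {"name": "unimelb"}}')
print(record["user"]["name"])  # unimelb

# json.load parses an open *file*; write a small hypothetical file first.
path = os.path.join(tempfile.mkdtemp(), "people.json")
with open(path, "w", encoding="utf-8") as f:
    f.write('[{"name": "Ann"}, {"name": "Ben"}]')
with open(path, encoding="utf-8") as f:
    people = json.load(f)
print(people)  # [{'name': 'Ann'}, {'name': 'Ben'}]

# json.dumps goes the other way: Python object -> JSON text.
# list/tuple -> array, dict -> object, None -> null, True -> true.
text = json.dumps(["foo", {"bar": ("baz", None, 1.0, 2), "ok": True}])
print(text)  # ["foo", {"bar": ["baz", null, 1.0, 2], "ok": true}]

# Round trip: the tuple comes back as a list, because JSON only has arrays.
back = json.loads(text)
print(back)  # ['foo', {'bar': ['baz', None, 1.0, 2], 'ok': True}]
```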
42:02
We covered that before, so let's test it. Let me repeat my question: what is this? XML, good. I'll give you three minutes: it's quite simple, so see whether you can convert this into a valid JSON file. You're allowed to make noise now and talk to the people next to you.
43:07
I showed the answer for about two seconds but then covered it. Anybody want to have a go? How do I do this one? What? In JSON. So how am I going to name my object? The object maps quite easily to the root element, doesn't it? So I can start my JSON object with the key person: I'm going to define a JSON file for a person.
44:11
Is it too much to try to do one in class? Raise your hand if you want to do it. Or do you want me to show you? Raise your hand if you want me to show you how I'd write this. Okay. Now raise your hand if you don't even want to see it, and you'd rather do it at home. Okay. This is actually on the lecture slides, so you can go and try it yourself. You see, my object has a key, and I've called that key person.
45:14
And my value is an object again: this time the value isn't a simple one but an object, and inside that object I have two simple key-value pairs for first name and last name, then an array of relatives, and finally the favourite beer. This is not the only way to write it; you can write it differently, but I don't think we have time, so I won't show you that. You can try these out yourself.
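The conversion just described (root element becomes the top-level key, simple children become key-value pairs, repeated children become an array) can be done by hand in Python. The XML below is a hypothetical stand-in, since the slide's exact content is not in the transcript; as the lecture says, this is only one of several reasonable JSON layouts.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the slide's XML.
doc = """<person>
  <firstName>John</firstName>
  <lastName>Smith</lastName>
  <relatives>
    <relative>Jane</relative>
    <relative>Tom</relative>
  </relatives>
  <favoriteBeer>Pale Ale</favoriteBeer>
</person>"""

root = ET.fromstring(doc)

# Root tag -> single top-level key; simple children -> key-value pairs;
# repeated <relative> elements -> a JSON array.
result = {
    root.tag: {
        "firstName": root.findtext("firstName"),
        "lastName": root.findtext("lastName"),
        "relatives": [r.text for r in root.findall("relatives/relative")],
        "favoriteBeer": root.findtext("favoriteBeer"),
    }
}
print(json.dumps(result, indent=2))
```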
46:04
Finally, I need to mention this. Earlier, when I compared and contrasted XML and JSON, I said XML is really good when you need a schema, because XML is richer. It's not lightweight, but you can specify more things and more context: you've got namespaces, and lots of support for parsing and for rendering presentation. Although I said the schema is a real strong point of XML, what I'm about to say is that JSON also has a schema. Because JSON is so lightweight, people use it, and abuse it, to its full extent, and sometimes it's really hard to control the data quality of JSON files. So, retrospectively, a schema came along. People don't know much about it and don't really use it much, but it's there:
47:19
you can actually validate JSON. I won't show you the validation website, but basically, with Python, you have libraries for XML and for JSON, and later you can also find other packages to convert between data formats: XML, JSON, CSV and HTML. This is a picture of how you can convert between these formats.
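The conversion packages on the slide are not named in the transcript, so this sketch uses only the standard library (`json` and `csv`) to show one conversion, JSON to CSV and back. The records are hypothetical; `io.StringIO` stands in for a real file, which for the csv module should be opened with `newline=""`.

```python
import csv
import io
import json

# Hypothetical JSON array of flat objects: one object per future CSV row.
records = json.loads('[{"name": "Ann", "age": 30}, {"name": "Ben", "age": 25}]')

buffer = io.StringIO()  # stands in for a file opened with newline=""
writer = csv.DictWriter(buffer, fieldnames=["name", "age"])
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()
print(csv_text)

# And back again: each CSV row is read as a dict of strings.
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(json.dumps(rows))  # [{"name": "Ann", "age": "30"}, {"name": "Ben", "age": "25"}]
```

Note the round trip is lossy in one respect: CSV has no types, so the ages come back as strings.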
48:00
You'll need to know how to use these two packages for that. So that's basically it. I skipped one of the demos, the validation, but that doesn't matter; it's not important. This list concludes the first topic: it's a checklist for you to see whether you've got the main points of the first two lectures on data formats. You don't need to know the details of database systems, but you need to understand the differences; you need to be able to write simple regular expressions; and you need to be able to do all of these. Okay, I'll see you next time. For those still sitting there: the rest is for you to go through in the subject materials and learn yourself.
49:25
So in this lecture and the previous one I talked about many, many things, and it can sometimes blur together: what you need to know practically, and what is essential background information.
50:26
Okay, remember that you create the tags yourself, so it's your design and you decide. If I ask you to write "yellow balloon, 99.99" in XML with elements and attributes, people can design it differently. For example, you can make balloon an element, and then it's up to you whether the colour, yellow, is a child element of balloon or just text inside the balloon element. So if I
51:26
... right, and then you can close it. And here, for example, you can make it a child element.
52:04
Or as an attribute, right? With this piece of information, it's my choice which is more meaningful and easier for me: a child element or an attribute. That's just one piece of information, and it's up to you; the same goes for the price. And an extreme case: I could do this. It's not useful at all, but in terms of syntax it's not wrong; that is also valid. It's properly closed; I'm just not treating them as three different pieces of information.
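The three designs just discussed for "yellow balloon, 99.99" are all well formed, but they differ in how easily a program can get at each piece of data. A minimal sketch with hypothetical tag and attribute names (`colour`, `price`), using Python's `xml.etree.ElementTree`:

```python
import xml.etree.ElementTree as ET

# Three hypothetical designs for "yellow balloon, 99.99"; all are well formed.
as_children = "<balloon><colour>yellow</colour><price>99.99</price></balloon>"
as_attributes = '<balloon colour="yellow" price="99.99"/>'
as_one_string = "<balloon>yellow balloon 99.99</balloon>"


def read_children(text):
    """Colour and price stored as child elements."""
    balloon = ET.fromstring(text)
    return balloon.findtext("colour"), float(balloon.findtext("price"))


def read_attributes(text):
    """Colour and price stored as attributes."""
    balloon = ET.fromstring(text)
    return balloon.get("colour"), float(balloon.get("price"))


print(read_children(as_children))        # ('yellow', 99.99)
print(read_attributes(as_attributes))    # ('yellow', 99.99)
# Valid syntax, but colour and price are no longer separate pieces of data.
print(ET.fromstring(as_one_string).text) # yellow balloon 99.99
```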
53:10
If the month... I understand. Okay, I'll take that. Here. Um, this one. The thing is, the a context applies only to body, so if I want this month to be in the a context as well, I should put an a prefix in front of it.
54:09
So is month in the a context? No, it's not. The scope of a is actually the entire document, because I declare it here, so I can use it anywhere, but only where I actually use the prefix. For this month, if you want it to be in namespace a, you really need to put the a prefix on it, because month itself has no a prefix. The fact that I use a for body is independent of what I use for month; I can use whichever one I want. So when I say you can use it anywhere, to take a different example: because I declared all of these on the envelope, all the descendants of the envelope are allowed to use any of them.
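The point just made (a prefix on a parent does not carry over to its children) can be checked directly. A minimal sketch with made-up `example.org` URIs and Python's `xml.etree.ElementTree`: `<a:body>` is in namespace a, its unprefixed `<month>` child falls back to the default namespace, and only the explicitly prefixed `<a:month>` is in namespace a.

```python
import xml.etree.ElementTree as ET

# Hypothetical example: prefix a is declared on the root and used on <body>,
# but an unprefixed child of <body> does not inherit it.
doc = """<envelope xmlns="http://example.org/default"
          xmlns:a="http://example.org/a">
  <a:body>
    <month>March</month>
    <a:month>April</a:month>
  </a:body>
</envelope>"""

root = ET.fromstring(doc)
body = root[0]
print(body.tag)     # {http://example.org/a}body
print(body[0].tag)  # {http://example.org/default}month
print(body[1].tag)  # {http://example.org/a}month
```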