00:00
So hi, I'm Jim, I'm here with Ben to talk to you about the latest developments in vector search for Elasticsearch and Lucene. Today we're going to dig into Elasticsearch as a vector database. First, just a little bit of history: Elasticsearch started as a search engine and data store primarily used for text search and analytics on semi-structured and structured data. That's the picture that we were using until now, what we call sparse vectors. With the advancements in machine learning and the increasing need to index that type of data, we introduced, not so recently, what we call dense vectors.
01:00
So, dense vectors. With sparse vectors, the text is translated into keywords, and for each keyword you have an inverted list that gives you the IDs of the documents that contain it. For dense vectors the operation is completely different: you start from a text, you translate the text into a certain number of dimensions, which are floating points, and that's the structure that you need to search on. Today we will explore some of the most recent advancements that we've implemented in that domain. We will focus on the dense vector side, but of course we will also show that mixing sparse vectors and dense vectors is one of the main capabilities of Elasticsearch. So, to make Elasticsearch a vector database, like every new capability in Elasticsearch, and by extension in Lucene, the library that we use for search, it all starts with the data structure.
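(A toy contrast of the two representations just described, purely illustrative; this is not Elasticsearch's internal layout, and the names are made up:)

```java
import java.util.*;

// Sparse: each keyword maps to an inverted (postings) list of document IDs.
// Dense: each document is a fixed-length array of floats from an embedding model.
class SparseVsDense {
    // Inverted index: term -> IDs of documents containing that term.
    Map<String, List<Integer>> invertedIndex = Map.of(
        "search", List.of(1, 4, 7),
        "vector", List.of(4, 9));

    // Dense vector: one float per dimension, compared by similarity.
    float[] docEmbedding = {0.12f, -0.48f, 0.33f, 0.91f}; // 4 dims for brevity
}
```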
02:15
The data structure that we introduced is called the Hierarchical Navigable Small World graph (HNSW); it's an extension of the navigable small world graph. The main thing that you can retain from this diagram is really the fact that the hierarchical aspect of the data structure makes the search for the nearest neighbors much more efficient. This is the state-of-the-art data structure for efficient nearest neighbor search, and this is the data structure that we actively participated in integrating into Lucene. Ben and I are Lucene committers and PMC members, and so, like many of the engineers at Elastic, we start by adding the capability in Lucene, and of course the next step is to make it accessible in Elasticsearch, just like any indexing data structure that we support.
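(A minimal sketch of the hierarchical greedy descent that makes HNSW efficient. The classes and names here are hypothetical, not Lucene's API, and a real HNSW keeps a beam of candidates and only assigns nodes to some layers; this sketch simplifies both:)

```java
import java.util.*;

class HnswSketch {
    static class Node {
        float[] vector;
        // neighbors.get(layer) holds this node's connections on that layer.
        List<List<Node>> neighbors = new ArrayList<>();
    }

    static float squaredDistance(float[] a, float[] b) {
        float sum = 0;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Start at the entry point on the top (sparsest) layer, greedily move to
    // the closest neighbor, then drop one layer and repeat until the bottom.
    static Node greedySearch(Node entryPoint, float[] query, int topLayer) {
        Node current = entryPoint;
        for (int layer = topLayer; layer >= 0; layer--) {
            boolean improved = true;
            while (improved) {
                improved = false;
                for (Node candidate : current.neighbors.get(layer)) {
                    if (squaredDistance(candidate.vector, query)
                            < squaredDistance(current.vector, query)) {
                        current = candidate;
                        improved = true;
                    }
                }
            }
        }
        return current; // approximate nearest neighbor
    }
}
```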
03:22
Over the recent years, we've been deeply committed to the seamless integration of this new data structure into our system, setting it as a fundamental data structure like any other. We introduced filtering capabilities, leveraging the rich filtering capabilities of the Elasticsearch DSL, the Elasticsearch language that we use for queries. Additionally, to make Elasticsearch an end-to-end semantic search platform, we integrated the capability to create the embeddings directly inside the platform. So instead of searching directly with your vectors, you can search with your text: the translation from this text to a dense vector is made through a model, an open source model or a private model that we are developing at Elastic.
04:19
And the search is done as the last step. The other capability that we added is really about making sure that hybrid search, the way we call it, is completely integrated and easy to use inside Elasticsearch, which means that you can mix searching over your dense vectors and your sparse vectors using very simple abstractions. Here I'm just showing how you can rescore using the dense vectors: you could have a BM25 query that is retrieving some documents, and you can use a simple script to rescore those documents. It's just to show that we don't always use the HNSW graph; you can also use the dense vector as a rescoring step after search.
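(A minimal sketch of that rescoring idea, assuming you already retrieved the top BM25 hits and have a stored vector per document. The Hit record and names are hypothetical, not the Elasticsearch API:)

```java
import java.util.*;

class DenseRescorer {
    record Hit(String id, float[] vector, double bm25Score) {}

    static double dotProduct(float[] q, float[] d) {
        double sum = 0;
        for (int i = 0; i < q.length; i++) sum += q[i] * d[i];
        return sum;
    }

    // Re-order the BM25 candidates by vector similarity to the query embedding.
    static List<Hit> rescore(List<Hit> bm25Hits, float[] queryVector) {
        List<Hit> rescored = new ArrayList<>(bm25Hits);
        rescored.sort(Comparator.comparingDouble(
            (Hit h) -> dotProduct(queryVector, h.vector)).reversed());
        return rescored;
    }
}
```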
05:08
So that was just a glimpse of what we do. Now that we have a grasp of the fundamentals of our vector search capabilities in Elasticsearch, I will leave the stage to Ben, who will dive into the latest announcements in this domain and provide a glimpse of our upcoming developments. Thank you, Jim. He said he'd leave me ten minutes and I have less than that, so let's see how fast we can go through this. I want to talk about some of the recent advancements and some of the things we're working on now. One thing that's available today is SIMD operations inside of vector search. SIMD just means single instruction, multiple data: you can do one arithmetic or specialized linear algebra calculation over multiple dimensions of data in a single CPU cycle. It's a lot of words; it just means faster.
06:05
This is important for vector search in particular because we do a lot of repetitive calculations, especially for dot product. Dot product takes a lot of floating point operations: you multiply floating points together and sum them. Being able to do that four to sixteen times faster is a significant improvement for indexing and searching vectors. So here's an example of a traditional vector, or floating point, add operation: you take two numbers and you add them together. If you were not using SIMD, you would be doing this single operation per cycle. With SIMD you get true parallelism inside of the CPU core, and you can do as many as four, eight, or sixteen operations at a given time. What makes this exceptionally powerful is that it's not just for floating point: you can do this for byte and integer and short values, and the various lengths of your data end up meaning you can do more or fewer operations at the same time.
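(A sketch of a SIMD dot product in the style Lucene uses, written with the JDK's incubating Panama Vector API; run with --add-modules jdk.incubator.vector. This illustrates the technique, not Lucene's exact implementation:)

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

class SimdDot {
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    static float dotProduct(float[] a, float[] b) {
        int i = 0;
        int bound = SPECIES.loopBound(a.length);
        FloatVector acc = FloatVector.zero(SPECIES);
        // Each iteration multiplies and accumulates SPECIES.length() lanes
        // (e.g. 8 floats with AVX2, 16 with AVX-512) in one step.
        for (; i < bound; i += SPECIES.length()) {
            FloatVector va = FloatVector.fromArray(SPECIES, a, i);
            FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
            acc = va.fma(vb, acc);
        }
        float sum = acc.reduceLanes(VectorOperators.ADD);
        // Scalar tail for dimensions not divisible by the lane count.
        for (; i < a.length; i++) sum += a[i] * b[i];
        return sum;
    }
}
```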
07:01
Micro benchmarks showed this is four to sixteen times faster; in Elasticsearch it's over two times faster, depending on your CPU architecture, and we natively introduced this to Lucene so that it is tightly integrated in our stack and you get it just out of the box. Something else that we have today is multi-threaded search over segments. Elasticsearch has always been composed of more than one shard per index, individual Lucene shards over multiple nodes, and we've always been able to search those in parallel. Every shard has more than one Lucene segment; segments are read-only structures inside of Lucene, and every segment has its own HNSW graph. Before, we would have to take each individual segment and read it one at a time, so your runtime scales linearly with how many segments you have. We have recently introduced the ability to run this in parallel, so we take better advantage of how many CPUs you have on your server and we can use massive parallelism to reduce latency over multiple segments.
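(A conceptual sketch of searching per-segment graphs concurrently and merging the per-segment top-k results. Segment and Hit are hypothetical stand-ins, not Lucene classes:)

```java
import java.util.*;
import java.util.concurrent.*;

class ParallelSegmentSearch {
    interface Segment { List<Hit> topK(float[] query, int k); }
    record Hit(int docId, float score) {}

    static List<Hit> search(List<Segment> segments, float[] query, int k,
                            ExecutorService executor) throws Exception {
        // Instead of scanning segments one at a time (latency grows linearly
        // with segment count), submit one task per segment.
        List<Future<List<Hit>>> futures = new ArrayList<>();
        for (Segment segment : segments) {
            futures.add(executor.submit(() -> segment.topK(query, k)));
        }
        // Merge all partial results, keeping the global best k by score.
        PriorityQueue<Hit> best =
            new PriorityQueue<>(Comparator.comparingDouble(Hit::score));
        for (Future<List<Hit>> future : futures) {
            for (Hit hit : future.get()) {
                best.offer(hit);
                if (best.size() > k) best.poll();
            }
        }
        List<Hit> merged = new ArrayList<>(best);
        merged.sort(Comparator.comparingDouble(Hit::score).reversed());
        return merged;
    }
}
```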
08:09
Now, to level set, to introduce the next thing we're working on, I want to talk about vector memory requirements. Typically, per dimension of a vector, you end up having about four billion options inside of a floating point number, which is way too many. The embedding models don't use all that information; they store it very inefficiently, but that's just how it is, so you end up getting this full fidelity. But once you're talking about three hundred, seven hundred, fifteen hundred, two thousand dimensions, you don't need that amount of fidelity per dimension. So you can reduce that, with really good results, to just eight bits, and you can reduce it even further to half bytes, which would be int4. That is surprising, because it only means you get sixteen numbers per dimension; but because of how many dimensions exist in these larger vector models, it ends up working out all right. So this is something that we're working on right now in Lucene.
09:04
So as I said before, typically models embed in float32. For a given vector, let's say you have E5-small, which was built by Microsoft and is a very good model: it embeds three hundred and eighty four dimensions, and so that's about 1.5 kilobytes per vector. That doesn't seem like a lot, but when you're talking about HNSW and needing to hold these in memory, that's only about five million vectors for eight gigs of RAM. If you have byte quantization, you end up being able to increase by four times the number of vectors you can hold in memory. So we're building this right now in Lucene, and it's very simple algebra to take the floating point values and transform them into byte values. It's like high-school-level algebra; I remember FOIL, first, outer, inner, last, anybody remember that? That's exactly what this is, plus some very simple statistics. So what you end up getting is a four times reduction in space; search ends up being just as accurate because of the higher dimensionality, and it ends up coming out in the wash statistically.
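(A minimal sketch of that simple algebra, assuming a basic scalar quantization scheme: linearly map each float dimension from an observed [min, max] range onto 256 integer buckets. The constants and method here are illustrative, not Lucene's exact scheme:)

```java
class ByteQuantizer {
    final float min, max, scale;

    // min and max are the "very simple statistics": gathered, for example,
    // from a sample of the indexed vectors.
    ByteQuantizer(float min, float max) {
        this.min = min;
        this.max = max;
        this.scale = 255f / (max - min);
    }

    byte[] quantize(float[] vector) {
        byte[] out = new byte[vector.length]; // 4x smaller than float32
        for (int i = 0; i < vector.length; i++) {
            float clamped = Math.min(max, Math.max(min, vector[i]));
            int bucket = Math.round((clamped - min) * scale);
            out[i] = (byte) (bucket - 128); // shift into signed byte range
        }
        return out;
    }
}
```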
10:08
You end up getting about two times faster search because of SIMD, since now you can actually do more dimensions, more calculations, at a single time per CPU cycle. And because of how Lucene is architected, it ends up fitting perfectly into Lucene's read-only segment architecture. This means that as you get more data, we have natural merge points for segments that let us ensure that your recall doesn't drop if your data shifts; I don't know anybody else that really does that. For hybrid search, we have had this for a long time: you can filter results via geo, numerical, text, and any combination of all these, for vector and sparse search. And you can combine vector and sparse search in various ways. One is RRF, reciprocal rank fusion, which is a new capability. It provides very good, simple, no-fine-tuning results: you just give us the queries and we'll combine the results together, and you end up getting the best of both worlds in the result sets. It is very simple to think about how it combines sparse and dense retrieval, and it provides a good out-of-the-box experience for users. But that's not all that we have and all that we can provide for combining sparse and dense results: you also have linear boosting and combining of your features. This is something that we've had for a long time, and you can tightly control how you boost each individual query.
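(A sketch of the reciprocal rank fusion idea just mentioned: each result list contributes 1 / (k + rank) per document, and documents are re-ranked by the summed score. The constant k = 60 below is a common default for illustration, not necessarily what Elasticsearch uses:)

```java
import java.util.*;

class ReciprocalRankFusion {
    static Map<String, Double> fuse(List<List<String>> rankedLists, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranked : rankedLists) {
            for (int rank = 0; rank < ranked.size(); rank++) {
                // rank is 0-based here, so add 1 to match the usual formula.
                scores.merge(ranked.get(rank), 1.0 / (k + rank + 1), Double::sum);
            }
        }
        return scores; // higher fused score = better combined rank
    }

    public static void main(String[] args) {
        List<String> bm25 = List.of("doc2", "doc1", "doc5"); // sparse results
        List<String> knn = List.of("doc1", "doc3", "doc2");  // dense results
        System.out.println(fuse(List.of(bm25, knn), 60));
    }
}
```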
11:39
You can cater this to your users' individual needs, and you can learn over time what the correct boosting is for your users, and end up boosting based on geolocation and all this information about users, so they get the most relevant results for them. And this is for both dense and sparse retrieval. One other thing that we've been working on that is coming out soon is maximum inner product. Maximum inner product is a simple kind of calculation where the magnitude of the vector ends up adjusting the score. Cosine similarity is a simple Euclidean space calculation, but maximum inner product is non-Euclidean. This effectively means that a vector is no longer closest to itself, and distance isn't so simple to think about as a human, but models handle this exceptionally well, and so does HNSW. You can see in this graph between A and B: A is closer to B than it is to itself, because of the magnitude difference.
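(A tiny numeric illustration of that point: under maximum inner product, a vector need not be closest to itself. The values are made up:)

```java
class MaxInnerProduct {
    static double dot(double[] x, double[] y) {
        double sum = 0;
        for (int i = 0; i < x.length; i++) sum += x[i] * y[i];
        return sum;
    }

    public static void main(String[] args) {
        double[] a = {1.0, 1.0}; // small magnitude
        double[] b = {3.0, 3.0}; // same direction, larger magnitude
        System.out.println(dot(a, a)); // 2.0 -> a scored against itself
        System.out.println(dot(a, b)); // 6.0 -> a scores higher against b
        // Under inner-product "similarity", b beats a as a's own best match,
        // which breaks the metric-space intuition of distance.
    }
}
```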
12:45
But HNSW in Lucene handles this perfectly well. If you want to get into the nitty-gritty, there's an open source GitHub issue with hundreds of comments of all of our research, between us and other Lucene committers, to make sure that this works, and it does. So this is in Lucene now and it will be in Elasticsearch soon.
13:06
And one of the things that we've added recently in Lucene, and that will be in Elasticsearch soon, is passage vectors. All embedding models have token input limits; this requires you to chunk long passages into multiple vectors. And then the questions come up: how do we handle our metadata? Do we copy it for all of our passage vectors and index them all individually? That can be wasteful; why would we want to do that if we're always going to filter over the same thing? How am I going to combine my sparse passage search with my dense vectors, now that BM25 scores change depending on how many documents you have or how big your documents are? You also want your nearest passages over the nearest documents: you don't just want the closest passage, you want the closest document. And so Lucene has passage vector support; that's what we recently added. It effectively works like this: it's built on Lucene primitives, and it allows you to diversify the document score based on its nearest passage as we're searching the HNSW graph. It's natively in Lucene with very little overhead; we didn't have to make any deep structural changes, and it just sort of worked. I kind of geeked out a little bit when I first tested this and it worked out of the box; I was kind of blown away, I was very excited. But it allows you to filter easily over the metadata and combine with the top-level sparse search.
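(A conceptual sketch of the "nearest passage per document" idea: collapse passage-level vector hits to their parent document, keeping each document's best passage score. PassageHit and the field names are hypothetical, not Lucene's API:)

```java
import java.util.*;

class PassageCollapse {
    record PassageHit(String parentDocId, int passageOrdinal, float score) {}

    static Map<String, PassageHit> bestPassagePerDoc(List<PassageHit> hits) {
        Map<String, PassageHit> best = new HashMap<>();
        for (PassageHit hit : hits) {
            // Keep only the highest-scoring passage for each parent document,
            // so results are diversified across documents rather than
            // dominated by many chunks of one long document.
            best.merge(hit.parentDocId(), hit,
                (a, b) -> a.score() >= b.score() ? a : b);
        }
        return best;
    }
}
```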
14:32
I'm running out of time, so this is a very quick slide: here's a bunch of stuff that I didn't touch on. And apparently I need to add a golden retriever query some time later that Shay just announced today, so just picture that line in here with this slide. Optimizations for sparse vectors and ELSER: we have multiple optimizations in Lucene that have been introduced and are coming to Elasticsearch soon. ELSER v2 and hardware-accelerated inference infrastructure, making inference a first-class citizen in the stack, which was talked about earlier. And many, many simplifications inside of vector search and indexing and the user experience. I hope that this high-level overview
15:16
created some questions, and I am eager to talk to each one of you. Thank you.