Scalaz(48) - scalaz-stream: A Closer Look at Transducers: Process1, tee, wye

In the previous post we introduced Source, whose type shape is Process[F[_],O]. A Source produces its data stream through the await function, which has the following signature:

def await[F[_], A, O](req: F[A])(rcv: A => Process[F, O]): Process[F, O]  
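To make the shape of await concrete, here is a minimal toy model in plain Scala. It is an illustrative assumption, not scalaz-stream's real encoding: the effect F[A] is simplified to a thunk `() => A`, and the names `Proc`, `fromIterator` and `runLog` are hypothetical. An iterator stands in for an external source such as a database cursor.

```scala
// Toy model of a Source (an illustrative assumption, not scalaz-stream's encoding).
// The effect F[A] is simplified to a thunk () => A.
sealed trait Proc[O]
final case class Emit[O](head: O, tail: Proc[O]) extends Proc[O]
final case class Await[A, O](req: () => A, rcv: A => Proc[O]) extends Proc[O] {
  // Run the request against the "outside world", then let rcv pick the next state.
  def step: Proc[O] = rcv(req())
}
final case class Halt[O]() extends Proc[O]

// A Source that reads "records" from an iterator, standing in for a
// database cursor or a network connection.
def fromIterator[O](it: Iterator[O]): Proc[O] =
  Await[Option[O], O](
    () => if (it.hasNext) Some(it.next()) else None,
    {
      case Some(o) => Emit(o, fromIterator(it))
      case None    => Halt()
    }
  )

// Drive the process and collect everything it emits (a toy runLog).
@annotation.tailrec
def runLog[O](p: Proc[O], acc: Vector[O] = Vector.empty[O]): Vector[O] =
  p match {
    case Emit(h, t)      => runLog(t, acc :+ h)
    case a @ Await(_, _) => runLog(a.step, acc)
    case Halt()          => acc
  }
```

Here `runLog(fromIterator(Iterator("r1", "r2", "r3")))` yields `Vector("r1", "r2", "r3")`: each Await performs one read, and rcv decides whether to emit and continue or to halt.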

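The await function runs the effect F to fetch a value A from an external data source, for example reading a record from a database, fetching a web page over the network, or reading keyboard and mouse input. The fetched A is then passed to rcv, which produces the next Process[F,O]. This is the Source pattern: a source that produces data. Once we have a Source, we usually need to process the values O it provides, and that is the job of a transducer. Let's first look at the transducer type: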

type Process1[-I,+O] = Process[Env[I,Any]#Is, O]

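Judging by its type parameters, a transducer has no side effects: its job is to transform each incoming element I into output of type O. A transducer never produces data on its own; it passively receives input I. For this reason the await function for Process1 is called await1: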

/** The `Process1` which awaits a single input, emits it, then halts normally. */
def await1[I]: Process1[I, I] =
    receive1(emit)

def receive1[I, O](rcv: I => Process1[I, O]): Process1[I, O] =
    await(Get[I])(rcv)

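As you can see, await1 is just a special case of await: the data-producing request F[A] is replaced by Get[I], which exists only to help the compiler infer the type I. In other words, await1 does not actively produce data. Its rcv is a function that returns a Process1[I,O] only once it is given an I. Here are some examples of await1: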

import Process._
def multiplyBy(n: Int): Process1[Int,Int] =
  await1[Int].flatMap { i => emit(i * n) }.repeat
                                       //> multiplyBy: (n: Int)scalaz.stream.Process1[Int,Int]
def addPosfix: Process1[Int,String] =
  receive1[Int,String]{ i => emit(i.toString + "!") }.repeat
                                       //> addPosfix: => scalaz.stream.Process1[Int,String]

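Both await1 and receive1 passively wait for an element i before they can continue the transformation. We can use pipe to attach a Process1 to a Source and transform the elements the Source produces: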

(range(11,16).toSource pipe multiplyBy(5) |> addPosfix).runLog.run
                                    //> res0: Vector[String] = Vector(55!, 60!, 65!, 70!, 75!)
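A rough way to picture what happens when a Source drives a Process1 is a toy model in which a transducer is a small state machine: it emits, waits for one input, or halts. This is a sketch under our own assumptions (the names `P1`, `Await1` and `run` are hypothetical, not the library's), with a plain List standing in for the upstream Source and recursion taking the place of `.repeat`.

```scala
// Toy Process1 (our own simplified model, not the library's encoding):
// a transducer either emits an output, waits for one input, or halts.
sealed trait P1[I, O]
final case class Emit[I, O](head: O, tail: P1[I, O]) extends P1[I, O]
final case class Await1[I, O](rcv: I => P1[I, O]) extends P1[I, O]
final case class Halt[I, O]() extends P1[I, O]

// receive1: wait passively for a single element; no request is issued.
def receive1[I, O](rcv: I => P1[I, O]): P1[I, O] = Await1(rcv)

// Push a finite input list through the transducer and collect its outputs
// (roughly what happens when a Source is piped into it). Not stack-safe.
def run[I, O](p: P1[I, O], in: List[I]): List[O] = p match {
  case Emit(h, t) => h :: run(t, in)
  case Await1(rcv) =>
    in match {
      case i :: rest => run(rcv(i), rest)
      case Nil       => Nil // upstream exhausted
    }
  case Halt() => Nil
}

// Toy counterparts of multiplyBy and addPosfix; recursion stands in for .repeat.
def multiplyBy(n: Int): P1[Int, Int] =
  receive1[Int, Int](i => Emit(i * n, multiplyBy(n)))

def addPosfix: P1[Int, String] =
  receive1[Int, String](i => Emit(i.toString + "!", addPosfix))
```

For instance, `run(multiplyBy(5), List(11, 12, 13, 14, 15))` gives `List(55, 60, 65, 70, 75)`, the first stage of res0.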

We can also lift an ordinary function into a Process1:

import process1._
(range(11,16).toSource |> lift {(i: Int) => i * 5} |> addPosfix).runLog.run
                                     //> res1: Vector[String] = Vector(55!, 60!, 65!, 70!, 75!)

The |> above is the pipe operator. We could in fact get the same result by transforming the Source's output directly:

range(11,16).toSource.flatMap { i =>
  emit(i * 5) }.flatMap { i =>
  emit(i.toString + "!") }.runLog.run       //> res1: Vector[String] = Vector(55!, 60!, 65!, 70!, 75!)

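Although this reaches the same result more directly, note that the Source is now a specialized version: the transformations attached to it can no longer be separated from it. pipe, by contrast, is a Process composition function that we use to connect a Source to a Transducer, or one Transducer to another. This keeps both Sources and Transducers small, single-purpose components.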
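The composition that pipe performs can be sketched in the same kind of toy model. This is an assumption-laden illustration, not scalaz-stream's implementation: `P1`, `lift` and `pipe` here are hypothetical names, and in this sketch the downstream transducer decides when to pull from the upstream one.

```scala
// Toy Process1 again (a simplified model of our own, not the library's).
sealed trait P1[I, O]
final case class Emit[I, O](head: O, tail: P1[I, O]) extends P1[I, O]
final case class Await1[I, O](rcv: I => P1[I, O]) extends P1[I, O]
final case class Halt[I, O]() extends P1[I, O]

// Feed a finite input list through a transducer and collect its outputs.
def run[I, O](p: P1[I, O], in: List[I]): List[O] = p match {
  case Emit(h, t) => h :: run(t, in)
  case Await1(rcv) =>
    in match {
      case i :: rest => run(rcv(i), rest)
      case Nil       => Nil
    }
  case Halt() => Nil
}

// lift: turn a plain function into a transducer that maps every element.
def lift[I, O](f: I => O): P1[I, O] =
  Await1[I, O](i => Emit(f(i), lift(f)))

// pipe: the outputs of p1 become the inputs of p2. The downstream p2
// drives the pipeline and pulls from p1 only when it needs an element.
def pipe[A, B, C](p1: P1[A, B], p2: P1[B, C]): P1[A, C] = p2 match {
  case Emit(c, t2) => Emit(c, pipe(p1, t2)) // p2 has output ready
  case Halt()      => Halt()                // p2 finished: pipeline finished
  case Await1(rcv2) =>
    p1 match {
      case Emit(b, t1)  => pipe(t1, rcv2(b))                      // hand p1's output to p2
      case Await1(rcv1) => Await1[A, C](a => pipe(rcv1(a), p2))   // both wait: ask upstream
      case Halt()       => Halt()                                 // p1 cannot supply more
    }
}
```

Because pipe only inspects the two state machines, neither transducer needs to know about the other: `pipe(lift((i: Int) => i * 5), lift((i: Int) => i.toString + "!"))` fed with 11 to 15 produces the same strings as res1.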

Once a transducer is connected to a data source, it can transform every element the source emits. These transducer functions live in the process1 object:

import process1._
(range(1,6).toSource pipe take(2))
  .runLog.run                                      //> res2: Vector[Int] = Vector(1, 2)
(range(1,10).toSource |> filter {_ % 2 == 0 }
  |> collect {
    case 4 => "the number four"
    case 5 => "the number five"
    case 6 => "the number six"
    case 100 => "the number one hundred"
    }
).runLog.run         //> res3: Vector[String] = Vector(the number four, the number six)
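Combinators such as take, filter and collect need nothing more than "wait for one element, then decide what to emit". The versions below are hypothetical re-implementations in a toy model for illustration, not the code in process1.

```scala
// Toy Process1 model (our own simplification, not the library's encoding).
sealed trait P1[I, O]
final case class Emit[I, O](head: O, tail: P1[I, O]) extends P1[I, O]
final case class Await1[I, O](rcv: I => P1[I, O]) extends P1[I, O]
final case class Halt[I, O]() extends P1[I, O]

def run[I, O](p: P1[I, O], in: List[I]): List[O] = p match {
  case Emit(h, t) => h :: run(t, in)
  case Await1(rcv) =>
    in match {
      case i :: rest => run(rcv(i), rest)
      case Nil       => Nil
    }
  case Halt() => Nil
}

// take: pass through n elements, then halt.
def take[I](n: Int): P1[I, I] =
  if (n <= 0) Halt() else Await1[I, I](i => Emit(i, take[I](n - 1)))

// filter: wait for an element and emit it only if the predicate holds.
def filter[I](pred: I => Boolean): P1[I, I] =
  Await1[I, I](i => if (pred(i)) Emit(i, filter(pred)) else filter(pred))

// collect: emit pf(i) where pf is defined, silently skip other elements.
def collect[I, O](pf: PartialFunction[I, O]): P1[I, O] =
  Await1[I, O](i => if (pf.isDefinedAt(i)) Emit(pf(i), collect(pf)) else collect(pf))
```

Feeding the output of the toy filter (2, 4, 6, 8) into the toy collect gives the same two strings as res3.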

Most of the familiar operations on a Scala standard-library List have a Process1 counterpart:

(range(1,6).toSource
  |> fold(Nil:List[Int]){ (b,a) => a :: b }
 ).runLog.run                            //> res5: Vector[List[Int]] = Vector(List(5, 4, 3, 2, 1))
(range(1,6).toSource
  |> foldMap { List(_) }
 ).runLog.run                            //> res6: Vector[List[Int]] = Vector(List(1, 2, 3, 4, 5))
(range(1,6).toSource
  |> foldMap { identity }
 ).runLog.run                            //> res7: Vector[Int] = Vector(15)
(range(1,6).toSource
  |> sum
 ).runLog.run                            //> res8: Vector[Int] = Vector(15)
(range(1,6).toSource
  |> scan(0){(a,b) => a + b}
 ).runLog.run                            //> res9: Vector[Int] = Vector(0, 1, 3, 6, 10, 15)
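fold and sum emit only once the input ends, so a toy model needs some way for a waiting transducer to react to end of input. The sketch below assumes an `onEnd` fallback on a hypothetical `Await1` case; that field is a simplification of our own, not how the library represents termination.

```scala
// Toy Process1 with an end-of-input fallback (onEnd is an assumption of
// this sketch): it says what a waiting transducer does when upstream runs out.
sealed trait P1[I, O]
final case class Emit[I, O](head: O, tail: P1[I, O]) extends P1[I, O]
final case class Await1[I, O](rcv: I => P1[I, O], onEnd: P1[I, O]) extends P1[I, O]
final case class Halt[I, O]() extends P1[I, O]

def run[I, O](p: P1[I, O], in: List[I]): List[O] = p match {
  case Emit(h, t) => h :: run(t, in)
  case Await1(rcv, onEnd) =>
    in match {
      case i :: rest => run(rcv(i), rest)
      case Nil       => run(onEnd, Nil) // upstream finished: take the fallback
    }
  case Halt() => Nil
}

// fold: accumulate silently, emit the single result when the input ends.
def fold[A, B](z: B)(f: (B, A) => B): P1[A, B] =
  Await1[A, B](a => fold[A, B](f(z, a))(f), Emit[A, B](z, Halt()))

// scan: emit the current state first, then the new state after each element.
def scan[A, B](z: B)(f: (B, A) => B): P1[A, B] =
  Emit[A, B](z, Await1[A, B](a => scan[A, B](f(z, a))(f), Halt()))

def sum: P1[Int, Int] = fold[Int, Int](0)(_ + _)
```

With inputs 1 to 5 these toy versions reproduce res5, res8 and res9; the difference between fold and scan is only whether the state is emitted at every step or once at the end.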

We can also feed a ready-made sequence of elements into a Process1:

(range(1,6).toSource
  |> feed(6 to 10)(lift(identity))
  ).runLog.run                         //> res10: Vector[Int] = Vector(6, 7, 8, 9, 10, 1, 2, 3, 4, 5)
(range(1,6).toSource
  |> feed(6 to 10)(lift(identity))
  |> foldMap {identity}
  ).runLog.run                         //> res11: Vector[Int] = Vector(55)
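feed can be pictured as pushing elements into a transducer before it is attached to any upstream: each pre-fed element is consumed immediately, and the resulting outputs sit in front of whatever the transducer emits later. A toy sketch with hypothetical names, not the library's code:

```scala
// Toy Process1 model (our own simplification, not the library's encoding).
sealed trait P1[I, O]
final case class Emit[I, O](head: O, tail: P1[I, O]) extends P1[I, O]
final case class Await1[I, O](rcv: I => P1[I, O]) extends P1[I, O]
final case class Halt[I, O]() extends P1[I, O]

def run[I, O](p: P1[I, O], in: List[I]): List[O] = p match {
  case Emit(h, t) => h :: run(t, in)
  case Await1(rcv) =>
    in match {
      case i :: rest => run(rcv(i), rest)
      case Nil       => Nil
    }
  case Halt() => Nil
}

def lift[I, O](f: I => O): P1[I, O] = Await1[I, O](i => Emit(f(i), lift(f)))

// feed: push a ready-made sequence into a transducer ahead of any upstream.
def feed[I, O](in: List[I])(p: P1[I, O]): P1[I, O] = (in, p) match {
  case (Nil, _)                 => p                           // nothing left to feed
  case (_, Emit(h, t))          => Emit(h, feed[I, O](in)(t))  // keep earlier output
  case (i :: rest, Await1(rcv)) => feed[I, O](rest)(rcv(i))    // consume one element
  case (_, Halt())              => Halt()                      // transducer finished
}
```

Running the fed identity transducer over 1 to 5 puts 6 to 10 first, matching the order in res10.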

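The examples above show that a Process1 only passively accepts elements from upstream: it must be connected to an upstream before it does anything, and pipe is exactly such a connector. On the same principle, tee connects two data sources and merges their elements into a single stream whose left/right reading order is deterministic. The Tee type is defined as follows: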

/**
   * A stream transducer that can read from one of two inputs,
   * the 'left' (of type `I`) or the 'right' (of type `I2`).
   * `Process1[I,O] <: Tee[I,I2,O]`.
   */
  type Tee[-I,-I2,+O] = Process[Env[I,I2]#T, O]

The Tee type looks much like Process1, except that it has two inputs, I and I2. Just as Process1 is driven by await1 (that is, receive1), a Tee is driven by receiveL and receiveR:

/**
   * Awaits to receive input from Left side,
   * than if that request terminates with `End` or is terminated abnormally
   * runs the supplied `continue` or `cleanup`.
   * Otherwise `rcv` is run to produce next state.
   *
   * If  you don't need `continue` or `cleanup` use rather `awaitL.flatMap`
   */
  def receiveL[I, I2, O](rcv: I => Tee[I, I2, O]): Tee[I, I2, O] =
    await[Env[I, I2]#T, I, O](L)(rcv)

  /**
   * Awaits to receive input from Right side,
   * than if that request terminates with `End` or is terminated abnormally
   * runs the supplied continue.
   * Otherwise `rcv` is run to produce next state.
   *
   * If  you don't need `continue` or `cleanup` use rather `awaitR.flatMap`
   */
  def receiveR[I, I2, O](rcv: I2 => Tee[I, I2, O]): Tee[I, I2, O] =
    await[Env[I, I2]#T, I2, O](R)(rcv)

Like await1, receiveL and receiveR are special cases of await. L, R and the Get[I] used by await1 are all defined in the Env class:

case class Env[-I, -I2]() {
    sealed trait Y[-X] {
      def tag: Int
      def fold[R](l: => R, r: => R, both: => R): R
    }
    sealed trait T[-X] extends Y[X]
    sealed trait Is[-X] extends T[X]
    case object Left extends Is[I] {
      def tag = 0
      def fold[R](l: => R, r: => R, both: => R): R = l
    }
    case object Right extends T[I2] {
      def tag = 1
      def fold[R](l: => R, r: => R, both: => R): R = r
    }
    case object Both extends Y[ReceiveY[I, I2]] {
      def tag = 2
      def fold[R](l: => R, r: => R, both: => R): R = both
    }
  }


  private val Left_  = Env[Any, Any]().Left
  private val Right_ = Env[Any, Any]().Right
  private val Both_  = Env[Any, Any]().Both

  def Get[I]: Env[I, Any]#Is[I] = Left_
  def L[I]: Env[I, Any]#Is[I] = Left_
  def R[I2]: Env[Any, I2]#T[I2] = Right_
  def Both[I, I2]: Env[I, I2]#Y[ReceiveY[I, I2]] = Both_

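L[I], R[I2] and Get[I] do no real work; they exist only to drive the compiler's type inference. The ordering of a tee is deterministic in the sense that receiveL and receiveR specify which side the next element is read from. As you would expect, the main purpose of tee is to merge the elements emitted by two sources, and such merges are generally performed through the following tee method: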

/**
   * Use a `Tee` to interleave or combine the outputs of `this` and
   * `p2`. This can be used for zipping, interleaving, and so forth.
   * Nothing requires that the `Tee` read elements from each
   * `Process` in lockstep. It could read fifty elements from one
   * side, then two elements from the other, then combine or
   * interleave these values in some way, etc.
   *
   * If at any point the `Tee` awaits on a side that has halted,
   * we gracefully kill off the other side, then halt.
   *
   * If at any point `t` terminates with cause `c`, both sides are killed, and
   * the resulting `Process` terminates with `c`.
   */
  final def tee[F2[x] >: F[x], O2, O3](p2: Process[F2, O2])(t: Tee[O, O2, O3]): Process[F2, O3]

In pseudocode: leftProcess.tee(rightProcess)(teeFunction): newProcess
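To see how a Tee chooses which side to read, here is a toy model with separate left and right await cases. The names `Tee`, `AwaitL`, `AwaitR` and `runTee` are hypothetical, the two inputs are plain Lists, and, following the doc comment on tee, awaiting an exhausted side ends the whole run. The toy interleave reads one element from each side before emitting both, which is consistent with res12 further below.

```scala
// Toy Tee model (our own simplification, not scalaz-stream's encoding):
// a Tee emits, waits on the left, waits on the right, or halts.
sealed trait Tee[L, R, O]
final case class Emit[L, R, O](head: O, tail: Tee[L, R, O]) extends Tee[L, R, O]
final case class AwaitL[L, R, O](rcv: L => Tee[L, R, O]) extends Tee[L, R, O]
final case class AwaitR[L, R, O](rcv: R => Tee[L, R, O]) extends Tee[L, R, O]
final case class Halt[L, R, O]() extends Tee[L, R, O]

// Drive a Tee with two finite inputs; awaiting an empty side ends the run.
def runTee[L, R, O](t: Tee[L, R, O], left: List[L], right: List[R]): List[O] =
  t match {
    case Emit(h, rest) => h :: runTee(rest, left, right)
    case AwaitL(rcv) =>
      left match {
        case l :: ls => runTee(rcv(l), ls, right)
        case Nil     => Nil
      }
    case AwaitR(rcv) =>
      right match {
        case r :: rs => runTee(rcv(r), left, rs)
        case Nil     => Nil
      }
    case Halt() => Nil
  }

// interleave: read one element from each side, then emit both, left first.
def interleave[A]: Tee[A, A, A] =
  AwaitL[A, A, A](l => AwaitR[A, A, A](r => Emit(l, Emit(r, interleave[A]))))

// zip: pair up one left and one right element per step.
def zip[A, B]: Tee[A, B, (A, B)] =
  AwaitL[A, B, (A, B)](a => AwaitR[A, B, (A, B)](b => Emit((a, b), zip[A, B])))
```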

Here are several commonly used tee-based functions:

 /** Alternate emitting elements from `this` and `p2`, starting with `this`. */
  def interleave[F2[x] >: F[x], O2 >: O](p2: Process[F2, O2]): Process[F2, O2] =
    this.tee(p2)(scalaz.stream.tee.interleave[O2])

  /** Call `tee` with the `zipWith` `Tee[O,O2,O3]` defined in `tee.scala`. */
  def zipWith[F2[x] >: F[x], O2, O3](p2: Process[F2, O2])(f: (O, O2) => O3): Process[F2, O3] =
    this.tee(p2)(scalaz.stream.tee.zipWith(f))

  /** Call `tee` with the `zip` `Tee[O,O2,O3]` defined in `tee.scala`. */
  def zip[F2[x] >: F[x], O2](p2: Process[F2, O2]): Process[F2, (O, O2)] =
    this.tee(p2)(scalaz.stream.tee.zip)

  /**
   * When `condition` is `true`, lets through any values in `this` process, otherwise blocks
   * until `condition` becomes true again. Note that the `condition` is checked before
   * each and every read from `this`, so `condition` should return very quickly or be
   * continuous to avoid holding up the output `Process`. Use `condition.forwardFill` to
   * convert an infrequent discrete `Process` to a continuous one for use with this
   * function.
   */
  def when[F2[x] >: F[x], O2 >: O](condition: Process[F2, Boolean]): Process[F2, O2] =
    condition.tee(this)(scalaz.stream.tee.when)
 /**
   * Halts this `Process` as soon as `condition` becomes `true`. Note that `condition`
   * is checked before each and every read from `this`, so `condition` should return
   * very quickly or be continuous to avoid holding up the output `Process`. Use
   * `condition.forwardFill` to convert an infrequent discrete `Process` to a
   * continuous one for use with this function.
   */
  def until[F2[x] >: F[x], O2 >: O](condition: Process[F2, Boolean]): Process[F2, O2] =
    condition.tee(this)(scalaz.stream.tee.until)

And here is how the underlying Tee values are implemented:

/** A `Tee` which ignores all input from left. */
  def passR[I2]: Tee[Any, I2, I2] = awaitR[I2].repeat

  /** A `Tee` which ignores all input from the right. */
  def passL[I]: Tee[I, Any, I] = awaitL[I].repeat

  /** Echoes the right branch until the left branch becomes `true`, then halts. */
  def until[I]: Tee[Boolean, I, I] =
    awaitL[Boolean].flatMap(kill => if (kill) halt else awaitR[I] ++ until)

  /** Echoes the right branch when the left branch is `true`. */
  def when[I]: Tee[Boolean, I, I] =
    awaitL[Boolean].flatMap(ok => if (ok) awaitR[I] ++ when else when)

  /** Defined as `zipWith((_,_))` */
  def zip[I, I2]: Tee[I, I2, (I, I2)] = zipWith((_, _))

  /** Defined as `zipWith((arg,f) => f(arg)` */
  def zipApply[I,I2]: Tee[I, I => I2, I2] = zipWith((arg,f) => f(arg))

  /** A version of `zip` that pads the shorter stream with values. */
  def zipAll[I, I2](padI: I, padI2: I2): Tee[I, I2, (I, I2)] =
    zipWithAll(padI, padI2)((_, _))

The following examples demonstrate how these functions are used:

import tee._
val source = range(1,6).toSource                 //> source  : scalaz.stream.Process[scalaz.concurrent.Task,Int] = Append(Halt(End),Vector(<function1>))
val seq = emitAll(Seq("a","b","c"))              //> seq  : scalaz.stream.Process0[String] = Emit(List(a, b, c))
val signalw = Process(true,true,false,true)      //> signalw  : scalaz.stream.Process0[Boolean] = Emit(WrappedArray(true, true, false, true))
val signalu = Process(false,true,false,true)     //> signalu  : scalaz.stream.Process0[Boolean] = Emit(WrappedArray(false, true,false, true))

source.tee(seq)(interleave).runLog.run           //> res12: Vector[Any] = Vector(1, a, 2, b, 3, c)
(source interleave seq).runLog.run               //> res13: Vector[Any] = Vector(1, a, 2, b, 3, c)
signalu.tee(source)(until).runLog.run            //> res14: Vector[Int] = Vector(1)
signalw.tee(source)(when).runLog.run             //> res15: Vector[Int] = Vector(1, 2, 3)
source.tee(seq)(passL).runLog.run                //> res16: Vector[Int] = Vector(1, 2, 3, 4, 5)
source.tee(seq)(passR).runLog.run                //> res17: Vector[String] = Vector(a, b, c)
(source zip seq).runLog.run                      //> res18: Vector[(Int, String)] = Vector((1,a), (2,b), (3,c))
(seq zip source).runLog.run                      //> res19: Vector[(String, Int)] = Vector((a,1), (b,2), (c,3))
(source.zipWith(seq){(a,b) => a.toString + b}).runLog.run
                                                 //> res20: Vector[String] = Vector(1a, 2b, 3c)

As with Process1, we can feed a sequence of elements into a Tee, this time with feedL and feedR:

/** Feed a sequence of inputs to the left side of a `Tee`. */
  def feedL[I, I2, O](i: Seq[I])(p: Tee[I, I2, O]): Tee[I, I2, O] = {...}
 /** Feed a sequence of inputs to the right side of a `Tee`. */
  def feedR[I, I2, O](i: Seq[I2])(p: Tee[I, I2, O]): Tee[I, I2, O] = {...}

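Usage (only feedL seems to work here, but that is enough: the goal is to insert a ready-made sequence of elements into the resulting stream, and it makes no difference whether it enters from the left or the right):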

val ltee = tee.feedL(Seq(1,2,3))(id[Int])        //> ltee  : scalaz.stream.Tee[Int,Any,Int] = Append(Emit(Vector(1, 2)),Vector(<function1>))
halt.tee[Task,Int,Int](halt)(ltee).runLog.run    //> res21: Vector[Int] = Vector(1, 2, 3)
source.tee[Task,Int,Int](halt)(ltee).runLog.run  //> res22: Vector[Int] = Vector(1, 2, 3, 1, 2, 3, 4, 5)
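The effect of feedL in these results can be reproduced in a toy Tee model: pre-fed left elements are consumed first, and if the tee waits on the right in the meantime, feeding resumes once the right side delivers an element. All names are hypothetical; this is a sketch, not the library's feedL.

```scala
// Toy Tee model (our own simplification, not scalaz-stream's encoding).
sealed trait Tee[L, R, O]
final case class Emit[L, R, O](head: O, tail: Tee[L, R, O]) extends Tee[L, R, O]
final case class AwaitL[L, R, O](rcv: L => Tee[L, R, O]) extends Tee[L, R, O]
final case class AwaitR[L, R, O](rcv: R => Tee[L, R, O]) extends Tee[L, R, O]
final case class Halt[L, R, O]() extends Tee[L, R, O]

// Drive a Tee with two finite inputs; awaiting an empty side ends the run.
def runTee[L, R, O](t: Tee[L, R, O], left: List[L], right: List[R]): List[O] =
  t match {
    case Emit(h, rest) => h :: runTee(rest, left, right)
    case AwaitL(rcv)   => left match { case l :: ls => runTee(rcv(l), ls, right); case Nil => Nil }
    case AwaitR(rcv)   => right match { case r :: rs => runTee(rcv(r), left, rs); case Nil => Nil }
    case Halt()        => Nil
  }

// passL: echo the left side (plays the role of id[Int] in the example).
def passL[L, R]: Tee[L, R, L] = AwaitL[L, R, L](l => Emit(l, passL[L, R]))

// feedL: pre-feed left inputs; if the tee waits on the right, resume feeding later.
def feedL[L, R, O](in: List[L])(t: Tee[L, R, O]): Tee[L, R, O] = (in, t) match {
  case (Nil, _)               => t
  case (_, Emit(h, rest))     => Emit(h, feedL[L, R, O](in)(rest))
  case (l :: ls, AwaitL(rcv)) => feedL[L, R, O](ls)(rcv(l))
  case (_, AwaitR(rcv))       => AwaitR[L, R, O](r => feedL[L, R, O](in)(rcv(r)))
  case (_, Halt())            => Halt()
}
```

With two empty inputs the toy yields 1, 2, 3 (like res21), and with 1 to 5 on the left it yields 1, 2, 3, 1, 2, 3, 4, 5 (like res22).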

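Another way to merge elements from multiple sources is wye. Like tee, a wye connects to a left and a right source. Unlike tee, the stream merged by a wye has no fixed order: a wye does not read its sources in a prescribed left/right sequence but non-deterministically. In particular, when the two sources produce data at different rates, a wye takes elements on a first-come, first-served basis, which makes the arrival order even less predictable. As with tee, wye operations are essentially built on the wye method: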

/**
   * Like `tee`, but we allow the `Wye` to read non-deterministically
   * from both sides at once.
   *
   * If `y` is in the state of awaiting `Both`, this implementation
   * will continue feeding `y` from either left or right side,
   * until either it halts or _both_ sides halt.
   *
   * If `y` is in the state of awaiting `L`, and the left
   * input has halted, we halt. Likewise for the right side.
   *
   * For as long as `y` permits it, this implementation will _always_
   * feed it any leading `Emit` elements from either side before issuing
   * new `F` requests. More sophisticated chunking and fairness
   * policies do not belong here, but should be built into the `Wye`
   * and/or its inputs.
   *
   * The strategy passed in must be stack-safe, otherwise this implementation
   * will throw SOE. Preferably use one of the `Strategys.Executor(es)` based strategies
   */
  final def wye[O2, O3](p2: Process[Task, O2])(y: Wye[O, O2, O3])(implicit S: Strategy): Process[Task, O3] =
    scalaz.stream.wye[O, O2, O3](self, p2)(y)(S)

wye has several important merge functions:

/**
   * After each input, dynamically determine whether to read from the left, right, or both,
   * for the subsequent input, using the provided functions `f` and `g`. The returned
   * `Wye` begins by reading from the left side and is left-biased--if a read of both branches
   * returns a `These(x,y)`, it uses the signal generated by `f` for its next step.
   */
  def dynamic[I,I2](f: I => wye.Request, g: I2 => wye.Request): Wye[I,I2,ReceiveY[I,I2]] = {
    import scalaz.stream.wye.Request._
    def go(signal: wye.Request): Wye[I,I2,ReceiveY[I,I2]] = signal match {
      case L => receiveL { i => emit(ReceiveL(i)) ++ go(f(i)) }
      case R => receiveR { i2 => emit(ReceiveR(i2)) ++ go(g(i2)) }
      case Both => receiveBoth {
        case t@ReceiveL(i) => emit(t) ++ go(f(i))
        case t@ReceiveR(i2) => emit(t) ++ go(g(i2))
        case HaltOne(rsn) => Halt(rsn)
      }
    }
    go(L)
  }
/**
   * Non-deterministic interleave of both inputs. Emits values whenever either
   * of the inputs is available.
   *
   * Will terminate once both sides terminate.
   */
  def merge[I]: Wye[I,I,I] =
    receiveBoth {
      case ReceiveL(i) => emit(i) ++ merge
      case ReceiveR(i) => emit(i) ++ merge
      case HaltL(End)   => awaitR.repeat
      case HaltR(End)   => awaitL.repeat
      case HaltOne(rsn) => Halt(rsn)
    }
/**
   * Nondeterminstic interleave of both inputs. Emits values whenever either
   * of the inputs is available.
   */
  def either[I,I2]: Wye[I,I2,I \/ I2] =
    receiveBoth {
      case ReceiveL(i) => emit(left(i)) ++ either
      case ReceiveR(i) => emit(right(i)) ++ either
      case HaltL(End)     => awaitR[I2].map(right).repeat
      case HaltR(End)     => awaitL[I].map(left).repeat
      case h@HaltOne(rsn) => Halt(rsn)
    }

Some examples of their use:

import wye._
source.wye(seq)(either).runLog.run               //> res23: Vector[scalaz.\/[Int,String]] = Vector(-\/(1), \/-(a), \/-(b), \/-(c), -\/(2), -\/(3), -\/(4), -\/(5))
(source either seq).runLog.run                   //> res24: Vector[scalaz.\/[Int,String]] = Vector(-\/(1), \/-(a), \/-(b), \/-(c), -\/(2), -\/(3), -\/(4), -\/(5))
source.wye(seq)(merge).runLog.run                //> res25: Vector[Any] = Vector(1, a, b, c, 2, 3, 4, 5)
(source merge seq).runLog.run                    //> res26: Vector[Any] = Vector(1, a, b, c, 2, 3, 4, 5)
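The exact interleaving in these results depends on timing, which is what makes a wye non-deterministic. A deterministic toy can still illustrate the first-come, first-served idea: attach a hypothetical arrival time to every element and merge in time order. This models only the outcome, not how wye works (the real wye runs both sides under a Strategy); `Timed` is a hypothetical type, ties going to the left is an arbitrary choice, and scala.Either stands in for scalaz's \/.

```scala
// Each element carries a hypothetical arrival time (a real wye has no such
// field; it simply reacts to whichever side delivers first at run time).
final case class Timed[A](at: Long, value: A)

// either: merge two timed inputs in arrival order, keeping the side as Left/Right.
// Ties go to the left (an arbitrary choice in this sketch).
def either[A, B](left: List[Timed[A]], right: List[Timed[B]]): List[Either[A, B]] =
  (left, right) match {
    case (Nil, rs) => rs.map(r => Right(r.value))
    case (ls, Nil) => ls.map(l => Left(l.value))
    case (l :: ls, r :: rs) =>
      if (l.at <= r.at) Left(l.value) :: either(ls, right)
      else Right(r.value) :: either(left, rs)
  }

// merge: both sides carry the same element type, so the side is dropped.
def merge[A](left: List[Timed[A]], right: List[Timed[A]]): List[A] =
  either(left, right).map {
    case Left(a)  => a
    case Right(a) => a
  }
```

Change the arrival times and the output order changes with them, while the relative order within each side is always preserved, just as in res23 to res26.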

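We can, however, impose a certain degree of ordering: the dynamic function lets us tell the wye whether to read the next element from the left or from the right: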

val w = dynamic((r:Int) => Request.R, (l:String) => Request.L)
                                                 //> w  : scalaz.stream.Wye[Int,String,scalaz.stream.ReceiveY[Int,String]] = Await(Left,<function1>,<function1>)
source.wye(seq)(w).runLog.run                    //> res27: Vector[scalaz.stream.ReceiveY[Int,String]] = Vector(ReceiveL(1), ReceiveR(a), ReceiveL(2), ReceiveR(b), ReceiveL(3), ReceiveR(c), ReceiveL(4))
val fw = dynamic((r: Int) => if (r % 3 == 0) {
  Request.R } else {Request.L}, (l:String) => Request.L)
                                                 //> fw  : scalaz.stream.Wye[Int,String,scalaz.stream.ReceiveY[Int,String]] = Await(Left,<function1>,<function1>)
source.wye(seq)(fw).runLog.run                   //> res28: Vector[scalaz.stream.ReceiveY[Int,String]] = Vector(ReceiveL(1), ReceiveL(2), ReceiveL(3), ReceiveR(a), ReceiveL(4), ReceiveL(5))
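When dynamic requests only L or R, it reads one side at a time, so that part of its behaviour can be modelled with a toy Tee, and both results above come out the same. The sketch omits the Both request, uses hypothetical names (`ReqL`, `ReqR` in place of wye.Request.L and wye.Request.R), and represents ReceiveL/ReceiveR with scala's Left/Right. Like the dynamic source above, it starts by reading the left side.

```scala
// Toy Tee model (our own simplification, not scalaz-stream's encoding).
sealed trait Tee[L, R, O]
final case class Emit[L, R, O](head: O, tail: Tee[L, R, O]) extends Tee[L, R, O]
final case class AwaitL[L, R, O](rcv: L => Tee[L, R, O]) extends Tee[L, R, O]
final case class AwaitR[L, R, O](rcv: R => Tee[L, R, O]) extends Tee[L, R, O]
final case class Halt[L, R, O]() extends Tee[L, R, O]

// Drive a Tee with two finite inputs; awaiting an empty side ends the run.
def runTee[L, R, O](t: Tee[L, R, O], left: List[L], right: List[R]): List[O] =
  t match {
    case Emit(h, rest) => h :: runTee(rest, left, right)
    case AwaitL(rcv)   => left match { case l :: ls => runTee(rcv(l), ls, right); case Nil => Nil }
    case AwaitR(rcv)   => right match { case r :: rs => runTee(rcv(r), left, rs); case Nil => Nil }
    case Halt()        => Nil
  }

// Hypothetical request type standing in for wye.Request.L / wye.Request.R.
sealed trait Request
case object ReqL extends Request
case object ReqR extends Request

// After each element, f (left) or g (right) decides which side to read next.
def dynamic[L, R](f: L => Request, g: R => Request): Tee[L, R, Either[L, R]] = {
  def go(req: Request): Tee[L, R, Either[L, R]] = req match {
    case ReqL => AwaitL[L, R, Either[L, R]](l => Emit[L, R, Either[L, R]](Left(l), go(f(l))))
    case ReqR => AwaitR[L, R, Either[L, R]](r => Emit[L, R, Either[L, R]](Right(r), go(g(r))))
  }
  go(ReqL) // start by reading the left side
}
```

In the first case the run stops after Left(4) because the next request is for the right side, which is already exhausted.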

As with tee, we can use feedL to insert a ready-made sequence of elements into the merged stream:

val lwye = wye.feedL(Seq(1,2,3))(id[Int])        //> lwye  : scalaz.stream.Wye[Int,Any,Int] = Append(Emit(Vector(1, 2)),Vector(<function1>))
halt.wye(halt)(lwye).runLog.run                  //> res29: Vector[Int] = Vector(1, 2, 3)
source.wye(halt)(lwye).runLog.run                //> res30: Vector[Int] = Vector(1, 2, 3, 1, 2, 3, 4, 5)
