Algorithm to flatten overlapping ranges


16

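I am looking for a nice way of flattening (splitting) a list of potentially overlapping numeric ranges. The problem is very similar to that of this question: Fastest way to split overlapping date ranges, and many others.

However, the ranges are not only integers, and I am looking for a decent algorithm that can be easily implemented in Javascript or Python, etc.

Example data: (image: example data)

Example solution: (image)

Apologies if this is a duplicate, but I have yet to find a solution.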


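How do you determine that green is on top of blue, but under yellow and orange? Are the color ranges applied in order? If that's the case, the algorithm seems obvious; just... erm, apply the color ranges in order.
Robert Harvey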

1
Yes, they are applied in order. But that's the problem: how would you "apply" the ranges?
Jollywatt

1
Do you often add/remove colors, or do you need to optimize for query speed? How many "ranges" will you usually have? 3? 3000?
Telastyn, 2014

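Colors won't be added/removed very frequently, and there will be anywhere between 10 and 20 ranges, with 4+ digit precision. This is why the set method isn't quite suitable: the sets would have to be 1000+ items long. The method I've gone with is the one I posted in Python.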
Jollywatt

Answers:


10

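Walk from left to right, using a stack to keep track of what color you're on. Instead of a discrete map, use the 10 numbers in your dataset as the break-points.

Starting with an empty stack, and setting start to 0, loop until we reach the end:

  • If the stack is empty:
    • Look for the first color starting at or after start, and push it and all lower-ranked colors onto the stack. In your flattened list, mark the beginning of that color.
  • else (if not empty):
    • Find the next starting point for any higher-ranked color at or after start, and find the end of the current color
      • If the next color starts first, push it and anything else on the way to it onto the stack. Update the end of the current color to be the beginning of this one, and add the start of this color to the flattened list.
      • If there are none and the current color ends first, set start to the end of this color, pop it off the stack, and check the next highest-ranked color
        • If start is within the next color's range, add this color to the flattened list, starting at start.
        • If the stack empties, just continue the loop (go back to the first bullet point).

This is a mental run-through given your example data: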

# Initial data.
flattened = []
stack = []
start = 0
# Stack is empty.  Look for the next starting point at 0 or later: "b", 0 - Push it and all lower levels onto stack
flattened = [ (b, 0, ?) ]
stack = [ r, b ]
start = 0
# End of "b" is 5.4, next higher-colored start is "g" at 2 - Delimit and continue
flattened = [ (b, 0, 2), (g, 2, ?) ]
stack = [ r, b, g ]
start = 2
# End of "g" is 12, next higher-colored start is "y" at 3.5 - Delimit and continue
flattened = [ (b, 0, 2), (g, 2, 3.5), (y, 3.5, ?) ]
stack = [ r, b, g, y ]
start = 3.5
# End of "y" is 6.7, next higher-colored start is "o" at 6.7 - Delimit and continue
flattened = [ (b, 0, 2), (g, 2, 3.5), (y, 3.5, 6.7), (o, 6.7, ?) ]
stack = [ r, b, g, y, o ]
start = 6.7
# End of "o" is 10, and there is nothing starting at 12 or later in a higher color.  Next off stack, "y", has already ended.  Next off stack, "g", has not ended.  Delimit and continue.
flattened = [ (b, 0, 2), (g, 2, 3.5), (y, 3.5, 6.7), (o, 6.7, 10), (g, 10, ?) ]
stack = [ r, b, g ]
start = 10
# End of "g" is 12, there is nothing starting at 12 or later in a higher color.  Next off stack, "b", is out of range (already ended).  Next off stack, "r", is out of range (not started).  Mark end of current color:
flattened = [ (b, 0, 2), (g, 2, 3.5), (y, 3.5, 6.7), (o, 6.7, 10), (g, 10, 12) ]
stack = []
start = 12
# Stack is empty.  Look for the next starting point at 12 or later: "r", 12.5 - Push onto stack
flattened = [ (b, 0, 2), (g, 2, 3.5), (y, 3.5, 6.7), (o, 6.7, 10), (g, 10, 12), (r, 12.5, ?) ]
stack = [ r ]
start = 12
# End of "r" is 13.8, and there is nothing starting at 12 or higher in a higher color.  Mark end and pop off stack.
flattened = [ (b, 0, 2), (g, 2, 3.5), (y, 3.5, 6.7), (o, 6.7, 10), (g, 10, 12), (r, 12.5, 13.8) ]
stack = []
start = 13.8
# Stack is empty and nothing is past 13.8 - We're done.
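
For reference, here is a minimal Python sketch of the same left-to-right walk over the break-points. It is not the exact procedure above: it assumes spans are given as (name, start, end) tuples listed lowest rank first, and it swaps the hand-managed stack for a heap of active ranks, so colors that have already ended are simply discarded once they reach the top. The name flatten and the tuple layout are illustrative choices, not part of the original answer.

```python
import heapq

def flatten(spans):
    """spans: list of (name, start, end), lowest rank first (assumed layout).
    Returns non-overlapping (name, start, end) tuples, ordered by start."""
    # Break-points: every distinct start and end value, in order.
    points = sorted({p for _, s, e in spans for p in (s, e)})
    by_start = sorted(range(len(spans)), key=lambda i: spans[i][1])
    active = []   # heap of negated ranks, so the highest rank is on top
    nxt = 0       # next entry of by_start that has not started yet
    out = []
    for left, right in zip(points, points[1:]):
        # Activate every span that starts at or before this break-point.
        while nxt < len(by_start) and spans[by_start[nxt]][1] <= left:
            heapq.heappush(active, -by_start[nxt])
            nxt += 1
        # Discard top spans that have already ended.
        while active and spans[-active[0]][2] <= left:
            heapq.heappop(active)
        if not active:
            continue  # gap: nothing is visible between left and right
        name = spans[-active[0]][0]
        if out and out[-1][0] == name and out[-1][2] == left:
            out[-1] = (name, out[-1][1], right)  # extend the current piece
        else:
            out.append((name, left, right))
    return out

spans = [("red", 12.5, 13.8), ("blue", 0.0, 5.4), ("green", 2.0, 12.0),
         ("yellow", 3.5, 6.7), ("orange", 6.7, 10.0)]
for piece in flatten(spans):
    print(piece)
```

On the sample data this should produce the same six pieces as the run-through, in the same order.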

What do you mean by "anything else on the way to it onto the stack"?
Guillaume07

1
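@Guillaume07 Anything ranked between the current color and the chosen next start. The sample data doesn't show it, but imagine yellow was shifted to start before green: you have to push both green and yellow onto the stack, so that when yellow ends, green is still in the right place in the stack and still shows up in the final result.
Izkata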

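Another thing I don't understand: why do you first say "If the stack is empty: look for the first color starting at or before start", and then in the code sample comment "# Stack is empty. Look for the next starting point at 0 or later"? So is it before or later?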
Guillaume07

1
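@Guillaume07 Yep, a typo; the correct version is in the code block twice (the second is the comment near the bottom that starts with "Stack is empty."). I've edited that bullet point.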
Izkata

3

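This solution seems the simplest. (Or at least, the easiest to grasp.)

All that's needed is a function to subtract two ranges. In other words, something that gives this: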

A ------               A     ------           A    ----
B    -------    and    B ------        and    B ---------
=       ----           = ----                 = ---    --

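This is simple enough. Then you can iterate through each of the ranges, starting from the lowest, and for each one subtract from it all the ranges above it, in turn. And there you have it.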


Here is an implementation of the range subtracter in Python:

def subtractRanges(A, B):
    '''SUBTRACTS A FROM B'''
    (As, Ae), (Bs, Be) = A, B  # unpack here: Python 3 removed tuple parameters
    # e.g, A =    ------
    #      B =  -----------
    # result =  --      ---
    # Returns list of new range(s)

    if As > Be or Bs > Ae: # All of B visible
        return [[Bs, Be]]
    result = []
    if As > Bs: # Beginning of B visible
        result.append([Bs, As])
    if Ae < Be: # End of B visible
        result.append([Ae, Be])
    return result

Using this function, the rest can be done like this: (a "span" means a range, since "range" is already the name of a Python built-in)

spans = [["red", [12.5, 13.8]],
["blue", [0.0, 5.4]],
["green", [2.0, 12.0]],
["yellow", [3.5, 6.7]],
["orange", [6.7, 10.0]]]

i = 0 # Start at lowest span
while i < len(spans):
    for superior in spans[i+1:]: # Iterate through all spans above
        result = subtractRanges(superior[1], spans[i][1])
        if not result:      # If span is completely covered
            del spans[i]    # Remove it from list
            i -= 1          # Compensate for list shifting
            break           # Skip to next span
        else:   # If there is at least one resulting span
            spans[i][1] = result[0]
            if len(result) > 1: # If there are two resulting spans
                # Insert another span with the same name
                spans.insert(i+1, [spans[i][0], result[1]])
    i += 1

print(spans)

This gives [['red', [12.5, 13.8]], ['blue', [0.0, 2.0]], ['green', [2.0, 3.5]], ['green', [10.0, 12.0]], ['yellow', [3.5, 6.7]], ['orange', [6.7, 10.0]]], which is correct.


Your output at the end doesn't match the expected output in the question...
Izkata

@Izkata Gosh, I was careless. That must have been the output from another test. Fixed now, thanks
Jollywatt '16

2

If the data really is similar in scope to your sample data, you could create a map like this:

map = [0 .. 150]

for each color:
    for loc range start * 10 to range finish * 10:
        map[loc] = color

Then just walk through this map to generate the ranges

curcolor = none
for loc in map:
    if map[loc] != curcolor:
        if curcolor:
            rangeend = loc / 10
        make new range
        rangecolor = map[loc]
        rangestart = loc / 10
        curcolor = map[loc]

For this to work, the values need to fall in a relatively small range, as they do in your sample data.
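
A runnable Python sketch of the two pseudocode steps above might look like the following. It assumes spans are (name, start, end) tuples listed lowest first, that all values are non-negative, and that they land on multiples of 1/scale (0.1 for the sample data). The names flatten_with_grid and scale are illustrative, not from the answer.

```python
def flatten_with_grid(spans, scale=10):
    """spans: list of (name, start, end), lowest first (assumed layout).
    Paints each span onto a discrete grid, then reads runs back off it."""
    size = max((round(end * scale) for _, _, end in spans), default=0)
    grid = [None] * size  # one cell per 1/scale units
    for name, start, end in spans:
        # Later (higher) spans overwrite whatever was painted before.
        for loc in range(round(start * scale), round(end * scale)):
            grid[loc] = name
    out = []
    run_start = 0
    for loc in range(1, size + 1):
        # Close the current run when the color changes or the grid ends.
        if loc == size or grid[loc] != grid[run_start]:
            if grid[run_start] is not None:
                out.append((grid[run_start], run_start / scale, loc / scale))
            run_start = loc
    return out

spans = [("red", 12.5, 13.8), ("blue", 0.0, 5.4), ("green", 2.0, 12.0),
         ("yellow", 3.5, 6.7), ("orange", 6.7, 10.0)]
for piece in flatten_with_grid(spans):
    print(piece)
```

The grid size grows with both the extent of the values and the precision, which is the limitation noted above.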

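Edit: to work with true floats, use the map to generate a high-level mapping, and then refer back to the original data to create the boundaries.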

map = [0 .. 15]

for each color:
   for loc round(range start) to round(range finish):
        map[loc] = color

curcolor = none
for loc in map
    if map[loc] != curcolor:

        make new range
        if loc = round(range[map[loc]].start)  
             rangestart = range[map[loc]].start
        else
             rangestart = previous rangeend
        rangecolor = map[loc]
        if curcolor:
             if map[loc] == none:
                 last rangeend = range[map[loc]].end
             else
                 last rangeend = rangestart
        curcolor = rangecolor

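This is a very nice solution; I've come across it before. However, I'm looking for a more generic solution that can handle arbitrary float ranges... (this wouldn't be the best approach for something like 563.807 - 770.100)
Jollywatt, 2014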

1
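I think you could generalize it by rounding the values and generating the map, but marking locations on the edges as having two colors. Then, when you see a location with two colors, go back to the original data to determine the boundary.
Gort the Robot, 2014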

2

This is a relatively straightforward solution in Scala. It shouldn't be too difficult to port to another language.

case class Range(name: String, left: Double, right: Double) {
  def overlapsLeft(other: Range) =
    other.left < left && left < other.right

  def overlapsRight(other: Range) =
    other.left < right && right < other.right

  def overlapsCompletely(other: Range) =
    left <= other.left && right >= other.right

  def splitLeft(other: Range) = 
    Range(other.name, other.left, left)

  def splitRight(other: Range) = 
    Range(other.name, right, other.right)
}

def apply(ranges: Set[Range], newRange: Range) = {
  val left     = ranges.filter(newRange.overlapsLeft)
  val right    = ranges.filter(newRange.overlapsRight)
  val overlaps = ranges.filter(newRange.overlapsCompletely)

  val leftSplit  =  left.map(newRange.splitLeft)
  val rightSplit = right.map(newRange.splitRight)

  ranges -- left -- right -- overlaps ++ leftSplit ++ rightSplit + newRange
}

val ranges = Vector(
  Range("red",   12.5, 13.8),
  Range("blue",   0.0,  5.4),
  Range("green",  2.0, 12.0),
  Range("yellow", 3.5,  6.7),
  Range("orange", 6.7, 10.0))

val flattened = ranges.foldLeft(Set.empty[Range])(apply)
val sorted = flattened.toSeq.sortBy(_.left)
sorted foreach println

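apply takes a Set of all the ranges that have already been applied, finds the overlaps, then returns a new set minus the overlaps and plus the new range and the newly split ranges. foldLeft repeatedly calls apply with each input range.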
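
As an illustration of the port, here is a rough Python equivalent, assuming ranges are plain (name, left, right) tuples with left < right. The function name apply_range is made up for this sketch; functools.reduce stands in for foldLeft.

```python
from functools import reduce

def apply_range(ranges, new):
    """ranges: frozenset of (name, left, right) already applied.
    new: the range laid on top. Returns the updated frozenset."""
    _, nl, nr = new
    result = set()
    for name, left, right in ranges:
        if left < nl < right:          # new's left edge is inside: keep the left remainder
            result.add((name, left, nl))
        if left < nr < right:          # new's right edge is inside: keep the right remainder
            result.add((name, nr, right))
        if right <= nl or left >= nr:  # no overlap at all: keep unchanged
            result.add((name, left, right))
        # A range fully covered by new matches none of the tests and is dropped.
    result.add(new)
    return frozenset(result)

ranges = [("red", 12.5, 13.8), ("blue", 0.0, 5.4), ("green", 2.0, 12.0),
          ("yellow", 3.5, 6.7), ("orange", 6.7, 10.0)]
flattened = reduce(apply_range, ranges, frozenset())
for piece in sorted(flattened, key=lambda r: r[1]):
    print(piece)
```

Every call scans all ranges applied so far, so this is quadratic in the number of ranges, which is unlikely to matter for the 10 to 20 ranges mentioned in the question.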


0

Just keep a set of ranges sorted by start. Add a range that covers everything (-oo .. +oo). To add a range r:

let pre = last range that starts before r starts

let post = last range that starts before r ends

now iterate from pre to post: split ranges that overlap, remove ranges that are covered, then add r
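
One way this could be sketched in Python is shown below. It simplifies the representation: because the background range covers everything, each stored segment only needs its start, and its end is implied by the next segment's start. The names add_range and to_ranges, and the use of None for the background, are assumptions made for the sketch.

```python
import bisect
import math

def add_range(segments, name, start, end):
    """segments: list of (start, name) sorted by start; each segment runs
    until the next one's start. segments[0] is the (-inf, None) background."""
    starts = [s for s, _ in segments]
    # "post": the last segment starting at or before `end`. Whatever is
    # visible there resumes once the new range finishes.
    post = bisect.bisect_right(starts, end) - 1
    resume = segments[post][1]
    lo = bisect.bisect_left(starts, start)  # first segment starting at or after `start`
    hi = bisect.bisect_right(starts, end)   # first segment starting after `end`
    # Segments starting inside [start, end] are covered. The segment before
    # `lo` ("pre") is cut short automatically by the new start.
    segments[lo:hi] = [(start, name), (end, resume)]

def to_ranges(segments):
    """Convert back to (name, start, end) tuples, skipping the background."""
    return [(name, s, e)
            for (s, name), (e, _) in zip(segments, segments[1:])
            if name is not None]

segments = [(-math.inf, None)]
for name, start, end in [("red", 12.5, 13.8), ("blue", 0.0, 5.4),
                         ("green", 2.0, 12.0), ("yellow", 3.5, 6.7),
                         ("orange", 6.7, 10.0)]:
    add_range(segments, name, start, end)
print(to_ranges(segments))
```

Rebuilding starts on every call keeps the sketch short; a sorted container would avoid that copy. Adjacent segments with the same name are not merged here.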