9

给定两个数组；$births包含表明某人出生的出生年份列表，以及$deaths表明某人死亡的死亡年份列表，我们如何找到人口最高的年份？

例如，给定以下数组：

$births = [1984, 1981, 1984, 1991, 1996];
$deaths = [1991, 1984];

人口最高的年份应该是1996，因为3那一年人们还活着，这是所有年份中人口最高的年份。

这是运行的数学公式：

| 出生| 死亡 人口|
| ------- | ------- | ------------ ||
| 1981年| | 1 |
| 1984年| | 2 |
| 1984年| 1984年| 2 |
| 1991年| 1991年| 2 |
| 1996 | | 3 |

假设条件

我们可以放心地假设某人出生的年份可以增加一，而某人死亡的年份可以减少一。因此，在此示例中，1984年出生2人，1984年死亡1人，这意味着该年人口增加了1。

我们还可以安全地假设死亡人数永远不会超过出生人数，并且在人口为0时不会发生死亡。

我们也可以有把握地认为在这两个年$deaths和$births永远不会是负数或浮点值（他们总是正整数大于0）。

但是，我们不能假定数组将被排序或不会有重复的值。

要求

给定这两个数组作为输入，我们必须编写一个函数以返回发生最高人口的年份。该函数可以返回0，false，""，或NULL（任何falsey值是可以接受的），如果输入的数组是空的，或者如果人口总是在0贯穿。如果最高人口发生在多年中，则该功能可能会返回达到最高人口的第一年或任何后续年份。

例如：

$births = [1997, 1997, 1997, 1998, 1999];
$deaths = [1998, 1999];

/* The highest population was 3 on 1997, 1998 and 1999, either answer is correct */

此外，包括解决方案的Big O也会有所帮助。

我这样做的最佳尝试是：

function highestPopulationYear(Array $births, Array $deaths): Int {

    sort($births);
    sort($deaths);

    $nextBirthYear = reset($births);
    $nextDeathYear = reset($deaths);

    $years = [];
    if ($nextBirthYear) {
        $years[] = $nextBirthYear;
    }
    if ($nextDeathYear) {
        $years[] = $nextDeathYear;
    }

    if ($years) {
        $currentYear = max(0, ...$years);
    } else {
        $currentYear = 0;
    }

    $maxYear = $maxPopulation = $currentPopulation = 0;

    while(current($births) !== false || current($deaths) !== false || $years) {

        while($currentYear === $nextBirthYear) {
            $currentPopulation++;
            $nextBirthYear = next($births);
        }

        while($currentYear === $nextDeathYear) {
            $currentPopulation--;
            $nextDeathYear = next($deaths);
        }

        if ($currentPopulation >= $maxPopulation) {
            $maxPopulation = $currentPopulation;
            $maxYear = $currentYear;
        }

        $years = [];

        if ($nextBirthYear) {
            $years[] = $nextBirthYear;
        }
        if ($nextDeathYear) {
            $years[] = $nextDeathYear;
        }
        if ($years) {
            $currentYear = min($years);
        } else {
            $currentYear = 0;
        }
    }

    return $maxYear;
}

上面的算法应该在多项式时间内工作，因为最坏的O(((n log n) * 2) + k)情况n是每个数组中要排序的元素k数，而出生年数（因为我们k一直知道k >= y）y是死亡年数。但是，我不确定是否有更有效的解决方案。

我的兴趣纯粹是在现有算法的基础上改进计算复杂度。内存复杂性无关紧要。运行时优化也不是。至少这不是主要问题。欢迎进行任何次要/主要的运行时优化，但这不是关键因素。

— 谢里夫
source

2

当您有可行的解决方案时，是否可以更好地适合codereview.stackexchange.com？

— 奈杰尔

1

问题是寻求最有效的解决方案，不一定是任何可行的解决方案。我认为这完全是正确的。

— 谢里夫

1

我并不是说这对SO无效（在那种情况下我会投票决定要关闭），我只是想知道您是否可能会在CR方面得到更多的回应。

— 奈杰尔

@NigelRen我看不到尝试的危害。虽然我想让这个开放几天。如果没有答案，我会悬赏。

— 谢里夫

1

因此，如果您搜索出生死亡关键字，SO本身就会遇到很多问题。廉价的改进将是改进排序：将长度的数组作为出生/死亡时间的范围（默认情况下，每个单元格都保留一个值为0的日期）。将有关出生和死亡的数字加1或减1，然后累加总和并保持找到的最大和

— grodzi

4

我认为我们可以先进行排序，然后在迭代时保持当前的数量和全局最大值，O(n log n)以O(1)腾出更多空间。我尝试使用当年作为参考点，但逻辑似乎仍然有些棘手，因此我不确定它是否已完全解决。希望它可以给出该方法的想法。

JavaScript代码（欢迎使用反示例/错误）

function f(births, deaths){
  births.sort((a, b) => a - b);
  deaths.sort((a, b) => a - b);

  console.log(JSON.stringify(births));
  console.log(JSON.stringify(deaths));
  
  let i = 0;
  let j = 0;
  let year = births[i];
  let curr = 0;
  let max = curr;

  while (deaths[j] < births[0])
    j++;

  while (i < births.length || j < deaths.length){
    while (year == births[i]){
      curr = curr + 1;
      i = i + 1;
    }
    
    if (j == deaths.length || year < deaths[j]){
      max = Math.max(max, curr);
      console.log(`year: ${ year }, max: ${ max }, curr: ${ curr }`);
    
    } else if (j < deaths.length && deaths[j] == year){
      while (deaths[j] == year){
        curr = curr - 1;
        j = j + 1;
      }
      max = Math.max(max, curr);
      console.log(`year: ${ year }, max: ${ max }, curr: ${ curr }`);
    }

    if (j < deaths.length && deaths[j] > year && (i == births.length || deaths[j] < births[i])){
      year = deaths[j];
      while (deaths[j] == year){
        curr = curr - 1;
        j = j + 1;
      }
      console.log(`year: ${ year }, max: ${ max }, curr: ${ curr }`);
    }

    year = births[i];
  }
  
  return max;
}

var input = [
  [[1997, 1997, 1997, 1998, 1999],
  [1998, 1999]],
  [[1, 2, 2, 3, 4],
  [1, 2, 2, 5]],
  [[1984, 1981, 1984, 1991, 1996],
  [1991, 1984, 1997]],
  [[1984, 1981, 1984, 1991, 1996],
  [1991, 1982, 1984, 1997]]
]

for (let [births, deaths] of input)
  console.log(f(births, deaths));

展开摘要

如果年份范围m大约在n，我们可以将范围内的每一年的计数存储起来，并具有O(n)时间复杂性。如果我们想花哨的话，我们还可以O(n * log log m)通过使用允许快速后继查找的Y型快速树来增加时间复杂度O(log log m)。

— גלעדברקן
source

1.教导我Y-fast特里的存在。关于算法：减少后无需检查最大值。仅在递增之后。最后，虽然块是不必要的：考虑对两个排序的列表进行排序：您只需要两个（i，j）的头，选择每个的头，然后推进较小的一个。if(birth_i < death_j){//increment stuff + check max} else{//decrement}; birth_i||=infty; death_j||=infty。您也可以迭代到min(birthSize, deathSize)。如果最小出生，停止。如果min是死亡（可疑..），请停止并检查(max + birth.length-i)

— grodzi

@grodzi我确实开始考虑合并排序，但是得出结论，这需要额外的处理，因为重复以及出生与死亡的顺序如何影响计数。当死亡年份与出生年份无法比拟时，最后一个while循环对我来说似乎是必要的。您是正确的，该循环中的最大值是不必要的。

— גלעדברקן

@גלעדברקן将存储桶排序用于线性时间。

— Dave

我已经在回答中说过这个想法：“如果年份范围m在n的数量级上，我们可以将每年的计数存储在该范围内，并且时间复杂度为O（n）。”

— גלעדברקן

这不是效率，我不知道为什么给你奖励哈哈哈

— Emiliano

4

我们可以使用存储桶排序在线性时间内解决此问题。假设输入的大小为n，年份范围为m。

O(n): Find the min and max year across births and deaths.
O(m): Create an array of size max_yr - min_yr + 1, ints initialized to zero. 
      Treat the first cell of the array as min_yr, the next as min_yr+1, etc...
O(n): Parse the births array, incrementing the appropriate index of the array. 
      arr[birth_yr - min_yr] += 1
O(n): Ditto for deaths, decrementing the appropriate index of the array.
      arr[death_yr - min_yr] -= 1
O(m): Parse your array, keeping track of the cumulative sum and its max value.

最大的累积最大值是您的答案。

运行时间为O（n + m），所需的额外空间为O（m）。

如果m为O（n），这是n中的线性解；也就是说，如果年数的增长没有比出生和死亡的数量更快。对于现实世界的数据几乎可以肯定是这样。

— 戴夫
source

1

您能提供一个可行的实施方案吗？

— 谢里夫

1

@Sherif实现留给读者练习……无论如何这都是微不足道的。不清楚吗？

— Dave

我会注意到，因为您的粒度是年份，所以存在一些歧义。因为我们正在有效地测量年底时的人口，并且在年中可能还有其他时间点，由于出生和死亡时间的原因，人口会更高。

— Dave

1

如果我们必须解析“大小为max_yr-min_yr + 1的数组”，那么线性时间如何？（CC @Sherif）

— גלעדברקן

1

@Dave：难点1和2的复杂度不是O（2n）吗？ 1.遍历所有出生+死亡：O(n): Find the min and max year across births and deaths 2.遍历所有出生+死亡：O(n): Parse the births+death array, incrementing the appropriate index of the array 然后您做：O（m）：解析数组，跟踪累积和及其最大值。（您无需解析此数组-您可以在增加索引2的同时跟踪MAX）

— 安东尼

3

首先将出生和死亡汇总到地图（year => population change）中，然后按键对其进行排序，然后计算出该人口的流动人口。

这应该大约为O(2n + n log n)，即n出生人数。

$births = [1984, 1981, 1984, 1991, 1996];
$deaths = [1991, 1984];

function highestPopulationYear(array $births, array $deaths): ?int
{
    $indexed = [];

    foreach ($births as $birth) {
        $indexed[$birth] = ($indexed[$birth] ?? 0) + 1;
    }

    foreach ($deaths as $death) {
        $indexed[$death] = ($indexed[$death] ?? 0) - 1;
    }

    ksort($indexed);

    $maxYear = null;
    $max = $current = 0;

    foreach ($indexed as $year => $change) {
        $current += $change;
        if ($current >= $max) {
            $max = $current;
            $maxYear = $year;
        }
    }

    return $maxYear;
}

var_dump(highestPopulationYear($births, $deaths));

— 理查德·范·韦尔岑
source

正如我所看到的：n =事件数（出生+死亡），m =事件年数（具有出生或死亡的年），这实际上是O（n + m log m）。如果n >> m-可以认为是O（n）。如果您在（例如）100年的时间内有数十亿的生与死，那么对包含100个元素（ksort($indexed)）的数组进行排序就变得无关紧要。

— Paul Spiegel

您可以使用处理出生$indexed = array_count_values($births);。

— Nigel Ren

3

我以O(n+m)[在最坏的情况下，最好的情况下O(n)] 的内存需求解决了这个问题

并且，时间复杂度为O(n logn)。

这里n & m是birthsand deaths数组的长度。

我不知道PHP或JavaScript。我已经用Java实现了它，逻辑很简单。但是我相信我的想法也可以用这些语言来实现。

技术细节：

我使用Java TreeMap结构来存储出生和死亡记录。

TreeMap插入排序后的数据（基于键）作为（键，值）对，此处的键是年份，值是出生和死亡的累积总和（负数）。

我们不需要插入在最高出生年份之后发生的死亡值。

在TreeMap中使用出生和死亡记录进行填充后，所有累积的总和都将更新，并随着其前进存储具有年份的最大人口。

样本输入和输出：1

Births: [1909, 1919, 1904, 1911, 1908, 1908, 1903, 1901, 1914, 1911, 1900, 1919, 1900, 1908, 1906]

Deaths: [1910, 1911, 1912, 1911, 1914, 1914, 1913, 1915, 1914, 1915]

Year counts Births: {1900=2, 1901=1, 1903=1, 1904=1, 1906=1, 1908=3, 1909=1, 1911=2, 1914=1, 1919=2}

Year counts Birth-Deaths combined: {1900=2, 1901=1, 1903=1, 1904=1, 1906=1, 1908=3, 1909=1, 1910=-1, 1911=0, 1912=-1, 1913=-1, 1914=-2, 1915=-2, 1919=2}

Yearwise population: {1900=2, 1901=3, 1903=4, 1904=5, 1906=6, 1908=9, 1909=10, 1910=9, 1911=9, 1912=8, 1913=7, 1914=5, 1915=3, 1919=5}

maxPopulation: 10
yearOfMaxPopulation: 1909

样本输入和输出：2

Births: [1906, 1901, 1911, 1902, 1905, 1911, 1902, 1905, 1910, 1912, 1900, 1900, 1904, 1913, 1904]

Deaths: [1917, 1908, 1918, 1915, 1907, 1907, 1917, 1917, 1912, 1913, 1905, 1914]

Year counts Births: {1900=2, 1901=1, 1902=2, 1904=2, 1905=2, 1906=1, 1910=1, 1911=2, 1912=1, 1913=1}

Year counts Birth-Deaths combined: {1900=2, 1901=1, 1902=2, 1904=2, 1905=1, 1906=1, 1907=-2, 1908=-1, 1910=1, 1911=2, 1912=0, 1913=0}

Yearwise population: {1900=2, 1901=3, 1902=5, 1904=7, 1905=8, 1906=9, 1907=7, 1908=6, 1910=7, 1911=9, 1912=9, 1913=9}

maxPopulation: 9
yearOfMaxPopulation: 1906

在这里，死亡（1914 & later）是在上一个出生年份之后发生的1913，根本没有计算在内，从而避免了不必要的计算。

对于全部10 million数据（包括出生和死亡）及以上1000 years range，该程序即将3 sec.完成。

如果用相同大小的数据100 years range，就花了1.3 sec。

所有输入都是随机抽取的。

— User_67128
source

1

$births = [1984, 1981, 1984, 1991, 1996];
$deaths = [1991, 1984];
$years = array_unique(array_merge($births, $deaths));
sort($years);

$increaseByYear = array_count_values($births);
$decreaseByYear = array_count_values($deaths);
$populationByYear = array();

foreach ($years as $year) {
    $increase = $increaseByYear[$year] ?? 0;
    $decrease = $decreaseByYear[$year] ?? 0;
    $previousPopulationTally = end($populationByYear);
    $populationByYear[$year] = $previousPopulationTally + $increase - $decrease;
}

$maxPopulation = max($populationByYear);
$maxPopulationYears = array_keys($populationByYear, $maxPopulation);

$maxPopulationByYear = array_fill_keys($maxPopulationYears, $maxPopulation);
print_r($maxPopulationByYear);

这将说明可能存在并列年份，以及某人的死亡年份与某人的出生不对应。

— 克姆恩克尔
source

该答案未尝试提供OP要求的学术上的Big O解释。

— mickmackusa

0

明智的存储currentPopulation和currentYear计算。从对数组$births和$deaths数组进行排序开始是一个很好的主意，因为气泡排序并不是一项繁重的任务，但可以减少一些麻烦：

<?php

$births = [1997, 1999, 2000];
$deaths = [2000, 2001, 2001];

function highestPopulationYear(array $births, array $deaths): Int {

    // sort takes time, but is neccesary for futher optimizations
    sort($births);
    sort($deaths);

    // first death year is a first year where population might decrase 
    // sorfar max population
    $currentYearComputing = $deaths[0];

    // year before first death has potential of having the biggest population
    $maxY = $currentYearComputing-1;

    // calculating population at the begining of the year of first death, start maxPopulation
    $population = $maxPop = count(array_splice($births, 0, array_search($deaths[0], $births)));

    // instead of every time empty checks: `while(!empty($deaths) || !empty($births))`
    // we can control a target time. It reserves a memory, but this slot is decreased
    // every iteration.
    $iterations = count($deaths) + count($births);

    while($iterations > 0) {
        while(current($births) === $currentYearComputing) {
            $population++;
            $iterations--;
            array_shift($births); // decreasing memory usage
        }

        while(current($deaths) === $currentYearComputing) {
            $population--;
            $iterations--;
            array_shift($deaths); // decreasing memory usage
        }

        if ($population > $maxPop) {
            $maxPop = $population;
            $maxY = $currentYearComputing;
        }

        // In $iterations we have a sum of birth/death events left. Assuming all 
        // are births, if this number added to currentPopulation will never exceed
        // current maxPoint, we can break the loop and save some time at cost of
        // some memory.
        if ($maxPop >= ($population+$iterations)) {
            break;
        }

        $currentYearComputing++;
    }

    return $maxY;
}

echo highestPopulationYear($births, $deaths);

不太热衷于跳入Big O的事情，就留给您了。

同样，如果您重新发现currentYearComputing每个循环，则可以将循环更改为if语句，然后只剩下一个循环。

    while($iterations > 0) {

        $changed = false;

        if(current($births) === $currentYearComputing) {
            // ...
            $changed = array_shift($births); // decreasing memory usage
        }

        if(current($deaths) === $currentYearComputing) {
            // ...
            $changed = array_shift($deaths); // decreasing memory usage
        }

        if ($changed === false) {
            $currentYearComputing++;
            continue;
        }

— 耶戈
source

数组移位是内存的好选择，但不是性能的好选择，请查看此cmljnelson.blog/2018/10/16/phps-array_shift-performance

— Emiliano

您始终可以对降序进行排序，以减量代替增量，并以pop代替移位。

— yergo

0

我对此解决方案非常满意，复杂度Big O为n + m

<?php
function getHighestPopulation($births, $deaths){
    $max = [];
    $currentMax = 0;
    $tmpArray = [];

    foreach($deaths as $key => $death){
        if(!isset($tmpArray[$death])){
            $tmpArray[$death] = 0;    
        }
        $tmpArray[$death]--;
    }
    foreach($births as $k => $birth){
        if(!isset($tmpArray[$birth])){
            $tmpArray[$birth] = 0;
        }
        $tmpArray[$birth]++;
        if($tmpArray[$birth] > $currentMax){
            $max = [$birth];
            $currentMax = $tmpArray[$birth];
        } else if ($tmpArray[$birth] == $currentMax) {
            $max[] = $birth;
        }
    }

    return [$currentMax, $max];
}

$births = [1997, 1997, 1997, 1998, 1999];
$deaths = [1998, 1999];

print_r (getHighestPopulation($births, $deaths));
?>

— 埃米利亚诺
source

不$tmpArray--应该$tmpArray[$death]--吗？还请进行测试$births=[1997,1997,1998]; $deaths=[];-它是否会返回1998？

— Paul Spiegel

是的，你是对的。

— 艾米利亚诺

该代码不仅在复杂的边缘情况下失败，而且甚至在给定输入数组的最简单情况下也失败，$births = [3,1,2,1,3,3,2]并且$deaths = [2,3,2,3,3,3]我希望以2最高的人口年份返回，但是您的代码会返回1。实际上，您的代码未能通过我的15个单元测试中的9个。我不仅不能接受这个最有效的答案，而且我什至不能接受它是一个有效的答案，因为它根本不起作用。

— 谢里夫

您没有仔细阅读问题，因此无法提供一个很好的答案。您在这里做出一个假设，即我告诉过您不要做（对数组进行排序）。因此，请删除关于我如何将悬赏奖励给无效回答的问题中的冒犯性评论，这在某种程度上是“ 解决方法 ”。

— 谢里夫

0

解决问题的最简单明了的方法之一。

$births = [1909, 1919, 1904, 1911, 1908, 1908, 1903, 1901, 1914, 1911, 1900, 1919, 1900, 1908, 1906];
$deaths = [1910, 1911, 1912, 1911, 1914, 1914, 1913, 1915, 1914, 1915];

/* for generating 1 million records

for($i=1;$i<=1000000;$i++) {
    $births[] = rand(1900, 2020);
    $deaths[] = rand(1900, 2020);
}
*/

function highestPopulationYear(Array $births, Array $deaths): Int {
    $start_time = microtime(true); 
    $population = array_count_values($births);
    $deaths = array_count_values($deaths);

    foreach ($deaths as $year => $death) {
        $population[$year] = ($population[$year] ?? 0) - $death;
    }
    ksort($population, SORT_NUMERIC);
    $cumulativeSum = $maxPopulation = $maxYear = 0;
    foreach ($population as $year => &$number) {
        $cumulativeSum += $number;
        if($maxPopulation < $cumulativeSum) {
            $maxPopulation = $cumulativeSum;
            $maxYear = $year;
        }
    }
    print " Execution time of function = ".((microtime(true) - $start_time)*1000)." milliseconds"; 
    return $maxYear;
}

print highestPopulationYear($births, $deaths);

输出：

复杂度：

O(m + log(n))

— 罗纳克·杜特
source

100万条记录的执行时间只是29.64 milliseconds

— Ronak Dhoot

如问题中所述，我不是在进行运行时优化，但应注意，此处的Big O计算略有偏差。另外，您的代码也有些破损。在许多极端情况下，它都会失败。

— 谢里夫

查找人口最多的年份（最有效的解决方案）

假设条件

要求