How can I find all files in a directory that contain a UTF-8 BOM (byte order mark)?

8

On Windows, I need to find all files in a directory that contain a UTF-8 BOM (byte order mark). Which tool can do that?

It can be a PowerShell script, some text editor's advanced search feature, or anything else.

windows search utf-8

15

Here is an example PowerShell script. It looks in the C:\ path for any files whose first 3 bytes are 0xEF, 0xBB, 0xBF.

Function ContainsBOM
{   
    return $input | where {
        $contents = [System.IO.File]::ReadAllBytes($_.FullName)
        $_.Length -gt 2 -and $contents[0] -eq 0xEF -and $contents[1] -eq 0xBB -and $contents[2] -eq 0xBF }
}

get-childitem "C:\*.*" | where {!$_.PsIsContainer } | ContainsBOM

Is "ReadAllBytes" necessary? Maybe reading just the first few bytes would perform better?

Fair point. Here is an updated version that reads only the first 3 bytes.

Function ContainsBOM
{   
    return $input | where {
        $contents = new-object byte[] 3
        $stream = [System.IO.File]::OpenRead($_.FullName)
        $stream.Read($contents, 0, 3) | Out-Null
        $stream.Close()
        $contents[0] -eq 0xEF -and $contents[1] -eq 0xBB -and $contents[2] -eq 0xBF }
}

get-childitem "C:\*.*" | where {!$_.PsIsContainer -and $_.Length -gt 2 } | ContainsBOM
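
For anyone who cannot use PowerShell, the same first-three-bytes check can be sketched in Python. This is only an illustration, not part of the script above: the function names has_utf8_bom and find_bom_files and the recursive parameter are made up for this example, and files that cannot be opened will raise an error rather than being skipped.

```python
import codecs
import os


def has_utf8_bom(path):
    """Return True if the file starts with the UTF-8 BOM bytes EF BB BF."""
    with open(path, "rb") as f:
        # Read at most 3 bytes; files shorter than that cannot match.
        return f.read(3) == codecs.BOM_UTF8


def find_bom_files(root, recursive=False):
    """Return the paths of files under root that start with a UTF-8 BOM."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if has_utf8_bom(path):
                found.append(path)
        if not recursive:
            # Emptying dirnames stops os.walk from descending further.
            del dirnames[:]
    return found
```

Calling find_bom_files on a directory of your choice with recursive=True should also cover subdirectories, similar in spirit to adding -Recurse to get-childitem.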

— vcsjones

1

Cool. Before I mark it as the answer, is "ReadAllBytes" necessary? Maybe reading just the first few bytes would perform better?

— Borek Bernard '04

@Borek See the edit.

— vcsjones, 2012

2

This saved my day! I also learned that get-childitem -recurse handles subdirectories as well.

— diynevala, 2015

I wonder if there is a way to remove the BOM using the script above?

— tom_mai78101

2

As a side note, here is a PowerShell script that can be used to strip the UTF-8 BOM character from source files:

$files=get-childitem -Path . -Include @("*.h","*.cpp") -Recurse
foreach ($f in $files)
{
(Get-Content $f.PSPath) | 
Foreach-Object {$_ -replace "\xEF\xBB\xBF", ""} | 
Set-Content $f.PSPath
}
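
If you would rather not round-trip the file through text decoding, a byte-level variant can be sketched in Python. This is a minimal sketch under my own assumptions, not a translation of the script above: strip_utf8_bom is a made-up name, the file is read fully into memory, and it is rewritten in place without a backup.

```python
import codecs


def strip_utf8_bom(path):
    """Remove a leading UTF-8 BOM from path in place.

    Returns True if a BOM was removed, False if the file had none.
    """
    with open(path, "rb") as f:
        data = f.read()
    if not data.startswith(codecs.BOM_UTF8):
        return False  # no BOM, leave the file untouched
    with open(path, "wb") as f:
        # Write back everything after the 3 BOM bytes, unchanged.
        f.write(data[len(codecs.BOM_UTF8):])
    return True
```

Because only the first three bytes are dropped, the rest of the file, including any non-ASCII text and the original line endings, is written back byte for byte.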

— Scott Smith

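I just ended up with a bunch of files that differ only in that some have a BOM and some don't. Your answer was exactly what I needed to clean them all up. Thanks!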

— Tevya

1

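If you are on a corporate computer with restricted privileges (like me) and cannot run PowerShell scripts, you can use portable Notepad++ with the PythonScript plugin to do the job, using the following script: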

import os

# Runs inside Notepad++ via the PythonScript plugin, which provides `notepad` and `console`.
filePathSrc = "C:\\Temp\\UTF8"
# Binary and archive extensions to skip, compared case-insensitively.
skipExt = ('.jar', '.ear', '.gif', '.jpg', '.jpeg', '.xls', '.png', '.cab', '.ico')
for root, dirs, files in os.walk(filePathSrc):
    for fn in files:
        if not fn.lower().endswith(skipExt):
            path = os.path.join(root, fn)
            notepad.open(path)
            console.write(path + "\r\n")
            notepad.runMenuCommand("Encoding", "Convert to UTF-8 without BOM")
            notepad.save()
            notepad.close()

Credit goes to https://pw999.wordpress.com/2013/08/19/mass-convert-a-project-to-utf-8-using-notepad/

— 洪龙