Rails: What's a good way to validate links (URLs)?

125

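I'd like to know how best to validate URLs in Rails. I was thinking of using a regular expression, but I'm not sure whether that is best practice.

And if I were to use a regex, could someone suggest one? I'm still new to regex.

— Jay
source

Related: stackoverflow.com/questions/1805761/...

— Jon Schneider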

151

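Validating a URL is a tricky job. It's also a very broad request.

What do you want to do, exactly? Do you want to validate the format of the URL, its existence, or something else? There are several possibilities, depending on what you want to do.

A regular expression can validate the format of the URL. But even a complex regular expression cannot ensure you are dealing with a valid URL.

For instance, if you take a simple regular expression, it will probably reject the following host

http://invalid##host.com

but it will allow

http://invalid-host.foo

which is a valid host, but not a valid domain if you consider the existing TLDs. Indeed, the solution would work if you want to validate the hostname rather than the domain, because the following one is a valid hostname

http://host.foo

as well as the following one

http://localhost

Now, let me give you some solutions.

If you want to validate a domain, then you need to forget about regular expressions. The best solution available at the moment is the Public Suffix List, a list maintained by Mozilla. I created a Ruby library, called PublicSuffix, to parse and validate domains against the Public Suffix List.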
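To make the suffix-list idea concrete, here is a toy sketch of the lookup. SAMPLE_SUFFIXES and registrable_domain are hypothetical names, and the hard-coded list is only a stand-in for the real Public Suffix List, which has thousands of entries plus wildcard and exception rules that this sketch ignores. It is not how the PublicSuffix library is implemented.

```ruby
# Toy stand-in for the Public Suffix List (assumption: four sample entries only).
SAMPLE_SUFFIXES = %w[com org co.uk foo].freeze

# Returns the registrable domain (public suffix plus one label), or nil when
# the host does not end in a known suffix or has nothing in front of it.
def registrable_domain(host)
  labels = host.downcase.split('.')
  # Try the longest candidate suffix first, leaving at least one label before it.
  (1...labels.size).each do |i|
    suffix = labels[i..-1].join('.')
    return labels[(i - 1)..-1].join('.') if SAMPLE_SUFFIXES.include?(suffix)
  end
  nil
end

p registrable_domain('www.example.co.uk')
p registrable_domain('localhost')
```

The point is only that the decision comes from a maintained list of suffixes, not from the shape of the string, which is why a regular expression cannot answer it.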

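If you want to validate the format of a URI/URL, then you might want to use regular expressions. Instead of searching for one, use the built-in Ruby URI.parse method.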

require 'uri'

def valid_url?(url)
  # Parse into a separate variable; calling .host on the input string would raise NoMethodError
  uri = URI.parse(url)
  !uri.host.nil?
rescue URI::InvalidURIError
  false
end

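You can even decide to make it more restrictive. For instance, if you want the URL to be an HTTP/HTTPS URL, then you can make the validation more accurate.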

require 'uri'

def valid_url?(url)
  uri = URI.parse(url)
  uri.is_a?(URI::HTTP) && !uri.host.nil?
rescue URI::InvalidURIError
  false
end

Of course, there are tons of improvements you can apply to this method, including checking for a path or a scheme.
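As one possible improvement along those lines, here is a minimal sketch that checks the scheme against an explicit list and requires a non-empty host. ALLOWED_SCHEMES and strict_http_url? are names made up for this illustration, not part of the answer's code.

```ruby
require 'uri'

# Assumption for this sketch: only these schemes are acceptable.
ALLOWED_SCHEMES = %w[http https].freeze

def strict_http_url?(value)
  return false unless value.is_a?(String)

  uri = URI.parse(value)
  # Require an allowed scheme and a host that is neither nil nor empty.
  ALLOWED_SCHEMES.include?(uri.scheme) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  false
end

p strict_http_url?('https://example.com/path')
p strict_http_url?('http:')
```

Checking the host for emptiness as well as nil is a defensive choice; whether an empty host can come back from URI.parse may depend on the Ruby version.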

Last but not least, you can also package this code into a validator:

class HttpUrlValidator < ActiveModel::EachValidator

  def self.compliant?(value)
    uri = URI.parse(value)
    uri.is_a?(URI::HTTP) && !uri.host.nil?
  rescue URI::InvalidURIError
    false
  end

  def validate_each(record, attribute, value)
    unless value.present? && self.class.compliant?(value)
      record.errors.add(attribute, "is not a valid HTTP URL")
    end
  end

end

# in the model
validates :example_attribute, http_url: true

— Simone Carletti
source

1

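Note that the class is URI::HTTPS for https URIs (e.g. URI.parse("https://yo.com").class => URI::HTTPS).

— tee

12

URI::HTTPS inherits from URI::HTTP, which is why I use kind_of?.

— Simone Carletti

1

By far the most complete solution for validating URLs safely.

— Fabrizio Regini, 2014

4

URI.parse('http://invalid-host.foo') succeeds, so the check returns true, because that URI is a valid URL. Also note that .foo is now a valid TLD: iana.org/domains/root/db/foo.html

— Simone Carletti

1

@jmccartie please read the entire post. If you care about the scheme, you should use the final code, which also includes a type check, not just that line. You stopped reading before the end of the post.

— Simone Carletti, 2015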

101

I use a one-liner in my models:

validates :url, format: URI::regexp(%w[http https])

I think it is good enough and simple to use. Moreover, it should in theory be equivalent to Simone's method, since it uses the very same regexp internally.

— Matteo Collina
source

17

Unfortunately, 'http://' matches the above pattern. See: URI::regexp(%w(http https)) =~ 'http://'

— David J.

15

By the same token, a URL like http:fake will also be considered valid.

— nathanvda, 2012
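A related caveat, sketched below under stated assumptions: the pattern is not anchored, so it can match a URL that is merely embedded in a longer string. The sketch builds the pattern with URI::RFC2396_Parser#make_regexp, which as far as I know is what URI.regexp has traditionally delegated to; LOOSE_HTTP_URL and ANCHORED_HTTP_URL are hypothetical names.

```ruby
require 'uri'

# Unanchored pattern for http/https URIs (assumed equivalent to URI.regexp(%w[http https])).
LOOSE_HTTP_URL = URI::RFC2396_Parser.new.make_regexp(%w[http https])

# Anchored variant: the whole string has to be a URL, not just contain one.
ANCHORED_HTTP_URL = /\A#{LOOSE_HTTP_URL}\z/

text = 'click here: http://example.com/ please'
puts LOOSE_HTTP_URL.match?(text)
puts ANCHORED_HTTP_URL.match?(text)
puts ANCHORED_HTTP_URL.match?('http://example.com/')
```

Anchoring does not fix the 'http://' and http:fake cases mentioned in the comments above; it only stops surrounding text from slipping through.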

54

Following Simone's idea, you can easily create your own validator.

class UrlValidator < ActiveModel::EachValidator
  def validate_each(record, attribute, value)
    return if value.blank?
    begin
      uri = URI.parse(value)
      resp = uri.kind_of?(URI::HTTP)
    rescue URI::InvalidURIError
      resp = false
    end
    unless resp == true
      record.errors.add(attribute, options[:message] || "is not an url")
    end
  end
end

and then use

validates :url, :presence => true, :url => true

in your model.

— jlfenaux
source

1

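Where should I put this class? In an initializer?

— deb, 2012

3

I quote @gbc: "If you place your custom validators in app/validators they will be automatically loaded without needing to alter your config/application.rb file." (stackoverflow.com/a/6610270/839847). Note that the answer below from Stefan Pettersson shows that he, too, saved a similar file in "app/validators".

— bergie3000

4

This only checks whether the URL starts with http:// or https://; it is not proper URL validation.

— maggix, 2013

1

If you can afford for the URL to be optional: class OptionalUrlValidator < UrlValidator; def validate_each(record, attribute, value); return true if value.blank?; return super; end; end

— Dirty Henry

1

This is not a good validation: URI("http:").kind_of?(URI::HTTP) #=> true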

— smathy

28

There is also the validate_url gem (which is just a nice wrapper for the Addressable::URI.parse solution).

Just add

gem 'validate_url'

to your Gemfile, and then in your models you can use

validates :click_through_url, url: true

— dolzenko
source

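@ЕвгенийМасленков that might be just as well, because it's valid per the spec, but you might want to check github.com/sporkmonger/addressable/issues. Also, in the general case we've found that nobody follows the standard and everyone uses simple format validation instead.

— dolzenko, 2014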

13

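This question has already been answered, but what the heck, I'll propose the solution I'm using.

The regexp works fine with all the URLs I've met. The setter method takes care of the case where no protocol is mentioned (it assumes http://).

And finally, we try to fetch the page. Maybe I should accept redirects and not only HTTP 200 OK.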

# app/models/my_model.rb
validates :website, :allow_blank => true, :uri => { :format => /(^$)|(^(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(([0-9]{1,5})?\/.*)?$)/ix }

def website= url_str
  unless url_str.blank?
    unless url_str.split(':')[0] == 'http' || url_str.split(':')[0] == 'https'
        url_str = "http://" + url_str
    end
  end  
  write_attribute :website, url_str
end

and...

# app/validators/uri_validator.rb
require 'net/http'

# Thanks Ilya! http://www.igvita.com/2006/09/07/validating-url-in-ruby-on-rails/
# Original credits: http://blog.inquirylabs.com/2006/04/13/simple-uri-validation/
# HTTP Codes: http://www.ruby-doc.org/stdlib/libdoc/net/http/rdoc/classes/Net/HTTPResponse.html

class UriValidator < ActiveModel::EachValidator
  def validate_each(object, attribute, value)
    raise(ArgumentError, "A regular expression must be supplied as the :format option of the options hash") unless options[:format].nil? or options[:format].is_a?(Regexp)
    configuration = { :message => I18n.t('errors.events.invalid_url'), :format => URI::regexp(%w(http https)) }
    configuration.update(options)

    if value =~ configuration[:format]
      begin # check header response
        case Net::HTTP.get_response(URI.parse(value))
          when Net::HTTPSuccess then true
          else object.errors.add(attribute, configuration[:message]) and false
        end
      rescue # Recover on DNS failures..
        object.errors.add(attribute, configuration[:message]) and false
      end
    else
      object.errors.add(attribute, configuration[:message]) and false
    end
  end
end

— Stefan Pettersson
source

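Really neat! Thanks for your input; there are often many approaches to a problem, and it's great when people share theirs.

— Jay

6

Just wanted to point out that, according to the Rails security guide, you should use \A and \z rather than ^ and $ in that regexp.

— Jared

1

I like it. A quick suggestion: DRY the code up a bit by moving the regex into the validator, since I imagine you'd want it to be consistent across models. Bonus: it would let you drop the first line under validate_each.

— Paul Pettengill, 2013

What if the URL takes a long time and times out? What would be the best option for showing a timeout error message, or for handling a page that can't be opened?

— user588324, 2014

This would never pass a security audit: you are making your servers poke an arbitrary URL.

— Mauricio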
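For anyone who still wants a reachability check, here is a rough sketch that addresses the two concerns above: explicit timeouts, and refusing to contact loopback, private, or link-local addresses. internal_address? and reachable? are hypothetical helpers, the 3-second default is an arbitrary assumption, and this is not a complete defence. The DNS lookup and the actual connection are separate resolutions, and redirects are not followed.

```ruby
require 'ipaddr'
require 'net/http'
require 'resolv'
require 'uri'

# True for loopback, private and link-local addresses.
# Input that cannot be parsed as an IP address is treated as unsafe.
def internal_address?(ip_string)
  ip = IPAddr.new(ip_string)
  ip.loopback? || ip.private? || ip.link_local?
rescue IPAddr::Error
  true
end

# Sends a HEAD request with explicit timeouts; any failure counts as "not reachable".
def reachable?(url, timeout: 3)
  uri = URI.parse(url)
  return false unless uri.is_a?(URI::HTTP) && !uri.hostname.to_s.empty?

  # Refuse hosts that do not resolve or that resolve to internal addresses.
  addresses = Resolv.getaddresses(uri.hostname)
  return false if addresses.empty? || addresses.any? { |a| internal_address?(a) }

  response = Net::HTTP.start(uri.hostname, uri.port,
                             use_ssl: uri.scheme == 'https',
                             open_timeout: timeout,
                             read_timeout: timeout) do |http|
    http.head(uri.request_uri)
  end
  response.is_a?(Net::HTTPSuccess) || response.is_a?(Net::HTTPRedirection)
rescue StandardError
  # Invalid URIs, DNS failures, timeouts and TLS errors all end up here.
  false
end
```

Running something like this in a background job rather than inside the validation itself would keep a slow remote host from blocking the request, though that is a design choice outside the scope of this answer.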

12

You can also try the valid_url gem, which allows URLs without a scheme and checks the domain zone and IP hostnames.

Add it to your Gemfile:

gem 'valid_url'

And then in your model:

class WebSite < ActiveRecord::Base
  validates :url, :url => true
end

— Roman Ralovets
source

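This is super nice, especially the URLs without a scheme, which are surprisingly involved with the URI class.

— Paul Pettengill, 2015

I was surprised by this gem's ability to dig through IP-based URLs and detect the bogus ones. Thanks!

— The Wizard of Oz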

10

Just my 2 cents:

before_validation :format_website
validate :website_validator

private

def format_website
  self.website = "http://#{self.website}" unless self.website[/^https?/]
end

def website_validator
  errors.add(:website, I18n.t("activerecord.errors.messages.invalid")) unless website_valid?
end

def website_valid?
  !!website.match(/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-=\?]*)*\/?$/)
end

EDIT: changed the regular expression to match URLs with parameters.

— lafeber
source

1

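Thanks for your input; it's always nice to see different solutions.

— Jay

By the way, your regexp will reject valid URLs with a query string, such as http://test.com/fdsfsdf?a=b

— MikDiet, 2015

2

We put this code into production and kept getting timeouts on what looked like an infinite loop on the .match regex line. Not sure why; just a caution about some corner cases, and I would love to hear others' thoughts on why this would occur.

— toobulkeh, 2015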
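A likely cause of the hang is the quantified group that is itself quantified, `([\/\w \.-=\?]*)*`: when the overall match fails (for example on a trailing `#`), the engine can try a huge number of ways to split the same run of characters. Below is a sketch of a flattened variant with a single quantifier on that part. FLAT_URL is a hypothetical name, and the character class is not identical to the original: there `\.-=` is a range from `.` to `=`, while here `.`, `=`, and `-` are listed literally, so this version is somewhat stricter.

```ruby
# One character class with one quantifier instead of a nested (...*)* group,
# and \A / \z instead of ^ / $ so the whole string must match.
FLAT_URL = /\A(https?:\/\/)?([\da-z.-]+)\.([a-z.]{2,6})([\/\w .=?-]*)\/?\z/

def website_valid?(website)
  FLAT_URL.match?(website.to_s)
end

p website_valid?('http://test.com/fdsfsdf?a=b')
p website_valid?('https://portal.example.com/portal/#')
```

This only removes the nested repetition; it is still a loose pattern and inherits the other limitations discussed in this thread.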

10

The solution that worked for me was:

validates_format_of :url, :with => /\A(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w\.-]*)*\/?\Z/i

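I did try some of the examples that you attached, but I'm supporting URLs like the ones listed below.

Note the use of \A and \Z, because if you use ^ and $ you will see this security warning from the Rails validators.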

 Valid ones:
 'www.crowdint.com'
 'crowdint.com'
 'http://crowdint.com'
 'http://www.crowdint.com'

 Invalid ones:
  'http://www.crowdint. com'
  'http://fake'
  'http:fake'

— Heriberto Perez
source

1

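Try this with "https://portal.example.com/portal/#". In Ruby 2.1.6 the evaluation hangs.

— Old Pro

You're right, it seems that in some cases this regular expression takes forever to resolve :(

— heriberto perez

1

Obviously, there isn't a regex that covers every scenario, which is why I ended up using just a simple validation: validates :url, format: { with: URI.regexp }, if: Proc.new { |a| a.url.present? }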

— heriberto perez

5

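I ran into the same problem lately (I needed to validate URLs in a Rails app), but I had to cope with the additional requirement of Unicode URLs (e.g. http://кц.рф).

I researched a couple of solutions and came across the following:

The first and most suggested approach is using URI.parse. Check Simone Carletti's answer for details. This works OK, but not with Unicode URLs.
The second method I saw was the one by Ilya Grigorik: http://www.igvita.com/2006/09/07/validating-url-in-ruby-on-rails/ Basically, he tries to make a request to the URL; if it works, it is valid...
The third method I found (and the one I prefer) is an approach similar to URI.parse but using the addressable gem instead of the URI stdlib. This approach is detailed here: http://rawsyntax.com/blog/url-validation-in-rails-3-and-ruby-in-general/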
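The limitation of the first approach can be reproduced with the standard library alone. In this small sketch, stdlib_parsable? is a hypothetical helper that just reports whether URI.parse accepts the string.

```ruby
require 'uri'

# Reports whether the stdlib parser accepts the string at all.
def stdlib_parsable?(url)
  URI.parse(url)
  true
rescue URI::InvalidURIError
  false
end

p stdlib_parsable?('http://example.com')
p stdlib_parsable?('http://кц.рф')
```

The second call is rejected because the stdlib parser only accepts ASCII input, which is why a library with internationalized-name support is needed for such URLs.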

— severin
source

Yeah, but from Addressable's point of view Addressable::URI.parse('http:///').scheme # => "http" or

Addressable::URI.parse('Съешь [же] ещё этих мягких французских булок да выпей чаю')

are perfectly fine :(

— smileart

4

This is an updated version of the validator posted by David James. It has been published by Benjamin Fleischer. Meanwhile, I pushed an updated fork, which can be found here.

require 'addressable/uri'

# Source: http://gist.github.com/bf4/5320847
# Accepts options[:message] and options[:allowed_protocols]
# spec/validators/uri_validator_spec.rb
class UriValidator < ActiveModel::EachValidator

  def validate_each(record, attribute, value)
    uri = parse_uri(value)
    if !uri
      record.errors.add(attribute, generic_failure_message)
    elsif !allowed_protocols.include?(uri.scheme)
      record.errors.add(attribute, "must begin with #{allowed_protocols_humanized}")
    end
  end

private

  def generic_failure_message
    options[:message] || "is an invalid URL"
  end

  def allowed_protocols_humanized
    allowed_protocols.to_sentence(:two_words_connector => ' or ')
  end

  def allowed_protocols
    @allowed_protocols ||= [(options[:allowed_protocols] || ['http', 'https'])].flatten
  end

  def parse_uri(value)
    uri = Addressable::URI.parse(value)
    uri.scheme && uri.host && uri
  rescue URI::InvalidURIError, Addressable::URI::InvalidURIError, TypeError
  end

end

...

require 'spec_helper'

# Source: http://gist.github.com/bf4/5320847
# spec/validators/uri_validator_spec.rb
describe UriValidator do
  subject do
    Class.new do
      include ActiveModel::Validations
      attr_accessor :url
      validates :url, uri: true
    end.new
  end

  it "should be valid for a valid http url" do
    subject.url = 'http://www.google.com'
    subject.valid?
    subject.errors.full_messages.should == []
  end

  ['http://google', 'http://.com', 'http://ftp://ftp.google.com', 'http://ssh://google.com'].each do |invalid_url|
    it "#{invalid_url.inspect} is a invalid http url" do
      subject.url = invalid_url
      subject.valid?
      subject.errors.full_messages.should == []
    end
  end

  ['http:/www.google.com','<>hi'].each do |invalid_url|
    it "#{invalid_url.inspect} is an invalid url" do
      subject.url = invalid_url
      subject.valid?
      subject.errors.should have_key(:url)
      subject.errors[:url].should include("is an invalid URL")
    end
  end

  ['www.google.com','google.com'].each do |invalid_url|
    it "#{invalid_url.inspect} is an invalid url" do
      subject.url = invalid_url
      subject.valid?
      subject.errors.should have_key(:url)
      subject.errors[:url].should include("is an invalid URL")
    end
  end

  ['ftp://ftp.google.com','ssh://google.com'].each do |invalid_url|
    it "#{invalid_url.inspect} is an invalid url" do
      subject.url = invalid_url
      subject.valid?
      subject.errors.should have_key(:url)
      subject.errors[:url].should include("must begin with http or https")
    end
  end
end

Please note that there are still strange HTTP URIs that are parsed as valid addresses.

http://google  
http://.com  
http://ftp://ftp.google.com  
http://ssh://google.com

Here is an issue for the addressable gem which covers these examples.

— JD
source

3

I've been using a slight variation on the lafeber solution above. It disallows consecutive dots in the hostname (such as in www.many...dots.com):

%r"\A(https?://)?[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]{2,6}(/.*)?\Z"i

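URI.parse seems to mandate the scheme prefix, which in some cases is not what you may want (e.g. if you want to allow your users to quickly spell URLs in forms such as twitter.com/username).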
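One way to combine both worlds, sketched under the assumption that a missing scheme should default to http://, is to normalize the input before handing it to URI.parse. normalize_url is a hypothetical helper, not part of any library.

```ruby
require 'uri'

# Prepends "http://" when no http/https scheme is present, then parses.
# Returns the normalized URL string, or nil if it cannot be parsed or has no host.
def normalize_url(input)
  text = input.to_s.strip
  return nil if text.empty?

  text = "http://#{text}" unless text =~ %r{\Ahttps?://}i
  uri = URI.parse(text)
  uri.host.to_s.empty? ? nil : uri.to_s
rescue URI::InvalidURIError
  nil
end

p normalize_url('twitter.com/username')
p normalize_url('https://example.com')
```

URI.parse on its own is lenient about host names, so a stricter pattern such as the one above can still be applied to the normalized result.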

— Franco
source

2

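I've been using the 'activevalidators' gem and it works pretty well (not just for URL validation).

You can find it here.

It's all documented, but basically, once the gem is added, you'll want to add the following few lines in an initializer, say: /config/environments/initializers/active_validators_activation.rb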

# Activate all the validators
ActiveValidators.activate(:all)

(Note: you can replace :all with :url or :whatever if you only want to validate specific types of values.)

And then back in your model:

class Url < ActiveRecord::Base
   validates :url, :presence => true, :url => true
end

Now restart the server, and that should be it.

— Arnaud Bouchot
source

2

If you want simple validation and a custom error message:

  validates :some_field_expecting_url_value,
            format: {
              with: URI.regexp(%w[http https]),
              message: 'is not a valid URL'
            }

— Caleb
source

1

You can validate multiple URLs using something like:

validates_format_of [:field1, :field2], with: URI.regexp(['http', 'https']), allow_nil: true

— Damien Roche
source

1

How would you handle URLs without the scheme (e.g. www.bar.com/foo)?

— Craig, 2014

1

https://github.com/perfectline/validates_url is a nice, simple gem that will do pretty much everything for you.

— stuartchaney
source

1

Recently I had this same issue, and I found a workaround for validating URLs.

require 'net/http'

validates_format_of :url, :with => URI::regexp(%w(http https))
validate :validate_url

def validate_url
  unless self.url.blank?
    begin
      source = URI.parse(self.url)
      # The request only needs to succeed; the response itself is not inspected
      Net::HTTP.get_response(source)
    rescue URI::InvalidURIError
      errors.add(:url, 'is Invalid')
    rescue SocketError
      errors.add(:url, 'is Invalid')
    end
  end
end

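The first part (validates_format_of) is enough to validate the URL format. The second part (the validate_url method) makes sure the URL exists by sending a request.

— Dilnavaz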
source

What if the URL points to a resource that is very large (say, multiple GB)?

— Jon Schneider

@JonSchneider one could use an HTTP HEAD request (like here) instead of GET.

— wvengen

1

I like to monkey-patch the URI module to add a valid? method

inside config/initializers/uri.rb

module URI
  def self.valid?(url)
    uri = URI.parse(url)
    uri.is_a?(URI::HTTP) && !uri.host.nil?
  rescue URI::InvalidURIError
    false
  end
end

— Blair Anderson
source

0

As a module

module UrlValidator
  extend ActiveSupport::Concern
  included do
    validates :url, presence: true, uniqueness: true
    validate :url_format
  end

  def url_format
    begin
      errors.add(:url, "Invalid url") unless URI(self.url).is_a?(URI::HTTP)
    rescue URI::InvalidURIError
      errors.add(:url, "Invalid url")
    end
  end
end

Then just include UrlValidator in any model whose URL you want to validate. Including this just as another option.

— MCB
source

0

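URL validation cannot be handled simply by using a regular expression, as the number of websites keeps growing and new domain naming schemes keep coming up.

In my case, I simply wrote a custom validator that checks for a successful response.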

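require 'net/http'

class UrlValidator < ActiveModel::Validator
  def validate(record)
    url = URI.parse(record.path)
    # get_response returns a Net::HTTPResponse; Net::HTTP.get would return only the body string
    response = Net::HTTP.get_response(url)
    record.errors.add(:path, 'Web address is invalid') unless response.is_a?(Net::HTTPSuccess)
  rescue StandardError
    record.errors.add(:path, 'Web address is invalid')
  end
end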

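I'm validating the path attribute of my model by using record.path. I'm also attaching the error to the corresponding attribute via record.errors, keyed by :path.

You can simply replace it with any attribute name.

Then I simply call the custom validator in my model.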

class Url < ApplicationRecord

  # validations
  validates_presence_of :path
  validates_with UrlValidator

end

— Noman Ur Rehman
source

What if the URL points to a resource that is very large (say, multiple GB)?

— Jon Schneider

0

You can use a regex for this; for me, this one works well:

(^|[\s.:;?\-\]<\(])(ftp|https?:\/\/[-\w;\/?:@&=+$\|\_.!~*\|'()\[\]%#,]+[\w\/#](\(\))?)(?=$|[\s',\|\(\).:;?\-\[\]>\)])

— spirito_libero
source